Thursday 28 November 2013

Debugging Stop 0x101 [Updated Version]

Okay, I know I've previously explained a Stop 0x101 beforehand, before like a Stop 0x124, I'm going to provide a updated version. Remember you'll need a Kernel Memory dump in order to carry this out.

The third parameter contains the PRCB address of the hung processor, the fourth parameter is in fact Reserved, but does contain the processor number of our hung processor. Microsoft like to make us work a little harder.

Using the !prcb extension with the processor number, you'll notice that the PRCB address matches the address of the hung processor. 

We can also tell that Processor 0 sent out the Clock Interrupt, since the IRQL Level is set to 13. This applies to x64 systems. On a x86 system, the IRQL Level is 28. Clock Interrupts are also hardware interrupts.


We can dump the IDT (!idt), and then see which interrupt vector clock interrupts and IPIs are assigned to.


Okay, we know that Processor 0 sent out the Clock Interrupt, so let's dump the call stack with the knL command, which does the same job as the k command but adds frame numbers.


The Clock Interrupt was sent to update the system time, and keep synchronization. Well, that would have been the purpose if Processor 2 responded in time.

You may also be wondering why I highlighted the IpiInterrupt, well if you look at the nt!KxFlushEntireTb and nt!keFlushMultipleRangeTb routines, these require the use of a IPI (Inter Processor Interrupt) to clear the TLB cache. The the main purpose is again, synchronization. We can view the IPI, and gather further evidence that the Processor 2 is hung with the !ipi extension.


Now, let's finally examine the Processor 2 stack, and it's activity.


There doesn't seem to much activity going on here. This is usually because the processor, couldn't be interrupted and a context gathered, thus the reason for the registers also being completely empty.



Instead, we need to use the !thread extension and then gather a raw stack with the dps command. The !dpx doesn't seem to work well here, since it can't gather a stack pointer.


dps fffff88007fb1000 fffff88007fbc000

 Any a hardware interrupt, any previous activity will be deferred and linked with a DPC. We can examine the DPC Queue with !dpcs extension. This enables the CPU to complete any high level processing to be completed quickly.




I also noticed a interesting routine in the raw stack:


The nt!KiDpcInterruptBypass is used to ignore any DPCs, and continue processing.

Looking through the raw stack, I noticed two third-party drivers which may have lead to a stack overflow, thus the reason for nt!MiCheckForUserStackOverflow. I also couldn't see any evidence of the IPI being processed at all on Processor 3, such as
nt!KiIpiProcessRequests and nt!KiIpiInterrupt. 

Just to quickly add, the Interrupt Flag was set to true (1), and therefore the processor was able to handle hardware interrupts, and the Clock Interrupt was received on the processor, since the system time was updated. IPI's have a higher IRQL Level than Clock Interrupts, and therefore will have higher priority, I believe the processor was waiting upon the IPI completion, which seemed to have never completed, and then a Clock Interrupt was issued within the wrong interval.



References:

Debugging a CLOCK_WATCHDOG_TIMEOUT Bugcheck

Class 101 for 0x101 Bugchecks



3 comments: