Sunday 3 November 2013

Debugging Stop 0x1 - APCs and Guarded Regions

This is the first Stop 0x1 that I have actually come across on my own, and to be honest it is one of those bugchecks that contains very little information, because of the point at which the bugcheck is produced. To really understand how this bugcheck works and why it happens, you need to understand the concept of APCs and their types, and how they are used with Critical Regions and Guarded Regions.

APCs are quite a lengthy subject, and therefore I will not explain completely how they work and their internals, but will provide some useful references for you to read or take note of. They are wonderfully explained in the Windows Internals book, which I absolutely recommend that you purchase.

Brief Explanation of APCs  

APCs are a form of asynchronous interrupt, and run within the context of a particular thread, and therefore within a particular process address space. APC routines can incur page faults, call system services, wait for object handles and acquire resources (objects).

APCs are described by APC objects. When an APC is queued to a thread, an APC object is inserted into that thread's queue, called an APC Queue. The APCs are then delivered through the software interrupt at APC_LEVEL (IRQL 1).

There are two main types of APCs: Kernel-Mode and User-Mode. Kernel-Mode APCs are further divided into Normal and Special. Kernel-Mode APCs can simply run in the context of the target thread without having to wait to gain permission from that thread, whereas User-Mode APCs have to wait for permission, which the thread gives by entering an alertable wait state.

APCs and Stop 0x1

Getting back to the relationship between APCs and Stop 0x1, let's start examining the important points within the dump file.

This bugcheck is always raised when exiting a Service Call, and in this case the likely cause is a mismatched call to KeLeaveGuardedRegion.

As already pointed out in the description of the bugcheck, the most significant parameter is the 3rd parameter, which indicates the current value of the thread's CombinedApcDisable field. The parameter is split into two 16-bit values: a SpecialApcDisable value in the upper 16 bits and a KernelApcDisable value in the lower 16 bits.

We can clearly see that both values are negative, which shows that both Special APCs and normal Kernel APCs were disabled but never re-enabled. The negative SpecialApcDisable value means the thread entered a Guarded Region rather than just a Critical Region, since a Critical Region only blocks normal Kernel APCs, whereas no APCs at all are executed within the thread's context once it has entered a Guarded Region.

Device drivers enter Guarded Regions and Critical Regions (disabling APCs), usually while holding a lock (there are many different types of locks), to prevent any Kernel-Mode APCs being used to suspend or terminate the thread. If the thread were terminated or placed into a wait state while holding the lock, the system could potentially deadlock and hang.

When APCs are disabled, two/three fields are set in the _KTHREAD data structure as shown below:

When exiting a Guarded Region or a Critical Region, APCs must be re-enabled, since they are used heavily by the I/O Manager and in I/O operations.

A few other points: the second parameter contains the value of the thread's ApcStateIndex field. A field of the same name is also stored in the _KAPC data structure:

The ApcStateIndex field identifies which APC environment is in use, and relates to the ApcState field found in the _KTHREAD structure:

We can clearly see that the ApcState field contains another data structure called _KAPC_STATE. SavedApcState, which is also part of _KTHREAD, stores the same data structure.

The ApcState field is called an APC Environment. It is used for APCs targeted at the thread's current context, regardless of whether the thread is running within its own process or is attached to another process. The SavedApcState field is also an APC Environment, and holds APCs targeted at a context in which the thread is not currently running, so these APCs must wait to be delivered.

Driver Verifier and Critical Region Counts

The best option you have is to run Driver Verifier and then check the Critical Region log with the !verifier 0x200 extension command in the debugger, looking for any mismatched calls. Critical Region counts are explained in the link in this paragraph.

Overall, your best option would be to run Driver Verifier.
