Sunday, 2 February 2014

Debugging Stop 0x9F - Power IRPs and PnP Manager

You may or may not have noticed that a few Stop 0x9F bugchecks have 0x4 as their first parameter, which indicates that a power IRP has failed to synchronize with the PnP Manager.

The PnP Manager is a subsystem of the I/O Manager, and allows devices to be added or removed with little interaction from the user. The best example to illustrate this point would be the insertion or removal of a USB flash drive, or any other USB-connected device. The user will not have to install any additional drivers or configure any settings to use the device. The USB flash drive will almost instantaneously be added to the file system, where it can be managed by the user. This is a result of the design of the PnP Manager and the code used within the driver.

Driver routines can't interact with the PnP Manager directly. The PnP Manager is present in both Kernel-Mode and User-Mode, and the User-Mode version interacts with the Kernel-Mode version.


The PnP Manager is also responsible for adding devices to, and removing them from, the Device Tree. Each device within the Device Tree is represented by a Device Node, which consists of Device Objects that form a Device Stack. We can view the Device Tree in WinDbg with the !devnode extension and the DEVICE_NODE data structure.


You can then view each device node by dumping the Child and Sibling Device Nodes. The purpose of this post is not to explain the PnP Manager, but to explain the general cause of a Stop 0x9F with a subtype of 0x4. The crash is very similar to a Stop 0x9F with a subtype of 0x3; however, instead of a pending IRP, the problem arises from a thread becoming hung during a power transition.
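As an aside on the Child and Sibling pointers mentioned above, here is a minimal Python sketch of a depth-first walk over a tree that is linked in that way. It is a simplified model and not the kernel's DEVICE_NODE layout: the DevNode class, its fields and the example device names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass
class DevNode:
    """Simplified stand-in for a device node (hypothetical, not the real layout)."""
    name: str
    child: Optional["DevNode"] = None    # first child of this node
    sibling: Optional["DevNode"] = None  # next node that shares the same parent


def walk(node: Optional[DevNode], depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, name) pairs depth-first: a node, its children, then its siblings."""
    while node is not None:
        yield depth, node.name
        yield from walk(node.child, depth + 1)
        node = node.sibling


# Hypothetical tree: a root with two children, the first of which has one child.
flash = DevNode("USBSTOR")
hub = DevNode("USB Root Hub", child=flash)
disk = DevNode("Disk")
hub.sibling = disk
root = DevNode("Root", child=hub)

for depth, name in walk(root):
    print("  " * depth + name)
```

Following Child goes one level down the tree and following Sibling stays on the same level, which is the same pattern you follow by hand when dumping one node after another in the debugger.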


Looking at the call stack, we can generally see that a timer has expired and a Watchdog has been notified of this expiration. With a Stop 0x9F, a timer is set to check the state of any threads or IRPs which are hung or need processing; if the counter is incremented above a certain threshold, the system notifies a Watchdog routine which bugchecks the system. I recommend reading my blog post on the Internals of Stop 0x9F, which explains the timer aspect and the Watchdog counter threshold value.
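The timer-and-threshold idea can be modelled in a few lines of Python. This is only an illustration of the counting logic described above: the Watchdog class, the simulate helper and the threshold of 3 are made-up names and values, not the ones Windows uses.

```python
class Watchdog:
    """Toy watchdog: counts timer expirations while work is still outstanding."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # hypothetical limit, not the real Windows value
        self.ticks = 0
        self.fired = False

    def on_timer_expired(self, work_pending: bool) -> None:
        """Called on every timer expiration."""
        if not work_pending:
            self.ticks = 0          # the work completed, so reset the count
            return
        self.ticks += 1
        if self.ticks > self.threshold:
            self.fired = True       # the real system would bugcheck at this point


def simulate(pending_per_tick, threshold=3):
    """Feed a sequence of 'is work still pending?' samples to a watchdog."""
    dog = Watchdog(threshold)
    for pending in pending_per_tick:
        dog.on_timer_expired(pending)
        if dog.fired:
            break
    return dog.fired


print(simulate([True, True, False, True, True]))  # work finished in time
print(simulate([True] * 4))                       # work never finished
```

The point of the sketch is that a single slow tick is harmless; the watchdog only fires when the work stays pending for more consecutive ticks than the threshold allows.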

Watchdogs

Let's examine the concept of Watchdogs, since they are a fundamental part of the error reporting for Stop 0x9F and therefore shouldn't be dismissed as unimportant.

For each Device Object there is a special timer which is used to avoid deadlocking the system, by giving the Device Object the opportunity to cancel any pending I/O operations. The timer is described by an IO_TIMER data structure. We can use the DEVICE_OBJECT data structure to find the IO_TIMER structure: the Timer field contains a pointer (32-bit on an x86 system) to that structure.

We can then view the _IO_TIMER data structure and examine its fields.


The TimerList field is a doubly linked list of the timers found with the !timer extension. The TimerRoutine field is a function pointer to the driver callback routine, which the I/O Manager calls once every second after the timer has been started with IoStartTimer.

The DeviceObject field is the associated Device Object, which is able to cancel any pending I/O operations. This pointer is usually found from the I/O Stack Location of the current IRP.

The Context field holds the driver-supplied context, which is passed to the timer routine each time it is called. I wasn't able to find any documentation of the TimerFlag field.
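To show how the fields fit together, here is a small Python model of a per-device timer whose routine is called once per tick with the Device Object and the context. It is a sketch under stated assumptions, not the real IO_TIMER: the IoTimer class, the started attribute, the start_timer and one_second_tick helpers, the device name and the callback are all hypothetical, and the TimerFlag field is deliberately left out because its meaning isn't documented.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class IoTimer:
    """Simplified model of a per-device timer; not the real IO_TIMER layout."""
    device_object: str                          # stands in for the Device Object pointer
    timer_routine: Callable[[str, Any], None]   # driver callback
    context: Any                                # handed back to the callback unchanged
    started: bool = False                       # hypothetical bookkeeping for this sketch


def start_timer(timer: IoTimer) -> None:
    """Model of starting the timer (the role IoStartTimer plays in the text)."""
    timer.started = True


def one_second_tick(timer_list: List[IoTimer]) -> None:
    """Model of the once-per-second pass: call the routine of every started timer."""
    for timer in timer_list:
        if timer.started:
            timer.timer_routine(timer.device_object, timer.context)


# Hypothetical driver callback: count how long a request has been outstanding.
def my_timer_routine(device_object: str, context: dict) -> None:
    context["seconds_pending"] += 1
    if context["seconds_pending"] >= context["timeout"]:
        context["cancelled"] = True  # a real driver might cancel its pending I/O here


ctx = {"seconds_pending": 0, "timeout": 3, "cancelled": False}
timer = IoTimer("\\Device\\Example", my_timer_routine, ctx)
timers = [timer]

one_second_tick(timers)   # not started yet, so the routine is not called
start_timer(timer)
for _ in range(3):
    one_second_tick(timers)
print(ctx)
```

The callback never has to look anything up: everything it needs arrives through the two arguments, which is why the structure stores both the Device Object and the context alongside the routine pointer.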

Synchronizing PnP and Power IRPs

The PnP Manager is a subsystem of the larger I/O Manager, and as a result the I/O Manager is able to send PnP IRPs and Power IRPs. The Power Manager is a subsystem of the I/O Manager too. PnP IRPs need to be synchronized against Power IRPs to prevent two state-changing PnP IRPs from being present in the same stack at the same time; this is what allows a power transition, or any other power-related IRP, to be sent safely to a particular PnP device. The synchronization is also required to ensure that IRPs aren't sent out of order, since some PnP IRPs need to be sent while the device is in a certain power state.
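One way to picture this synchronization is as a gate in front of each device stack which lets only one state-changing request through at a time. The Python sketch below is my own toy model of that idea and says nothing about how Windows actually implements it: the DeviceStackGate class and its methods are hypothetical, and the two IRP names are used purely as example labels.

```python
from collections import deque


class DeviceStackGate:
    """Toy per-device gate: only one state-changing request may be in flight."""

    def __init__(self) -> None:
        self.in_flight = None   # request currently being processed, if any
        self.waiting = deque()  # requests held back until the gate is free
        self.order = []         # order in which requests were actually dispatched

    def submit(self, request: str) -> None:
        """Dispatch the request now, or hold it behind the one in flight."""
        if self.in_flight is None:
            self._dispatch(request)
        else:
            self.waiting.append(request)

    def complete(self) -> None:
        """Finish the in-flight request and release the next waiter, if any."""
        self.in_flight = None
        if self.waiting:
            self._dispatch(self.waiting.popleft())

    def _dispatch(self, request: str) -> None:
        self.in_flight = request
        self.order.append(request)


gate = DeviceStackGate()
gate.submit("IRP_MN_QUERY_REMOVE_DEVICE")  # example PnP request
gate.submit("IRP_MN_SET_POWER")            # example power request, has to wait
print(gate.in_flight, list(gate.waiting))
gate.complete()                            # the PnP request finishes
print(gate.in_flight, gate.order)
```

In this model, a request which never calls complete leaves everything behind it waiting indefinitely, which is one way to think about a power transition that times out while waiting on the PnP side.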


Since I haven't been able to obtain a Kernel Memory Dump for this type of bugcheck, I can only suggest the extensions which I would expect to be the most useful: !locks and !irpfind.
Access to the Device Tree is synchronized with IopDeviceTreeLock, which is an executive resource (ERESOURCE), the type of lock which the !locks extension displays. Use the !locks extension to find any locks related to the Device Tree or I/O Manager, and then view the thread which is holding the lock; this should hopefully give you an insight into why a thread may be stuck or an IRP isn't being processed. Use the !irp extension to view the IRP's stack.

Use the !irpfind extension to find any other related PnP IRPs and see if they are being processed.








2 comments:

  1. Excellent, I'll refer to this when I run into a *9F with a non blocked-IRP 4th parameter. Have you managed to get a kernel-dump yet? If not, the next time I run into one, I'll request it from the user and get it over to you.

    1. I haven't managed to find one yet, they're quite rare so it might be difficult.
