Monday 2 December 2013

Debugging Stop 0x124 - Using !whea

A new month, a new post and an old topic. It's back to Stop 0x124 once again, and this time I'm going to explain the !whea extension, which outputs the high-level structure of the WHEA architecture.

You'll need at least a Kernel Memory dump to use this extension.

Take note of the current error record address for the WHEA_ERROR_RECORD structure, and of the error source which was notified. We can actually view the data structure with the dt nt!_(name) command (for this structure, dt -r nt!_WHEA_ERROR_RECORD followed by the address). I've added the -r switch to dump all the substructures too; here is a partial output:

Here is the !whea output.

I've highlighted the address of the current WHEA_ERROR_RECORD present in our dump file, which is the main error record for the crash, along with the source of the error and the type of hardware platform which the error record falls under. We can see that one error has caused a bugcheck, and the rest are corrected errors. The rest of the !whea output is below:

Let's explain what each of these fields means. Firstly, the Type field contains an enumeration called WHEA_ERROR_SOURCE_TYPE. It defines all the error sources which the hardware is able to report. Its general structure is here:

These values correspond to parameter 1 of the Stop 0x124 bugcheck. Some of the acronyms you may already be familiar with, but I'll list and explain these fields nonetheless.

  • MCE (0) = Machine Check Exception
  • CMC (1) = Corrected Machine Check
  • CPE (2) = Corrected Platform Error
  • NMI (3) = Non Maskable Interrupt
  • PCIe (4) = PCI Express
  • Generic (5) = Unknown Error
  • INIT (6) = Itanium INIT error
  • BOOT (7) = Boot Error
  • SCI (8) = Service Control Interrupt
  • IPFMCA (9) = Itanium Machine Check Exception
  • IPFCMC (A) = Itanium Corrected Machine Check
  • IPFCPE (B) = Itanium Corrected Platform Error

The Error Count field shows the number of uncorrectable errors which have led to a bugcheck, and the Record Count field shows the number of Error Records under that particular error source.
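
The error source values listed above can be turned into a quick lookup when you're reading parameter 1 from a lot of dumps. This is only a rough sketch in Python: the helper name describe_error_source is made up for illustration, and the numeric values are the WHEA_ERROR_SOURCE_TYPE values as I understand them from the WDK headers, so check them against your own copy.

```python
# Hypothetical helper: map Stop 0x124 parameter 1 to a WHEA error source.
# Assumption: values follow the WHEA_ERROR_SOURCE_TYPE enumeration in the
# WDK headers; verify against your own ntddk.h before relying on them.
WHEA_ERROR_SOURCE_TYPE = {
    0x0: ("MCE", "Machine Check Exception"),
    0x1: ("CMC", "Corrected Machine Check"),
    0x2: ("CPE", "Corrected Platform Error"),
    0x3: ("NMI", "Non Maskable Interrupt"),
    0x4: ("PCIe", "PCI Express"),
    0x5: ("Generic", "Unknown Error"),
    0x6: ("INIT", "Itanium INIT error"),
    0x7: ("BOOT", "Boot Error"),
    0x8: ("SCI", "Service Control Interrupt"),
    0x9: ("IPFMCA", "Itanium Machine Check Exception"),
    0xA: ("IPFCMC", "Itanium Corrected Machine Check"),
    0xB: ("IPFCPE", "Itanium Corrected Platform Error"),
}


def describe_error_source(param1):
    """Return a readable description of bugcheck parameter 1."""
    name, meaning = WHEA_ERROR_SOURCE_TYPE.get(
        param1, ("Unknown", "Value not in the enumeration")
    )
    return f"{name} ({param1:#x}) = {meaning}"


if __name__ == "__main__":
    # Parameter 1 of 0 is the usual Machine Check Exception case.
    print(describe_error_source(0x0))
```

Anything outside the table is reported as unknown rather than guessed at.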

The fields shown in the red box belong to the WHEA_ERROR_SOURCE_DESCRIPTOR structure. Its general structure can be seen here:

The Length and Version fields aren't really important here. The Length field indicates the size of the structure in bytes, and the Version field shows the version of the structure.

The Type field is the WHEA_ERROR_SOURCE_TYPE enumeration.

The State field holds a value from the WHEA_ERROR_SOURCE_STATE enumeration, which defines the runtime state of the error source: either the error source has stopped handling and processing error information, or it has been started.

MaxRawDataLength is the amount of data which should be stored in the error packet, covering the error source information and any additional information provided by a PSHED plug-in, which the hardware vendor develops to give specific troubleshooting information.

NumRecordsToPreallocate is quite self-explanatory, and defines how many error records should be preallocated for the error source.

MaxSectionsPerRecord is the maximum number of sections to be provided within the error record.

ErrorSourceID and PlatformErrorSourceID are unique identifiers to the error source on the system where the error has happened.
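
To make the field list above a little more concrete, here is a rough Python sketch that unpacks the common header of a WHEA_ERROR_SOURCE_DESCRIPTOR from raw bytes. It rests on a few assumptions: that the header is ten 32-bit little-endian fields in the order shown, that the state values are 1 for stopped and 2 for started, and that the header is followed by type-specific data which isn't parsed here. The names DescriptorHeader and parse_descriptor_header are made up, and the sample bytes are invented rather than taken from a real dump.

```python
import struct
from collections import namedtuple

# Assumption: the common header is ten 32-bit little-endian values in this
# order, as I understand the WDK definition. The type-specific data that
# follows the header is deliberately ignored.
HEADER_FORMAT = "<10I"
DescriptorHeader = namedtuple(
    "DescriptorHeader",
    [
        "Length",
        "Version",
        "Type",
        "State",
        "MaxRawDataLength",
        "NumRecordsToPreallocate",
        "MaxSectionsPerRecord",
        "ErrorSourceId",
        "PlatformErrorSourceId",
        "Flags",
    ],
)

# Assumption: WHEA_ERROR_SOURCE_STATE values (1 = stopped, 2 = started).
ERROR_SOURCE_STATE = {1: "Stopped", 2: "Started"}


def parse_descriptor_header(raw):
    """Unpack the first 40 bytes of a descriptor into named fields."""
    return DescriptorHeader(*struct.unpack_from(HEADER_FORMAT, raw, 0))


if __name__ == "__main__":
    # Invented sample values purely for illustration.
    sample = struct.pack(HEADER_FORMAT, 0x3CC, 10, 0, 2, 0x100, 1, 1, 0, 0, 0x80000000)
    header = parse_descriptor_header(sample)
    print(header.Type, ERROR_SOURCE_STATE.get(header.State, "Unknown"), hex(header.Flags))
```

In practice dt in WinDbg does all of this for you; the sketch is only meant to show how the fields sit next to each other.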

The Flags field is a bitwise OR of flags which give additional information. The three possible flags are as follows:
  • WHEA_ERROR_SOURCE_FLAG_DEFAULTSOURCE indicates that this is the default error source for the hardware platform to which the notified error source belongs.
  • WHEA_ERROR_SOURCE_FLAG_FIRMWAREFIRST shows that firmware processes the error condition before control is handed to the operating system.
  • WHEA_ERROR_SOURCE_FLAG_GLOBAL shows that any settings applied to one error source should apply to all error sources of the same type.

Looking at our example, we can see that querying the PSHED returned all of the error sources supported on x86/x64.
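
Since the Flags field described above is just a set of bits OR'ed together, decoding it by hand is a matter of testing each bit. Here is a minimal Python sketch of that; decode_flags is a made-up helper, and the bit values are the ones I believe the WDK headers use, so treat them as an assumption and check your own headers.

```python
# Assumption: bit values as defined in the WDK headers; verify before use.
WHEA_ERROR_SOURCE_FLAGS = {
    0x00000001: "WHEA_ERROR_SOURCE_FLAG_FIRMWAREFIRST",
    0x00000002: "WHEA_ERROR_SOURCE_FLAG_GLOBAL",
    0x80000000: "WHEA_ERROR_SOURCE_FLAG_DEFAULTSOURCE",
}


def decode_flags(flags):
    """Return the names of every known flag bit set in the Flags value."""
    return [name for bit, name in WHEA_ERROR_SOURCE_FLAGS.items() if flags & bit]


if __name__ == "__main__":
    # Invented example value: default source with firmware-first handling.
    for name in decode_flags(0x80000001):
        print(name)
```

Bits that aren't in the table are simply ignored, so a newer header with extra flags won't break the helper, it just won't name them.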

Just to add, to obtain the error packet address and use the !errpkt extension, you need to dump the error record; the error packet should be in one of the record sections. Unfortunately, I've never come across a dump file like this, so I'll post the WinDbg documentation example here.
