BSODTutorials: October 2013

Thursday, 31 October 2013

Debugging Stop 0x4A - User-Mode and IRQL Levels

Okay, here's another debugging example with quite a simple methodology: a driver has returned to User-Mode while the IRQL was above PASSIVE_LEVEL (Level 0).

The first parameter indicates the address of the system call which the driver has returned from. The second parameter indicates the IRQL of the processor, which in this case is Level 2 or DISPATCH_LEVEL.

The main point to remember is that all threads which run in User-Mode run at IRQL 0; this is to ensure that no User-Mode thread has a higher interrupt priority than any Kernel-Mode thread. In case you are wondering when a User-Mode application may need to call into Kernel-Mode, a good listing of examples can be found here - User-Mode Interactions: Guidelines for Kernel-Mode Drivers
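To make the rule a little more concrete, here is a small Python sketch of how the two parameters could be read. It is only a model and not a debugger API: the function names are my own, and the address used in the example call is made up.

```python
# Illustrative model only; none of these names come from WinDbg or the kernel.
IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}


def is_valid_user_mode_return(irql):
    """User-Mode threads must always run at IRQL 0 (PASSIVE_LEVEL)."""
    return irql == 0


def describe_stop_0x4a(system_call_address, irql):
    """Summarise parameter 1 (system call address) and parameter 2 (IRQL)."""
    name = IRQL_NAMES.get(irql, "IRQL %d" % irql)
    return "System call at 0x%x returned to User-Mode at %s (Level %d)" % (
        system_call_address, name, irql)


# Hypothetical address, with the IRQL from this example dump.
print(describe_stop_0x4a(0x7FEF1234, 2))
print(is_valid_user_mode_return(2))
```

Any IRQL other than 0 at the point of returning to User-Mode is what triggers the bugcheck.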

The stack does not reveal much, apart from this:

The stack simply reveals a function in User-Mode calling the nt!KiServiceExit2 routine, which is an internal Kernel routine.

The use of dd command in a stack can be found here - [Advanced] Principles of debugging

The driver which seems to be causing the problem is related to the Bitdefender program.

I understand this was a very short tutorial, but to be honest there isn't much behind this bugcheck apart from what I have already explained.

Happy Halloween!

Wednesday, 30 October 2013

Process Explorer - Looking at Open Handles

This is going to be my first little post about how to use some of the features of Process Explorer, which is a very useful tool to have if you understand how to use it properly.

If you understand how objects and handles work, then you may skip this paragraph and read the rest of the blog post; however, if you wish to gain a brief understanding of how objects and handles work, then please continue reading. I will not be explaining how objects work completely, since this is quite a lengthy topic, although I may explain objects in more detail in the future. Objects are basically System Resources, and are handled by the Object Manager; we can also see them with the WinObj Sysinternals Tool. Each object also maintains a handle count, which is the number of open handles or active references to that object from processes. An object can only be destroyed once all the handles to that object have been closed.

Open WinObj, and then click File and then Run As Administrator. Select the Object Types folder, and you will see all the available Object Types handled by the Object Manager.

We can look further into the details of a certain object by looking at some data structures and extensions in WinDbg.

Note: Please be aware that these commands and extensions may not be able to gather the desired information from a Minidump; unfortunately, I've only got a Minidump to work with, so a Minidump will have to be used.

The Object Header data structure maintains information about the object, and points to the type object, which maintains information common to that object type. An Object Header also maintains sub headers which are specific to that object. I will not explain all the fields here, since they are not relevant to the context of this post; the three which I will explain are PointerCount, HandleCount and Flags.

HandleCount maintains the number of open handles to that object, whereas PointerCount maintains the number of references to that object (this includes any open handles). You may be wondering why there is also a PointerCount field; it is included since Kernel-Mode code can reference objects with pointers instead of handles.
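The relationship between the two counts can be sketched with a small Python model. This is only my simplified illustration of the idea, with class and method names which I have made up; it is not how the Object Manager is actually implemented.

```python
class ObjectHeaderModel:
    """Toy model of the HandleCount and PointerCount fields (names are mine)."""

    def __init__(self):
        self.handle_count = 0
        self.pointer_count = 0
        self.destroyed = False

    def open_handle(self):
        # An open handle counts as a handle and as a reference.
        self.handle_count += 1
        self.pointer_count += 1

    def close_handle(self):
        if self.handle_count == 0:
            raise ValueError("no open handles to close")
        self.handle_count -= 1
        self.dereference()

    def reference_by_pointer(self):
        # Kernel-Mode code can take a reference without opening a handle.
        self.pointer_count += 1

    def dereference(self):
        if self.pointer_count == 0:
            raise ValueError("no references left to release")
        self.pointer_count -= 1
        if self.pointer_count == 0:
            # Nothing refers to the object any more, so it can be destroyed.
            self.destroyed = True


obj = ObjectHeaderModel()
obj.open_handle()           # a process opens a handle
obj.reference_by_pointer()  # a driver references the object by pointer
obj.close_handle()          # the process closes its handle
print(obj.handle_count, obj.pointer_count, obj.destroyed)
```

In this model the object survives the handle being closed, because the pointer reference is still outstanding.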

The Flags field maintains any attributes for that object. The Flags field is stored in a structure called Object Attributes, which we can view with the !obja extension.

The !obja extension displays the attributes or flags of an object which are stored by the Object Manager. It takes the hexadecimal address of the object; in this case it's a process object.

Let's examine the two flags for the process object:

OBJ_EXCLUSIVE: This specifies that the object can only be used by the process which created it.

OBJ_CASE_INSENSITIVE: Specifies that lookups for the object in the namespace should be case insensitive.
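Since the attributes are stored as bit flags, a short Python sketch can show how a raw value breaks down into names. The bit values below are the OBJ_* definitions as I remember them from ntdef.h, so please treat them as an assumption and check the header; the function name is my own.

```python
# OBJ_* bit values as I recall them from ntdef.h (assumption, verify them).
OBJ_FLAGS = {
    0x002: "OBJ_INHERIT",
    0x010: "OBJ_PERMANENT",
    0x020: "OBJ_EXCLUSIVE",
    0x040: "OBJ_CASE_INSENSITIVE",
    0x080: "OBJ_OPENIF",
    0x100: "OBJ_OPENLINK",
    0x200: "OBJ_KERNEL_HANDLE",
}


def decode_object_attributes(value):
    """Return the names of every flag bit which is set in value."""
    return [name for bit, name in sorted(OBJ_FLAGS.items()) if value & bit]


# 0x20 | 0x40 gives the two flags discussed above.
print(decode_object_attributes(0x60))
```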

This brings me to the point that handles are used primarily because the Object Manager can skip the name lookup and find the object directly. When an object is created by a process by name, the process is given a handle to it.

Going back to our original discussion, open Process Explorer, and then select the View menu button and then select Show Lower Pane (or CTRL + L). Select the Lower Pane View option to show open handles to any objects being used by a selected process.

In this example, I've chosen Firefox 25; we can see all the associated objects being used by that process, which includes all the threads running under that process. It's important to remember that a process object is more of a shell, within which the threads run.

We can right-click the File object, and then view its properties or even close the handle to that object, which I would strongly advise against.

Sunday, 27 October 2013

Linked Lists - Flink and Blink

Okay, linked lists are used often in Windows and are usually part of larger data structures. They are typically seen in the Windows Debugger (WinDbg) under the name _LIST_ENTRY. This indicates a linked list data structure, or more specifically the list head of a linked list, which contains the Flink and Blink pointers.

Here is some example output using WinDbg:

The data types being used here are 32-bit pointers, which are defined as follows:

This is stored in the BaseTsd.h header file.

For non-programmers or those who are simply interested, the #define preprocessor statement is used to give a value or piece of code an identifier which is shorter or easier to read; the preprocessor substitutes the identifier with that text before compilation.

Here's a very simple example which I wrote myself:

Getting back to the point, I started a discussion a while ago about Flink and Blink addresses, and what they meant on Sysnative - Understanding Flink and Blink Free-Lists (Stop 0x19)

If you ever see the Flink and Blink free list parameters, and the pool freelist being reported as corrupt, then this is usually because the linked list isn't valid anymore and doesn't point to the addresses it's supposed to. The pool allocation should be free and available for use by device drivers, but validation of the linked list indicates that the pool allocation isn't free, as a result of some buffer overrun or underrun.

The pool freelist is a group of free pool allocations linked together in a chain.

In concept, a doubly linked list is shown as this:

We can view doubly linked lists with the !dflink and !dblink extensions, along with the use of the dt -l command. We can also use the !list extension.

Windows knows if a linked list is empty, since the Flink pointer will point to the List Head address. The IsListEmpty API checks for this and returns a Boolean value; if the list is empty, it will return true. Boolean expressions are either true or false.

Windows and device drivers can add entries to the linked list by using the InsertHeadList and InsertTailList APIs. The names of the routines are quite self-explanatory: InsertHeadList adds an entry to the beginning of the linked list, whereas InsertTailList adds an entry to the end. In terms of code, the routine would be written as so:

The routine takes two parameters, which are both pointers; ListHead is a pointer to the List Head of the linked list and Entry is a pointer to the entry which is going to be inserted. The routine is void and does not return anything. The same API can be used to insert entries into the middle of linked lists, since the previous entry will be used as a list head.
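To show how the Flink and Blink pointers move during these operations, here is a Python model of a circular doubly linked list. The function names mirror the kernel routines for readability, but this is my own sketch rather than the real implementation; is_consistent is a helper I have added to imitate the kind of validation which fails when a list is corrupt.

```python
class ListEntry:
    """Model of _LIST_ENTRY: a new entry points at itself in both directions."""

    def __init__(self, name=None):
        self.name = name
        self.flink = self  # forward link
        self.blink = self  # backward link


def is_list_empty(head):
    # An empty list is one where the head's Flink points back at the head.
    return head.flink is head


def insert_head_list(head, entry):
    first = head.flink
    entry.flink = first
    entry.blink = head
    first.blink = entry
    head.flink = entry


def insert_tail_list(head, entry):
    last = head.blink
    entry.flink = head
    entry.blink = last
    last.flink = entry
    head.blink = entry


def walk(head):
    """Follow the Flink pointers and collect the entry names in order."""
    names = []
    entry = head.flink
    while entry is not head:
        names.append(entry.name)
        entry = entry.flink
    return names


def is_consistent(head):
    """Hypothetical check: every entry's Flink must point back via Blink."""
    entry = head
    while True:
        if entry.flink.blink is not entry:
            return False
        entry = entry.flink
        if entry is head:
            return True


head = ListEntry()
insert_tail_list(head, ListEntry("b"))
insert_head_list(head, ListEntry("a"))
insert_tail_list(head, ListEntry("c"))
print(walk(head), is_list_empty(head), is_consistent(head))
```

Overwriting one of the links by hand (for example, head.flink.blink = None) makes is_consistent return False, which is roughly the situation a buffer overrun creates.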

More Information - Kernel-Mode Basics: Linked Lists

We have briefly discussed the API used with Linked Lists, and shown where Linked Lists are present in data structures formatted by WinDbg. Stop 0x19s are a common example of linked lists being corrupted; ensure you check the parameters first.

Finally, I would like to add that arrays and linked lists have the similar purpose of storing similar data in an easily accessible way, although each is a data structure which should be chosen for the appropriate task. A good discussion about the differences between linked lists and arrays is on Stack Overflow.

If you are interested, you can learn or see how a linked list works in programming by reading this article on CodeProject.

Here is another simple program which demonstrates the use of a pointer to access an element of an array, and then print the address of that element to the user.

A character pointer named p is created, as well as an empty character array of 50 elements called string. We then assign the pointer the address of the first array element. The program then prints the memory address of the first element to the screen.

Wednesday, 23 October 2013

Debugging Stop 0x124 - Calculating Clockspeed (Without !sysinfo cpuspeed)

We all know that Stop 0x124s contain very little practical information to work with; the stack consists of WHEA reporting routines and many commands have no significance to a Stop 0x124.

The first thing to look at with a Stop 0x124 is the clock speed of the processor; generally an overclocked processor will mean that the user has most likely overclocked their GPU or RAM, and also changed the voltage settings. However, the !sysinfo cpuspeed extension does not always work, as shown below:

This is quite annoying, but there are other ways to view the clock speed of a processor. We can use the !prcb extension to view the Processor Control Block, which is a private kernel data structure used to store thread scheduling information, the DPC queue, detailed CPU vendor information, cache sizes etc.

Using the !prcb extension on its own doesn't provide much information, but using the address fffff780ffff0000 with the _KPRCB data structure will provide some very detailed information. Please be aware that not specifying the processor number with the !prcb extension will default to the context of the current processor.

This is not the entire data structure; you can enter the command yourself if you wish to view the entire data structure, but for the purposes of this blog post the MHz field is what we are interested in the most.

The hexadecimal value of 0xa21 isn't very readable on its own. Using the ? (evaluate expression) command we can convert this into a decimal value.
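The same conversion can be checked outside of the debugger. This is just a quick Python sketch of the arithmetic (0xa21 is 10 x 256 + 2 x 16 + 1); the function name is my own.

```python
def mhz_field_to_ghz(hex_text):
    """Convert the hexadecimal MHz field into GHz."""
    mhz = int(hex_text, 16)  # int() accepts an optional 0x prefix for base 16
    return mhz / 1000.0


print(int("0xa21", 16))           # the value in MHz
print(mhz_field_to_ghz("0xa21"))  # the value in GHz
```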

We can see that the processor is running at 2593 MHz, or roughly 2.6 GHz; the .formats command will give similar information:

Interrupt Dispatch Table - !idt

In a previous blog post, I explained some of the exception codes which are stored within a data structure called the Interrupt Dispatch Table, which can be viewed with WinDbg using the !idt extension. Here I would like to briefly explain how the Interrupt Dispatch Table works.

We can gain further information from each interrupt, when using the dt nt!_KINTERRUPT command, which will give you the following prototype:

Drivers will use the IoConnectInterruptEx API, to provide a pointer to the above data structure, when registering an ISR for that device.

Hardware interrupts are handled by an interrupt controller which then interrupts the CPU, and the CPU then reads the IRQ to match the request to the appropriate interrupt number. Most CPUs use an APIC interrupt controller, rather than the older PIC controller. You can attempt to use the !pic and !apic extensions to see which one you are using; only one extension will work. Furthermore, interrupts are serviced by a routine called an Interrupt Service Routine (ISR), whereas an exception is serviced by an exception handler.

Each interrupt is given an IRQL (Interrupt Request Level). As this is generally a software related concept (APCs and DPCs), IRQs from hardware interrupts have to be mapped to the appropriate IRQL. We can view the IRQL of a processor with !irql.

On x86 systems, the IRQLs range from 0 to 31, whereas on x64 systems they range from 0 to 15.

Monday, 21 October 2013

Kernel Data Structures - dt nt!_* and dt nt!_ -r

It's Monday morning, and my week is going to be very busy, so I thought I'd write a small and simple blog post today. Kernel Data Structures contain lots of useful information when debugging, but the difficulty is understanding which data structures correspond to which situation and dump file, and how to open these data structures.

I highly recommend checking this website out for Kernel Data Structures and some debugging examples, it's such an excellent resource - Code Machine Articles - Kernel Data Structures

Some of the data structures which are available in WinDbg are documented in the Windows Driver Development Kit documentation.

Okay, the dt nt!_* command lists all the available data structure prototypes which WinDbg can open and format for you; I find this especially useful for learning which data structures are relevant and what additional information they may contain.

The above screenshot is only part of the output of the command, but due to size limitations of the Snipping Tool, it's the most I could manage to upload.

Let's look into the _POOL_BLOCK_HEAD data structure, which contains two substructures which are _POOL_HEADER and _LIST_ENTRY. We need to enter the following command: dt nt!_POOL_BLOCK_HEAD.

We could open the two structures individually with two separate commands using the same method, however, the better method is to use the -r switch, which will also open all substructures within the specified structure.

The first block is the _POOL_HEADER data structure, and the point where the output becomes indented further to the right is the beginning of the _LIST_ENTRY data structure.

Saturday, 19 October 2013

BSODs and Cracked Games - It's the Game

I think this may be the one example of a user-mode program actually being the sole cause of a BSOD. While it's technically impossible for a user-mode program to cause a BSOD directly, it can be possible for a user-mode program to cause a kernel-mode driver to act in a manner which would cause a BSOD. In this case, a user had downloaded several cracked games (which contained infected files); I even gave a warning that cracked games tend to be malware, and it looks like I was exactly right.

So, let's begin: one of the games had an application error which I opened with Visual Studio; it indicated that a thread attempted to access a virtual address which it didn't have access to. It's most likely kernel address space in my opinion; I may check this later.

As soon as I noticed this, I assumed that the game(s) were most likely malware, since I had already established with the user that they were cracked, and they fully acknowledged this too.

Looking at the MBAM Log (Malwarebytes), I found some interesting entries:

C:\Program Files (x86)\Square Enix\Sleeping Dogs\buddha.dll (Malware.Gen.SKR)

Another game indicated this:

D:\Users\USER\Downloads\SAINTS ROW 4 CRACK ONLY-RELOADED.rar (VirTool.Obfuscator)

D:\Users\USER\Downloads\SAINTS ROW 4 CRACK ONLY-RELOADED\Crack\steam_api.dll (VirTool.Obfuscator)

D:\Users\USER\Downloads\Saints Row IV Commander In Chief Edition-FULL UNLOCKED\Saints Row IV\steam_api.dll (VirTool.Obfuscator)

VirTool.Obfuscator is a Windows Virus which hides itself as a certain file, in this case a .DLL for Saints Row IV, to perform malicious actions.

Debugging Stop 0xCA - PNP_DETECTED_FATAL_ERROR

A Stop 0xCA usually indicates something is wrong with the PnP Manager or a driver/device which has a device node within the PnP Manager's Device Tree. To keep this blog post manageable (I've read through three different pages of Windows Driver documentation), I'll add the relevant links to those pages if I can.

The significant things to look at in this dump file are the call stack and the driver object associated with the PDO (Physical Device Object). We can see there is something wrong with the PCI bus or a device connected to that bus by just opening the dump file:

Let's get an overview of the nature of the dump file with:

The most important page you will want to start with is this one - IRP_MN_QUERY_DEVICE_RELATIONS

We can see a driver has deleted a PDO which has already been deleted; I found the wording of the error to be quite difficult, to be honest. I decided to look into the supported dispatch routines (IRPs) which the driver object could handle with the !drvobj extension:

We can see that the Major Function Code IRP_MJ_PNP is supported by the driver object; IRP_MN_QUERY_DEVICE_RELATIONS, the documentation page which I linked to, is one of its Minor Function Codes, therefore there isn't any problem with the driver being able to handle the IRP. I bring up this point because that IRP is used to query device relations and handle RemovalRelations requests.

A BusRelations request may also have been sent to query the child devices connected to the PCI Bus, which could have caused the system to bugcheck with the present Stop Code, as a result of this warning from Microsoft:

"Warning A device object cannot be passed to any routine that takes a PDO as an argument until the PnP manager creates a device node (devnode) for that object. (If the driver does pass a device object, the system will bug check with Bug Check 0xCA: PNP_DETECTED_FATAL_ERROR.) The PnP manager creates the devnode in response to the IRP_MN_QUERY_DEVICE_RELATIONS request. The driver can safely assume that the PDO's devnode has been created when it receives an IRP_MN_QUERY_RESOURCE_REQUIREMENTS request."

We can view the call stack from a Stop 0xE4, and get supporting information for our theory.

To summarise, there seems to be a problem with the PnP Manager or at least the drivers for the device, and how they are handling certain IRPs. The IRP seems most likely related to the IRP_MJ_PNP Major Function code; IRP_MN_QUERY_DEVICE_RELATIONS seems to be the specific Minor Function code where the problem might have happened. I'm also guessing that a routine was used which took a device object as a parameter, before a device node was created within the device tree by the PnP Manager for the associated device object.

Circular Kernel Context Logger Error: 0xC0000188 - MaxFileSize Registry Key

Okay, I understand this blog post isn't exactly related to BSODs, but since BSOD Analysts seem to have to debug other errors too, I thought I would write a blog post about this error, which isn't exactly an error as such.

As a result of some of the misinformation I've seen on other forums about this issue (for example, editing the registry DWORD value for the MaxFileSize key, since they believed it was a problem with Windows not being able to save files above 0), I thought it would be necessary to explain what this error is related to.

Firstly, open your registry, by clicking Start and then entering regedit. Accept the UAC prompt (if shown), and then navigate to the following registry key - HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\WMI\Autologger\Circular Kernel Context Logger\MaxFileSize

You should see a set of keys similar to the above picture.

A DWORD is a Windows API data type, and is a 32-bit unsigned integer for those who know or are interested in programming.

The MaxFileSize DWORD value is used to set the maximum file size of the log file which is created by Autologger; 0 means unlimited file size, but it also seems to depend upon the type of log mode you're using.

More Information - Configuring and Starting an AutoLogger Session

Friday, 18 October 2013

Microsoft Symbol Server Fix and My Absence (Sick)

Microsoft should hopefully have updated their symbol servers on Wednesday 16th October, although some BSOD debuggers are still reporting problems with the symbols. I know some dump files do have symbol errors occasionally, but if you're getting a large amount then I would contact Microsoft.

I've also been very ill for the last few days due to a very contagious stomach bug, so I haven't been present at all in the online world.

Monday, 14 October 2013

Memory Segmentation

In one of my previous blog posts, I wrote about Segmentation Faults, although I didn't give much explanation of what a segment was and how it is used within the translation process.

Memory segmentation is the method of dividing the computer's primary memory into segments; each segment will contain two parts, which I'll attempt to explain later. Memory segmentation is one of the protection mechanisms which can be used to restrict access to certain areas of memory.

It's important to understand, that the CPU can only understand machine instructions (binary), and that higher level languages are simply used for programmers to be able to write large and complex programs without much difficulty. The language above machine code is called Assembly, which enables programmers to create very fast and small programs to perform very specific tasks.

It's important to remember that programs can only use virtual memory (logical addresses), which have to be translated into physical addresses which the CPU will be able to use. You may also be wondering what a linear address is. A linear address is the logical address (the offset) added to the base of the segment. There are a few different segments: CS (Program Code); SS (Stack); DS, ES (Data).

With this blog post, you'll hopefully also understand what role the GDT (Global Descriptor Table) and LDT (Local Descriptor Table) have with segmentation and the operating system.

These tables are only present in Protected Mode, which enables programs to use things such as virtual memory and paging. When the system is first booted, the CPU is running in Real Mode, which simply uses physical memory addresses and allows direct access to hardware.

In Protected Mode, each segment has a segment descriptor, which is stored within the GDT or LDT. A segment descriptor contains information about the descriptor privilege level (CPU Rings or more commonly Kernel Mode and User Mode), and the base and limit addresses of the segment, which describe the beginning and the end of the segment.
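The translation from a logical address to a linear address can be sketched in a few lines of Python. This is a heavily simplified model under my own assumptions: the descriptor only has the three fields mentioned above, the privilege check is reduced to comparing the current privilege level against the descriptor privilege level, and the names and example values are made up.

```python
from collections import namedtuple

# Simplified segment descriptor: base address, limit (last valid offset)
# and descriptor privilege level (0 = Kernel Mode, 3 = User Mode).
SegmentDescriptor = namedtuple("SegmentDescriptor", "base limit dpl")


def to_linear(descriptor, offset, cpl):
    """Translate a segment offset into a linear address, with basic checks."""
    if cpl > descriptor.dpl:
        # Less privileged code may not use a more privileged segment.
        raise PermissionError("privilege level %d cannot use this segment" % cpl)
    if offset > descriptor.limit:
        raise ValueError("offset 0x%x is beyond the segment limit" % offset)
    return descriptor.base + offset


# Hypothetical descriptors for illustration only.
user_data = SegmentDescriptor(base=0x10000, limit=0xFFFF, dpl=3)
kernel_data = SegmentDescriptor(base=0x80000, limit=0xFFFF, dpl=0)

print(hex(to_linear(user_data, 0x20, cpl=3)))
```

Calling to_linear(kernel_data, 0x20, cpl=3) raises PermissionError in this model, which is the protection side of segmentation.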

You may also want to read about Memory Models; this is a good thread to start with - Understanding Flat Memory Model and Segmented Memory Model.

Symbol Errors - Kernel Dump Files

Okay, the 8th of October saw many security updates delivered through Windows Update onto our systems. As a result of this, the timestamp for the NT Kernel may have changed, which has left the symbols currently on the Microsoft Server outdated, leading to the symbol errors we have been seeing.

If I've read correctly, dump files from users who have not yet updated will show no symbol errors, and dump files from users who have updated will show symbol errors. This is not to say don't install the latest security updates; please do so if you haven't. For BSOD Analysts, I believe Microsoft should be updating their symbol server soon.

Source - http://www.sysnative.com/forums/bsod-processing-apps-download-|-information-|-discussions/7359-ntoskrnl-exe-nt-kernel-symbol-errors-windbg.html

Thursday, 10 October 2013

Debugging Stop 0x109 - Kernel Patch Protection

Stop 0x109s are not that frequent, and they do not provide much information to work with. Stop 0x109s are generally caused by a driver modifying kernel code (Kernel Patch Protection detects this) or by kernel code held within RAM becoming corrupt (failing RAM modules).

The purpose of this blog post is mostly to explain a little about Kernel Patch Protection and the regions of the kernel which have been modified and are displayed within the bugcheck parameters.

With a Stop 0x109 bugcheck, there are four different parameters; the only parameter which you'll want to view is the fourth parameter, which shows the type of corrupted region. The other parameters are undocumented and reserved for Microsoft use.

The three regions which I'll attempt to explain are: MSR, GDT and IDT.

MSRs (Model-Specific Registers): Among other things, these hold the addresses of the handlers for the special instructions used for system calls. More Information - MSRs

IDT (Interrupt Descriptor Table): A table which contains a list of ISRs and which interrupt number (IRQ) they belong to. More Information - IDT

GDT (Global Descriptor Table): A table which contains entries for memory segments. Segments are small regions of protected memory which are used to load CPU register instructions. More Information - GDT and More Information - Memory Translation and Segmentation
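To tie these regions back to the fourth parameter, here is a small Python lookup sketch. The numeric values are the region types as I recall them from the Stop 0x109 bug check documentation, so treat the table as an assumption and check it against the documentation before relying on it; the function name is my own.

```python
# Fourth parameter values as I recall them from the documentation
# (assumption, please verify before relying on them).
CORRUPTED_REGION_TYPES = {
    0x0: "generic data region",
    0x1: "function modification or .pdata",
    0x2: "processor Interrupt Descriptor Table (IDT)",
    0x3: "processor Global Descriptor Table (GDT)",
    0x4: "type 1 process list corruption",
    0x5: "type 2 process list corruption",
    0x6: "debug routine modification",
    0x7: "critical MSR modification",
}


def describe_corrupted_region(param4):
    """Map the fourth bugcheck parameter to a readable region type."""
    return CORRUPTED_REGION_TYPES.get(
        param4, "region type 0x%x (not in this table)" % param4)


print(describe_corrupted_region(0x3))
```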

So, now that you know about some of the more technical regions of the operating system, we should start speaking about the main topic of the blog post, which is Kernel Patch Protection and how it results in a Stop 0x109.

Kernel Patch Protection was introduced with the x64 editions of Windows XP and Windows Server 2003 Service Pack 1. Kernel Patch Protection stops modification of the kernel which would reduce stability and security. Any modification of the kernel would lead to a Stop 0x109 bugcheck. Here is a list of the modifications which would cause the system to bugcheck or reboot:

Modifying System Files (HAL.dll and ndis.sys): Changing code within ndis.sys could lead to malicious activity such as opening network ports without the user knowing.

Global Descriptor Table: Enabling code running within User Mode to execute operations within Kernel Mode. You may see some CPU or operating system documentation/books refer to Ring Levels (Ring 0 to Ring 3), which is older terminology; Windows only uses Ring 0 for Kernel Mode and Ring 3 for User Mode.

Interrupt Descriptor Table: Intercept and hook onto I/Os at interrupt level and page faults to hide contents of memory.

MSRs: On x64 CPUs, the LSTAR (Long Mode System Call Target Address Register) MSR contains the address of the routine which is intended to take control within Kernel Mode when a system call is made.

System Service Descriptor Table (SSDT): An array of pointers which point to system call handlers; rootkits could modify the I/O of calls from user mode.

Kernel Stacks: A malicious driver could allocate memory as a kernel stack for a certain thread, and then redirect calls and parameters.

Object Manager: Modify objects (processes, threads and file objects etc.) to alter system behavior through Direct Kernel Object Modification.

Monday, 7 October 2013

Debugging Stop 0x24 - SPTD.sys and Filter Drivers

SPTD.sys is a driver which is part of Daemon Tools and the Alcohol products; this driver is well-known to cause problems and should be removed. In this example, I wanted to demonstrate the usefulness of searching Windows Driver API documentation and look at the types of drivers which are known to cause Stop 0x24 bugchecks.

The second parameter usually refers to the exception record and the third parameter usually refers to the context record; you can use the .exr and .cxr debugger commands to gain information from these parameters.

We can see that the access violation occurred within the nt!FsRtlLookupPerFileObjectContext function, which is documented within the Windows Driver API.

FsRtlLookupPerFileObjectContext is used by filter drivers to retrieve a context which was previously associated with a file object; a file object can refer to an actual file or a physical hard disk. The IRQL is fine here, so no functions were being called at the incorrect IRQL.

From another bugcheck, it was revealed that SPTD.sys was causing problems, so it was removed. This has ended the Stop 0x24 bugchecks, although the overall issue still continues and seems to be related to drivers.

Programs which interact with the file system drivers and the storage stack (especially anti-virus programs) tend to be the cause for Stop 0x24 related bugchecks.

Full Thread is here - http://www.sevenforums.com/bsod-help-support/306878-3-bsod-row-new-ssd-ram-video-card.html

Wednesday, 2 October 2013

Windows API Function Prefixes

Here's the list of prefixes for the Windows API function calls you may notice within a call stack. Please also be aware that i means Internal and p means private.

Alpc = Advanced Local Inter-Process Communication

Cc = Common Cache

Cm = Configuration Manager

Dbgk = Debugging Framework for User-Mode

Em = Errata Manager

Etw = Event Tracing for Windows

Ex = Executive support routines

FsRtl = File System driver Run-Time Library

Hal = Hardware Abstraction Layer

Hvl = Hypervisor Library

Io = I/O Manager

Kd = Kernel Debugger

Ke = Kernel

Lsa = Local Security Authority

Mm = Memory Manager

Nt = NT System Services

Ob = Object Manager

Pf = Prefetcher

Po = Power Manager

Pp = PnP Manager

Ps = Process Support

Rtl = Run-time Library

Se = Security

Tm = Transaction Manager

Vf = Verifier (Driver Verifier)

Whea = Windows Hardware Error Architecture

Wmi = Windows Management Instrumentation

Wdi = Windows Diagnostic Infrastructure

Zw = Similar to NT, but sets access mode to Kernel, which in turn eliminates any parameter validation.
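As a quick way of applying the list to a call stack, here is a Python sketch which looks up the component for a symbol name. The table is copied from the list above; the matching rules are my own simplification (longest prefix first, then an i or p marker before the next capital letter), and the small Ki/Mi alias table is an assumption based on the common convention of the first letter followed by i for internal routines.

```python
PREFIXES = {
    "Alpc": "Advanced Local Inter-Process Communication",
    "Cc": "Common Cache", "Cm": "Configuration Manager",
    "Dbgk": "Debugging Framework for User-Mode",
    "Em": "Errata Manager", "Etw": "Event Tracing for Windows",
    "Ex": "Executive support routines",
    "FsRtl": "File System driver Run-Time Library",
    "Hal": "Hardware Abstraction Layer", "Hvl": "Hypervisor Library",
    "Io": "I/O Manager", "Kd": "Kernel Debugger", "Ke": "Kernel",
    "Lsa": "Local Security Authority", "Mm": "Memory Manager",
    "Nt": "NT System Services", "Ob": "Object Manager",
    "Pf": "Prefetcher", "Po": "Power Manager", "Pp": "PnP Manager",
    "Ps": "Process Support", "Rtl": "Run-time Library", "Se": "Security",
    "Tm": "Transaction Manager", "Vf": "Verifier (Driver Verifier)",
    "Whea": "Windows Hardware Error Architecture",
    "Wmi": "Windows Management Instrumentation",
    "Wdi": "Windows Diagnostic Infrastructure",
    "Zw": "Similar to NT, but sets access mode to Kernel",
}

# Assumption: "first letter + i" forms for internal routines, e.g. Ki for Ke.
INTERNAL_ALIASES = {"Ki": "Ke", "Mi": "Mm"}


def classify_symbol(symbol):
    """Return (prefix, component, visibility) for a symbol, or None."""
    name = symbol.split("!")[-1]  # drop the module name, e.g. "nt!"
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if rest[:1].isupper():
            return prefix, PREFIXES[prefix], "public"
        if len(rest) >= 2 and rest[0] in "ip" and rest[1].isupper():
            visibility = "internal" if rest[0] == "i" else "private"
            return prefix, PREFIXES[prefix], visibility
    alias = INTERNAL_ALIASES.get(name[:2])
    if alias and name[2:3].isupper():
        return alias, PREFIXES[alias], "internal"
    return None


for symbol in ("nt!KiServiceExit2", "nt!FsRtlLookupPerFileObjectContext"):
    print(symbol, "->", classify_symbol(symbol))
```

The longest prefix is tried first so that FsRtl is not mistaken for something shorter, and anything which doesn't fit the pattern simply returns None.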

Tuesday, 1 October 2013

Debugging Stop 0x124 - CPU Mnemonics

Okay, we all know that Stop 0x124s are very generic and irritating bugchecks, since they don't provide much information at all, to be honest.

However, this can be made easier by reading the error mnemonics within the CPU documentation, which will provide further insight into how the error was caused. I think I learned this from one of Vir's quotes in one of YoYo's posts, so thanks for providing information on where to find the documentation.

You will want to download the .PDF file of Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3: System Programming Guide, and then turn to page 2352. Here you will find the error mnemonics for the type of error; remember, we can find the type of error and then decode its meaning by using the !errrec extension with the second parameter of the bugcheck.

Due to the size limitations of the Snipping Tool, I've taken a screenshot of only the relevant part of the output, which the !errrec extension reads from the WHEA_ERROR_RECORD data structure.

Okay, so the error is a Bus Error, which is documented within the CPU Developer's Manual; each type of error has a table of mnemonics associated with it.

So, looking at the above error, we can see that the error originated from the Level 0 Cache, and the error was sourced from the processor itself (the CPU raised the flag for the error, hence the Machine Check Exception in the first parameter). The error was a generic no timeout error, occurring within processor number 0 and memory bank 0. The M indicates that something was accessing the data stored within the cache when the error happened.
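The way the mnemonic is built from the error code can be sketched in Python. The bit layout (000F 1PPT RRRR IILL) and the field encodings below are as I recall them from the bus and interconnect error tables in the Intel manual, so verify them against your copy; the function name and the GENERIC, RESERVED, OTHER and REQn placeholders are my own.

```python
# Field encodings as I recall them from the Intel manual (assumption).
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}
PARTICIPATION = {0: "SRC", 1: "RES", 2: "OBS", 3: "GENERIC"}
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
MEMORY_IO = {0: "M", 1: "RESERVED", 2: "IO", 3: "OTHER"}


def decode_bus_error(mca_code):
    """Build the mnemonic for a bus/interconnect error code, or None."""
    # Bits 15-13 must be 0 and bit 11 must be 1; bit 12 (F) is ignored.
    if (mca_code & 0xE800) != 0x0800:
        return None
    participation = (mca_code >> 9) & 0x3   # PP
    timed_out = (mca_code >> 8) & 0x1       # T
    request = (mca_code >> 4) & 0xF         # RRRR
    memory_io = (mca_code >> 2) & 0x3       # II
    level = mca_code & 0x3                  # LL
    return "BUS%s_%s_%s_%s_%s" % (
        LEVEL[level],
        PARTICIPATION[participation],
        REQUEST.get(request, "REQ%d" % request),
        MEMORY_IO[memory_io],
        "TIMEOUT" if timed_out else "NOTIMEOUT",
    )


# 0x0800 corresponds to the kind of error described above.
print(decode_bus_error(0x0800))
```

Reading the mnemonic left to right gives the cache level, who took part in the transaction, the request type, whether it was a memory or I/O access, and whether it timed out.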