Friday, 13 December 2013

Virtual to Physical Address Translation (Part 1)

This is going to be one of the most fundamental blog posts for debugging. Understanding virtual to physical memory address translation will improve your understanding of many other aspects of Windows Memory Management. Along the way we need to understand PDEs, PTEs, the TLB, PFNs and look-aside lists.

x86 Kernel Address Space Layout

To begin, let's take a look at the layout of the Kernel Virtual Address Space, and the differences between x86 and x64. The Kernel Virtual Address Space is divided into sections for different aspects of the operating system.


Looking at the above diagram, we can see that the Kernel Address Space and the User Address Space have been divided equally. This is normal for an x86 system, where 4GB of addressable memory is divided into two 2GB halves. However, there is an exception to this rule: user processes can be large address space aware.

The address space would then look something like the above diagram. For a process to be large address space aware, the IMAGE_FILE_LARGE_ADDRESS_AWARE flag must be set in the image header of the process' executable file.
Using the WinDbg !dh extension, we can see that the image header of the NT kernel module indicates it can handle addresses beyond the 2GB boundary.
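As a rough user-mode illustration (a minimal sketch, not tied to the dump above), the same Characteristics flag that !dh displays can be checked programmatically; the program below inspects its own PE header for IMAGE_FILE_LARGE_ADDRESS_AWARE:

/* Minimal sketch: check whether the current executable was linked with
   /LARGEADDRESSAWARE by testing IMAGE_FILE_LARGE_ADDRESS_AWARE in the
   PE file header. Error handling is kept to a minimum. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* The HMODULE of the .exe is also the base address of its PE image. */
    BYTE *base = (BYTE *)GetModuleHandle(NULL);
    IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;
    IMAGE_NT_HEADERS *nt  = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);

    if (nt->FileHeader.Characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
        printf("Image is large address aware\n");
    else
        printf("Image is limited to 2GB of user address space\n");

    return 0;
}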

Now, let's look at what the Virtual Memory Address Space consists of, and how it differs on x64 systems. On x86, the Kernel Address Space runs from FFFFFFFF down to 80000000, and consists of:

FFFFFFFF
  • HAL
  • Crash Dump Drivers
  • Nonpaged Pool System space
  • System PTE
  • Paged Pool System space
  • System Cache
  • System Cache Structures
  • Process Working Set Lists
  • Page Tables
  • Session Space
  • System Mapped Views
  • Special Pool
  • Initial loading of boot drivers and HAL by the NTLDR 
  80000000

x64 Kernel Address Layout

Since the address space is much larger on x64 systems, it isn't divided into two equal halves as on x86; instead, memory regions are mapped according to their size.


On x64 systems, 32-bit programs are able to use a full 4GB of addressable space, thanks to the much larger address space available on x64 (16 EB in theory). However, limits have been placed on the address space actually implemented, due to hardware constraints and the lack of need for that much memory. Current implementations limit the address space to 48 bits, which means the upper 16 bits must be a sign extension of bit 47, producing what are called canonical addresses.


Please note that referencing a non-canonical address will result in an access violation exception. Furthermore, it's important to understand that the addressing is based upon the CPU architecture and the MMU.
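As a minimal sketch of what "canonical" means in practice, the helper below (is_canonical is just an illustrative name) sign-extends bit 47 and compares the result with the original address, assuming the current 48-bit implementation:

/* Minimal sketch: test whether a 64-bit address is canonical under the
   48-bit implementation, i.e. bits 63:48 are all copies of bit 47. */
#include <stdint.h>
#include <stdio.h>

static int is_canonical(uint64_t va)
{
    /* Sign-extend from bit 47 and compare with the original value. */
    int64_t extended = ((int64_t)va << 16) >> 16;
    return (uint64_t)extended == va;
}

int main(void)
{
    printf("%d\n", is_canonical(0x00007FFFFFFFFFFFULL)); /* 1: highest user address */
    printf("%d\n", is_canonical(0xFFFF800000000000ULL)); /* 1: lowest kernel address */
    printf("%d\n", is_canonical(0x0000800000000000ULL)); /* 0: non-canonical */
    return 0;
}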

The operating system licensing restrictions can also prove to be a limiting factor.

Dynamic Virtual Address Type Allocation and Management

The Kernel Address Space on x86 is managed dynamically rather than being carved into fixed regions. To achieve this, MiInitializeDynamicVa is called to create the dynamic address ranges for the supported virtual address types, which are listed in the enumeration called _MI_SYSTEM_VA_TYPE:



After everything has been loaded and initialized, virtual address space can be dynamically allocated with MiObtainSystemVa and subsequently released with MiReturnSystemVa. Dynamic virtual address management brings many advantages to x86 and x64 systems, such as smaller memory reservation and consumption: page tables no longer have to be allocated for unused address ranges, because those ranges can be allocated on demand when they are actually used. Furthermore, memory can be dynamically reserved, leading to better memory management overall.

Using !validatelist, !exchain and !mca

This blog post is going to show a few extensions available in WinDbg, which we can use with our debugging. I'm going to cover !validatelist, !exchain and !mca.

!validatelist

Firstly, let's begin with the !validatelist extension, which is used to test a doubly linked list for corruption by checking that each entry correctly points to both the next entry and the previous entry. These pointers are called Flink and Blink.

I've used an entry from a _LIST_ENTRY structure found in the _EPROCESS data structure for demonstration purposes.


The _LIST_ENTRY data structure can be seen as follows:


Using the !validatelist extension, the doubly linked list is walked along, or, to be more technically correct, traversed. Here we can see there were no problems with the linked list.
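To make the check concrete, here is a minimal user-mode sketch of the consistency test !validatelist performs, using the same Flink/Blink layout as the kernel's LIST_ENTRY; ValidateList is just an illustrative helper name, not the extension's actual implementation:

/* Minimal sketch: walk a doubly linked list of LIST_ENTRY structures and
   verify that, for every entry, Flink->Blink and Blink->Flink both point
   back at the entry itself. */
#include <windows.h>

/* Returns 0 if the list headed by Head is consistent, 1 at the first
   corrupt entry found. */
int ValidateList(LIST_ENTRY *Head)
{
    for (LIST_ENTRY *entry = Head->Flink; entry != Head; entry = entry->Flink) {
        if (entry->Flink->Blink != entry || entry->Blink->Flink != entry)
            return 1;   /* the pointers do not agree - the list is corrupt */
    }
    return 0;
}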

This is a useful extension for debugging Stop 0x19s and for checking whether linked list data structures have been corrupted.

!exchain

The !exchain extension is used to list all the exception handlers registered on the current thread's stack. The frame number is shown for each exception handler. Personally, I've found this extension useful when trying to work out what internal, undocumented functions are used for.

!mca

The !mca extension is used to display and gather information about the Machine Check Architecture error reporting mechanism.

We can see each MSR bank used for reporting errors found by the CPU, and which errors were recorded.

Additional Reading:

Machine Check Architecture
A short description of x86 MCA

Debugging Stop 0x9F - Multiple Completion Status Fields

When using !irp with a Stop 0x9F, you may notice that the completion status shows three different fields, sometimes with the addition of the pending flag being set. In this blog post, I'm going to explain what is actually happening and which completion status field is set.

As you can see, there are three different I/O completion status fields present, so the question is: which one is WinDbg referring to? These fields are set depending upon what the driver intended to do with the completed IRP. They correspond to the parameters of the IoSetCompletionRoutine routine, which is defined as follows:


These are all BOOLEAN values, and will therefore be either TRUE or FALSE. InvokeOnSuccess and InvokeOnError decide whether the completion routine stored in the IO_STACK_LOCATION data structure will be called when the IRP is completed with an NTSTATUS success value or an NTSTATUS error value, and InvokeOnCancel does the same for cancelled IRPs.
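To put the parameters in context, here is a minimal WDM-style sketch of a driver forwarding an IRP and registering a completion routine with all three invoke flags set; MyCompletionRoutine and ForwardIrp are hypothetical names:

/* Minimal sketch (kernel-mode, WDM): register a completion routine for the
   next-lower driver and ask for it to be invoked on success, error and
   cancellation. */
#include <wdm.h>

NTSTATUS MyCompletionRoutine(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
    UNREFERENCED_PARAMETER(DeviceObject);
    UNREFERENCED_PARAMETER(Context);

    /* If a lower driver returned STATUS_PENDING, propagate the pending state;
       IoMarkIrpPending sets SL_PENDING_RETURNED in the current stack location. */
    if (Irp->PendingReturned)
        IoMarkIrpPending(Irp);

    return STATUS_SUCCESS;
}

NTSTATUS ForwardIrp(PDEVICE_OBJECT LowerDevice, PIRP Irp)
{
    IoCopyCurrentIrpStackLocationToNext(Irp);

    IoSetCompletionRoutine(Irp,
                           MyCompletionRoutine,
                           NULL,    /* Context */
                           TRUE,    /* InvokeOnSuccess */
                           TRUE,    /* InvokeOnError   */
                           TRUE);   /* InvokeOnCancel  */

    return IoCallDriver(LowerDevice, Irp);
}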

The next important data structure is the IO_STACK_LOCATION data structure, which contains the completion routine and the control flags seen with the !irp extension. We need to use the 0xfffffa80116e07d0 address, since this is the address of the data structure for the current IRP stack location.

The Control field is used to check whether the SL_PENDING_RETURNED flag has been set, which indicates that the IRP is pending.

The Control field value can be split into two hexadecimal digits: e and 0. Let's start with the low digit first. There are three possible values: 0, 1 and 2.

0 = Nothing
1 = Pending
2 = Error

The high digit corresponds to our Success, Error and Cancel fields seen earlier. Converting e from hexadecimal gives 14 in decimal, and converting 14 (0xe) into binary gives 1110. I used a free online binary converter to do this.


The three set bits correspond to the InvokeOnSuccess, InvokeOnError and InvokeOnCancel BOOLEAN values passed to IoSetCompletionRoutine, while the pending state is shown by the low digit described above.
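As a small illustrative sketch (the SL_* values below are reproduced from wdm.h as I understand them, so treat them as assumptions), the Control byte can be decoded with a few bit tests:

/* Minimal sketch: decode the Cl/Control byte shown by !irp. */
#include <stdio.h>

#define SL_PENDING_RETURNED   0x01
#define SL_ERROR_RETURNED     0x02
#define SL_INVOKE_ON_CANCEL   0x20
#define SL_INVOKE_ON_SUCCESS  0x40
#define SL_INVOKE_ON_ERROR    0x80

static void DecodeControl(unsigned char control)
{
    printf("Control = 0x%02x\n", control);
    printf("  Pending : %s\n", (control & SL_PENDING_RETURNED)  ? "yes" : "no");
    printf("  Success : %s\n", (control & SL_INVOKE_ON_SUCCESS) ? "yes" : "no");
    printf("  Error   : %s\n", (control & SL_INVOKE_ON_ERROR)   ? "yes" : "no");
    printf("  Cancel  : %s\n", (control & SL_INVOKE_ON_CANCEL)  ? "yes" : "no");
}

int main(void)
{
    DecodeControl(0xe0);  /* the "e" and "0" discussed above: all three invoke
                             flags set, pending not returned */
    DecodeControl(0xe1);  /* the same, but with the pending flag set */
    return 0;
}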


References:

Breaking down the "Cl" in !irp






Thursday, 12 December 2013

Understanding MDLs (Memory Descriptor Lists) [Updated Version]

Previously, my blog post about MDLs was a simple link back to Vir Gnarus' tutorial on MDLs. This time I'm going to write a proper blog post/article about MDLs. I'll try to expand on the previous tutorial on Sysnative, but will also carry over some of the original aspects into this blog post.

Please note, I'll be using the dump file provided on the tutorial on Sysnative.

We should all understand the differences between virtual memory and physical memory. A virtual memory buffer is contiguous, whereas the physical memory backing it is usually discontiguous and tends to be scattered. This is where the MDL comes in. MDLs are primarily used for I/O operations, whereby a virtual memory data buffer is locked against a physical address range, and the MDL describes the mapping and association between the buffer and that physical address range.

The specific type of I/O operation this is used for is Direct I/O. Direct I/O delivers IRPs without needing the system buffer. With Direct I/O, the program's user-mode virtual address buffer is locked into physical memory, effectively making the virtual memory buffer non-paged. Once the IRP has finished being processed through the necessary device objects and associated driver objects in the IRP's stack, the I/O Manager unlocks the virtual memory buffer and then deallocates and frees the MDL. You may notice issues and bugchecks caused by drivers locking or unlocking MDLs too many times. Just to add, the virtual memory buffer in question can be either user-mode or kernel-mode virtual memory.

MDLs may also be used in DMA (Direct Memory Access) to describe the referenced physical memory addresses. DMA is used to access RAM without involving the CPU. There are a number of different methods and issues surrounding this, which will most likely turn into a blog post of their own.
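To tie this to the APIs involved, here is a minimal kernel-mode sketch of the usual MDL lifecycle for a user buffer, using the documented IoAllocateMdl / MmProbeAndLockPages / MmGetSystemAddressForMdlSafe routines; AccessUserBuffer is a hypothetical function and error handling is trimmed:

/* Minimal sketch (kernel-mode): allocate the MDL, lock the pages, obtain a
   system mapping, use the buffer, then tear everything down. */
#include <wdm.h>

NTSTATUS AccessUserBuffer(PVOID UserBuffer, ULONG Length)
{
    PMDL mdl = IoAllocateMdl(UserBuffer, Length, FALSE, FALSE, NULL);
    if (mdl == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    __try {
        /* Pin the user pages into physical memory for the duration. */
        MmProbeAndLockPages(mdl, UserMode, IoWriteAccess);
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        IoFreeMdl(mdl);
        return GetExceptionCode();
    }

    /* Map the locked pages into system address space. */
    PVOID systemVa = MmGetSystemAddressForMdlSafe(mdl, NormalPagePriority);
    if (systemVa != NULL) {
        /* ... read or write the buffer through systemVa here ... */
    }

    MmUnlockPages(mdl);   /* unpin the physical pages */
    IoFreeMdl(mdl);       /* free the MDL itself */
    return STATUS_SUCCESS;
}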



Looking at the structure of the IRP, in the Fixed Part section, we can see the MDL Address being used. We can also view the structure of an IRP within WinDbg.


I won't go into the details of the IRP data structure, since it's not relevant to the topic of this post. It should be fully documented within the WDK. I'll explain the MDL field since it's relevant to this topic.

When the IRP has been sent to the driver (assuming the access checks are okay), the buffer size is checked; if the buffer is too large for the available memory, the IRP is completed by the I/O Manager with an error status and the operation goes no further. On the other hand, if the access checks pass and the data buffer is of the correct size, the data buffer will be locked for the entire lifetime of the IRP.

Now, let's look at the general structure of an MDL. We can view it in WinDbg.


The most interesting fields are the MdlFlags field and the Next field. The Next field is used to chain a number of MDLs together for a data buffer which isn't virtually contiguous.

The most vital part of this data structure is the MdlFlags field, whose flag values are defined within the wdm.h header file.


These flags are useful for debugging, since they define how an MDL is being used and which APIs should be used with it.

The StartVa field indicates the page-aligned starting address of the virtual address range (the MDL_MAPPED_TO_SYSTEM_VA and/or MDL_SOURCE_IS_NONPAGED_POOL flags must be set; VA simply means Virtual Address).

The ByteCount field indicates the entire length of the virtual address range mapped by the MDL.

ByteOffset is the offset of the buffer's starting address within the first page of the MDL.

The Process field contains a pointer to the _EPROCESS structure of the process whose virtual address space is described by the MDL.

The Size field contains the size of the MDL data structure plus the array of PFNs describing the physical pages, which immediately follows the fixed part of the structure.
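The accessor macros in the WDK map directly onto these fields; the hypothetical DumpMdl routine below is a minimal sketch showing the relationship, including the PFN array that follows the fixed part of the structure:

/* Minimal sketch (kernel-mode): relate the MDL fields described above to the
   documented accessor macros, and walk the PFN array. */
#include <wdm.h>

VOID DumpMdl(PMDL Mdl)
{
    PVOID va     = MmGetMdlVirtualAddress(Mdl);   /* StartVa + ByteOffset */
    ULONG length = MmGetMdlByteCount(Mdl);        /* ByteCount            */
    ULONG offset = MmGetMdlByteOffset(Mdl);       /* ByteOffset           */

    /* Number of physical pages spanned by the buffer, and therefore the
       number of entries in the PFN array appended to the MDL. */
    ULONG pages = ADDRESS_AND_SIZE_TO_SPAN_PAGES(va, length);

    PPFN_NUMBER pfnArray = MmGetMdlPfnArray(Mdl);

    DbgPrint("MDL %p: VA %p, length 0x%x, offset 0x%x, %u pages\n",
             Mdl, va, length, offset, pages);

    for (ULONG i = 0; i < pages; i++)
        DbgPrint("  page %u -> PFN 0x%Ix\n", i, (ULONG_PTR)pfnArray[i]);
}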

Additional Reading:

Master of the Obvious -- MDLs are Lists that Describe Memory 

More on MDLs - MDLs Are Opaque 

Understanding Data Alignment and Alignment Exceptions

The following topic is much more related to CPUs than to the operating system. CPUs are able to work much faster when data is aligned on 2-byte, 4-byte, 8-byte, 16-byte or 32-byte boundaries. These values are referred to as the memory access granularity. Alignment is about whether a data item's address is evenly divisible by the size of the access being made.


If an x86 processor accesses an unaligned piece of data, it will simply (although at a cost to performance) fetch the data as multiple aligned chunks and merge them. This causes additional memory read accesses.

On the other hand, with x64 processors, one of two things can happen, depending upon a setting in the EFLAGS register. If the AC (Alignment Check) flag is set to 0, the behaviour is the same as on x86; if it is set to 1, an alignment check exception (#AC) is raised to report that the CPU is accessing misaligned data. This exception uses interrupt vector 17 (decimal, i.e. 0x11), not to be confused with the BIOS INT 17h service used for the parallel printer port.





Looking at the above image, we can see that Address 0 is correctly aligned to a 4-byte boundary; however, Address 1 is clearly not aligned to this boundary, leading to an unaligned memory access and thus an extra read.

We have looked at data alignment, but stacks and data structures are also aligned, as are instructions. Let's start with stack alignment: on x86 processors the stack frame is normally aligned to a 4-byte boundary, whereas on an x64 processor the stack frame is aligned to a 16-byte boundary. In some situations, x86 code may need to use 8-byte or 16-byte alignment boundaries.



For data structures, the structure as a whole is aligned according to the largest member data type within it. For example, if a data structure contains two int values and one double value, the structure will be aligned according to the size of the double. This is inter-structure alignment. For intra-structure alignment, the members are aligned to their own natural boundaries (4 bytes for each int and 8 bytes for the double, using the example above), with padding bytes inserted where necessary to keep the members aligned. A short sketch of this follows below.
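The sketch below shows this with the int/int/double example; the exact sizes and offsets depend on the compiler and ABI, so the values in the comments are only what you would typically see on Windows:

/* Minimal sketch: structure alignment and padding. */
#include <stdio.h>
#include <stddef.h>

struct Example {
    int    a;   /* offset 0 */
    int    b;   /* offset 4 */
    double c;   /* offset 8 - already 8-byte aligned, no padding needed */
};

struct Example2 {
    int    a;   /* offset 0                                         */
    double c;   /* offset 8 - 4 bytes of padding inserted after 'a' */
    int    b;   /* offset 16 - tail padding keeps sizeof a multiple of 8 */
};

int main(void)
{
    printf("sizeof(struct Example)  = %zu\n", sizeof(struct Example));   /* typically 16 */
    printf("sizeof(struct Example2) = %zu\n", sizeof(struct Example2));  /* typically 24 */
    printf("offsetof(Example2, c)   = %zu\n", offsetof(struct Example2, c));
    printf("offsetof(Example2, b)   = %zu\n", offsetof(struct Example2, b));
    return 0;
}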

If you do encounter misaligned addresses in your program, Windows will raise the EXCEPTION_DATATYPE_MISALIGNMENT exception. For BSOD debugging, you will most likely encounter an access violation exception code, or the failure bucket ID will say something along the lines of x64 unaligned IP.

Additional Reading:




Data alignment: Straighten up and fly right

Tuesday, 10 December 2013

Investigating Pool with Pool Data Structures

Pool is such a fundamental resource, and something that can cause quite a few bugchecks if not handled properly. Here I'm going to explain some of the data structures used when looking at pool.

Firstly, we need to understand the different types of pool and where they are used. The pool types are defined in an enumeration called _POOL_TYPE.


The only values you're going to need to be concerned with are Paged Pool and Non-Paged Pool. As you most likely know, Paged Pool can be paged out to the hard drive and is therefore susceptible to page faults, whereas Non-Paged Pool must stay resident in physical memory. Paged Pool can only be accessed at IRQL below DISPATCH_LEVEL (IRQL < 2), whereas Non-Paged Pool is accessible at all IRQLs.

The other members of the enumeration are for internal use by Microsoft developers.

The _POOL_TYPE enumeration also appears as a member of a larger data structure called _POOL_DESCRIPTOR.



ListHeads is a 512-element array of doubly linked lists of free allocation blocks, grouped by size, which can be handed out to drivers. Each block is associated with a _POOL_HEADER data structure.

TotalPages is the number of pages allocated for that pool type.

RunningAllocs and RunningDeAllocs are the running counts of allocations and deallocations performed from this descriptor.

PendingFrees is a list of allocations which are waiting to be freed back onto the free lists.

Each descriptor is stored in an array called nt!PoolVector on single-processor systems; on multiprocessor systems the descriptors are stored in the nt!ExpNonPagedPoolDescriptor and nt!ExpPagedPoolDescriptor arrays.

Looking at the _POOL_HEADER data structure, I'll explain some of the fields I haven't covered before, along with some seen in previous blog posts:

  
Previous Size: The size of the previous allocation in the doubly linked list.

Pool Index: The index into the pool descriptor array.

Block Size: The size of the current allocation.

Pool Type: 0 means that the block is free; for a block in use, the value encodes the pool type of the allocation.

Pool Tag: The owner of the pool allocation.
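For reference, an approximate C sketch of the x86 header follows; the structure is undocumented and the exact layout varies between Windows versions, so treat the field widths below as illustrative only:

/* Approximate sketch of the x86 _POOL_HEADER layout discussed above
   (as seen on Vista/Windows 7 x86; illustrative, not authoritative). */
typedef struct _POOL_HEADER_SKETCH {
    union {
        struct {
            unsigned short PreviousSize : 9;  /* size of the previous block, in 8-byte units */
            unsigned short PoolIndex    : 7;  /* index into the pool descriptor array */
            unsigned short BlockSize    : 9;  /* size of this block, in 8-byte units */
            unsigned short PoolType     : 7;  /* 0 = free block */
        };
        unsigned long Ulong1;
    };
    unsigned long PoolTag;                    /* four-character owner tag */
} POOL_HEADER_SKETCH;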

The layout shown above is for an x86 system; on x64 systems there is an extra field, ProcessBilled. This is a pointer to the process object (_EPROCESS) of the process which has been charged for the pool allocation, and it is used for quota management. ExAllocatePoolWithQuotaTag is the routine that produces such quota-charged allocations, as sketched below.
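Here is a minimal kernel-mode sketch contrasting a plain tagged allocation with a quota-charged one; PoolExample and the 'golB' tag are hypothetical:

/* Minimal sketch (kernel-mode): tagged vs quota-charged pool allocations. */
#include <wdm.h>

VOID PoolExample(VOID)
{
    /* Plain tagged allocation from nonpaged pool ('golB' displays as "Blog"). */
    PVOID plain = ExAllocatePoolWithTag(NonPagedPool, 128, 'golB');

    /* Quota-charged allocation - raises an exception if the process' quota
       is exceeded, so callers typically wrap it in __try/__except. */
    PVOID charged = NULL;
    __try {
        charged = ExAllocatePoolWithQuotaTag(PagedPool, 256, 'golB');
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        charged = NULL;
    }

    if (plain)
        ExFreePoolWithTag(plain, 'golB');
    if (charged)
        ExFreePoolWithTag(charged, 'golB');
}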


Quota management limits how much Paged Pool and Non-Paged Pool a process is able to use, depending upon the pool type. We can see these values with the !process extension.


The limits on the amount of Paged Pool and Non-Paged Pool a process can use can be found in the registry at HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management.



Looking at Virtual Memory - !vm

This is going to be a brief overview of the !vm extension, which provides virtual memory statistics. I've added the 0x20 flag, since it also provides information about the kernel's own memory usage.


ResAvail (Resident Available Pages) is the number of pages of physical memory that would be available if every process were trimmed down to its working set minimum. Available Pages is the number of physical pages immediately available for use (those on the free, zeroed and standby lists).

The working set of a process can be seen with the !process extension:



The values shown here are the default working set minimum and maximum for a process; however, these can be exceeded if there is enough memory available, unless hard working set limits have been set.
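From user mode, the same minimum and maximum can be queried with the documented GetProcessWorkingSetSize API; a minimal sketch:

/* Minimal sketch (user mode): query the current process' working set
   minimum and maximum - the same values !process reports from the kernel side. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T minWs = 0, maxWs = 0;

    if (GetProcessWorkingSetSize(GetCurrentProcess(), &minWs, &maxWs)) {
        printf("Working set minimum: %zu KB\n", (size_t)(minWs / 1024));
        printf("Working set maximum: %zu KB\n", (size_t)(maxWs / 1024));
    }
    return 0;
}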


Committed Pages is the number of pages of virtual address space which contain code and data and are guaranteed to be backed by some form of storage, be it RAM or the paging file. The Commit Limit is the maximum number of pages that can be committed, determined by the amount of RAM plus the current size of the paging files.

The two areas highlighted in the !vm output are good places to look for pool leaks.

Looking at the kernel's usage, we can see the current consumption, the peak consumption and any recent allocation failures.