Friday 7 February 2014

Thread Local Storage Slots

This is going to be a mixture of programming and Windows Internals with some use of WinDbg, so it looks like it is going to be Programming, Windows Internals and Debugging if we use the categories shown in my blog title. The idea of Thread Local Storage Slots is simple, so you should be able to understand and remember it pretty quickly.

All threads running within a particular process address space are able to access all the memory addresses within that address space. This can cause problems, since some variables and classes are best kept private to one specific thread. This is where thread local storage slots come in.

All static and global variables can be accessed by all the threads within the same process address space, and will share the same fixed memory address in the global scope of the address space of the process. On the other hand, locally defined variables are local to the stack of each thread, and will have differing memory addresses. Thread local storage combines the advantage of a global variable's fixed memory location with the benefit of having a variable which is private to a particular thread.

TLS is part of the C++11 Standard (the thread_local keyword), and can also be used with the Win32 API. I'll demonstrate the Microsoft C++ version of this concept.
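A minimal sketch of such a program follows, assuming a single worker thread. The names tls_i and Thread_Func follow the discussion in this post; run_worker and the TLS_VAR macro are hypothetical helpers added for illustration. The macro selects __declspec(thread) under the Microsoft compiler and falls back to the standard thread_local keyword elsewhere.

```cpp
#include <cassert>
#include <thread>

// Pick the thread local storage specifier: the Microsoft extension on MSVC,
// the standard C++11 keyword on other compilers.
#if defined(_MSC_VER)
#define TLS_VAR __declspec(thread)
#else
#define TLS_VAR thread_local
#endif

// Global in scope, but every thread gets its own copy initialised to 1.
TLS_VAR int tls_i = 1;

// Runs on the worker thread: increments the worker's copy only and
// reports the value the worker saw.
void Thread_Func(int* seen_by_worker) {
    assert(seen_by_worker != nullptr);
    ++tls_i;
    *seen_by_worker = tls_i;
}

// Hypothetical helper: starts one worker, waits for it to finish, and
// returns the value of tls_i as seen from inside that worker.
int run_worker() {
    int seen_by_worker = 0;
    std::thread worker(Thread_Func, &seen_by_worker);
    worker.join();
    return seen_by_worker;
}
```

Calling run_worker() from the main thread returns 2, while reading tls_i on the main thread afterwards still gives 1.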

__declspec is a Microsoft C++ extension for specifying storage-class attributes. As you can see from the above code example, the attribute used here is thread, which marks the variable as thread local. We then create a variable called tls_i of an integer type and initialize it to 1. I took the variable name from the MSDN documentation for ease on my part, and to make the variable more self-explanatory later in the code.

Since this is a global variable, and will be accessible to all parts of the program, you may expect that any write to that variable will be visible in other parts of the code. However, that is not the case in this example, as a result of thread local storage. The tls_i variable is incremented by 1 in the Thread_Func function, but will still be 1 within the main thread. This can be understood better when running the code.

Now that we understand how to write Thread Local Storage variables and data, let's examine how it is implemented within Windows and where we can find this information within WinDbg.

The Thread Local Storage slots can be found within the TEB data structure.


At offset 0x02c (in the x86 TEB), the ThreadLocalStoragePointer field holds the linear address of the thread local storage array; this is the pointer shown in the output above. Note that the TEB is reached through the FS segment register on x86, and the GS segment register on x64. The segment registers are primarily used for performance reasons.

The TlsSlots field at offset 0xe10 is the array of TLS slots itself. The guaranteed minimum number of slots is 64 (TLS_MINIMUM_AVAILABLE), hence the [64]. Each slot is indexed starting from 0 and is accessed with this index; in other words, it is implemented as an array of pointers, one per slot. The TlsLinks field at offset 0xf10 is a doubly linked list of the TLS memory blocks for the process.
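The slot scheme can be modelled in portable C++ to make the idea concrete. This is only an illustrative sketch, not how Windows implements it: every name below (slot_alloc, slot_set, slot_get, slots_are_private) is hypothetical, and the functions merely imitate the roles of the Win32 calls TlsAlloc, TlsSetValue and TlsGetValue. The index is shared by the whole process, while the array it indexes into is private to each thread.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>

// Illustrative model only: none of these names are Windows APIs.
constexpr std::size_t kSlotCount = 64;  // mirrors the [64] in TlsSlots

// Process-wide record of which slot indexes are in use.
std::bitset<kSlotCount> g_used;
std::mutex g_used_lock;

// Per-thread array of slots, standing in for the TlsSlots array in the TEB.
thread_local std::array<void*, kSlotCount> t_slots{};

// Reserve a free index for the whole process.
// Returns kSlotCount if every slot is already taken.
std::size_t slot_alloc() {
    std::lock_guard<std::mutex> guard(g_used_lock);
    for (std::size_t i = 0; i < kSlotCount; ++i) {
        if (!g_used[i]) {
            g_used[i] = true;
            return i;
        }
    }
    return kSlotCount;
}

// Store and fetch a value in the calling thread's copy of the slot.
void slot_set(std::size_t index, void* value) { t_slots.at(index) = value; }
void* slot_get(std::size_t index) { return t_slots.at(index); }

// Stores a value in the calling thread's slot, then checks that a fresh
// thread sees an empty slot at the same index.
bool slots_are_private() {
    static int caller_value = 42;
    const std::size_t index = slot_alloc();
    assert(index < kSlotCount);
    slot_set(index, &caller_value);

    bool worker_saw_null = false;
    std::thread worker([&] { worker_saw_null = (slot_get(index) == nullptr); });
    worker.join();

    return worker_saw_null && slot_get(index) == &caller_value;
}
```

The same index gives each thread a different storage location, which is why a value written by one thread never shows up in another.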

The !tls extension can be used to view the TLS slots for a particular thread.


The left column indicates the slot index, and the -1 parameter is used to dump all the TLS slots for the current thread. I believe the other column is the value stored in each slot. This extension is unfortunately sparsely documented.

A further advantage of using thread local storage is that it enables multiple threads to access the same variable without any locking overhead. Essentially, two threads refer to the same variable by name, but each one reads and writes its own private copy of it. Furthermore, many platforms and operating systems support the use of Thread Local Storage, and therefore portability shouldn't be a problem.
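As a rough illustration of the locking point, here is a sketch in which each thread counts into its own thread local variable and only touches shared state once, at the end. The names count_events, run_counters, t_count and g_total are hypothetical, and the standard thread_local keyword is used in place of __declspec(thread) to keep the sketch portable.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Shared total, touched once per thread.
std::atomic<long> g_total{0};

// Each thread gets its own private counter, so the hot loop needs no lock.
thread_local long t_count = 0;

// Counts n events privately, then publishes the result in one atomic add.
void count_events(long n) {
    for (long i = 0; i < n; ++i) {
        ++t_count;  // private to this thread: no synchronisation needed
    }
    g_total.fetch_add(t_count);
}

// Hypothetical driver: runs the given number of threads and returns the total.
long run_counters(int threads, long n) {
    g_total = 0;
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back(count_events, n);
    }
    for (auto& t : pool) {
        t.join();
    }
    return g_total.load();
}
```

With four threads each counting 1000 events, run_counters(4, 1000) returns 4000, and the only synchronised operation is the single fetch_add per thread.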

The __declspec implementation only uses one large chunk of memory for a Thread Local Storage slot, whose size is the sum of the sizes of all the thread local variables. This makes indexing the slots efficient. Internally, the initial values of these variables are stored within the .tls section of the PE image, while the TLS directory which describes them is typically found in the .rdata section. Please note the .tls section is not a requirement, and most programs do not have one.


The TLS Directory Size indicates the size of the directory, which in turn describes to the loader how the thread local variables are going to be set up for each thread.





We can view this further with CFF Explorer, and then check the TLS Directory.


The StartAddressOfRawData and the EndAddressOfRawData indicate the beginning and end of the TLS data in the TLS section. The AddressOfIndex is the address of the index used to index into the TLS array of slots. The AddressOfCallbacks is a pointer to an array of TLS Callback addresses.
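For reference, the 32-bit form of this directory can be mirrored as a plain struct. The sketch below follows the field order of IMAGE_TLS_DIRECTORY32 in winnt.h; the struct name TlsDirectory32 and the two helper functions are hypothetical, and treating the difference between the end and start addresses as the size of the initialised data is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Portable mirror of the 32-bit TLS directory layout.
// The name TlsDirectory32 is hypothetical; the fields follow winnt.h.
struct TlsDirectory32 {
    std::uint32_t StartAddressOfRawData;  // VA where the TLS template begins
    std::uint32_t EndAddressOfRawData;    // VA where the TLS template ends
    std::uint32_t AddressOfIndex;         // VA of the variable receiving the TLS index
    std::uint32_t AddressOfCallBacks;     // VA of a null-terminated callback array
    std::uint32_t SizeOfZeroFill;         // zero bytes appended after the template
    std::uint32_t Characteristics;        // alignment information
};

static_assert(sizeof(TlsDirectory32) == 24, "six 32-bit fields");

// Size of the initialised template data (assumes End - Start is the size).
std::uint32_t template_size(const TlsDirectory32& dir) {
    return dir.EndAddressOfRawData - dir.StartAddressOfRawData;
}

// Size of the block each thread would need: template plus zero fill.
std::uint32_t per_thread_block_size(const TlsDirectory32& dir) {
    return template_size(dir) + dir.SizeOfZeroFill;
}
```

Given made-up values such as a start of 0x403000, an end of 0x403208 and a zero fill of 16, template_size reports 0x208 bytes and per_thread_block_size reports 0x218 bytes.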


Going back to the ThreadLocalStoragePointer field mentioned earlier, this is the pointer which is used to differentiate between a variable being local to one thread and local to another thread. The array it points to is indexed with the _tls_index variable, and the offset of the threadedint variable is then added so that the result points to the current thread's local copy of the TLS type global variable created by the programmer. Note this is performed by the compiler and linker.
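The lookup can be written out as ordinary pointer arithmetic. The following is a model only: FakeTeb, resolve_tls and same_index_different_storage are hypothetical names, and the two "threads" are simulated with two structs rather than real threads. The index and offset of 0 and 4 are arbitrary illustrative values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Illustrative model only; FakeTeb stands in for the real TEB.
struct FakeTeb {
    void** ThreadLocalStoragePointer;  // array of per-module TLS block pointers
};

// Model of the generated access sequence:
//   block   = ThreadLocalStoragePointer[_tls_index]
//   address = block + offset of the variable inside that block
void* resolve_tls(const FakeTeb& teb, std::size_t tls_index, std::size_t offset) {
    char* block = static_cast<char*>(teb.ThreadLocalStoragePointer[tls_index]);
    return block + offset;
}

// Two simulated threads use the same index and offset, yet write to
// different memory because each has its own TLS block.
bool same_index_different_storage() {
    alignas(int) char block_a[16] = {};
    alignas(int) char block_b[16] = {};
    void* array_a[1] = {block_a};
    void* array_b[1] = {block_b};
    FakeTeb thread_a{array_a};
    FakeTeb thread_b{array_b};

    const std::size_t tls_index = 0;  // the role played by _tls_index
    const std::size_t offset = 4;     // offset of the variable in the block

    const int one = 1;
    const int two = 2;
    std::memcpy(resolve_tls(thread_a, tls_index, offset), &one, sizeof one);
    std::memcpy(resolve_tls(thread_b, tls_index, offset), &two, sizeof two);

    int read_a = 0;
    int read_b = 0;
    std::memcpy(&read_a, resolve_tls(thread_a, tls_index, offset), sizeof read_a);
    std::memcpy(&read_b, resolve_tls(thread_b, tls_index, offset), sizeof read_b);
    return read_a == 1 && read_b == 2;
}
```

The index and offset are fixed when the program is built and loaded; only the TEB differs from thread to thread, and that is what makes the variable private.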



References:

__declspec (C++)
Thread Local Storage
Thread Local Storage (Windows)
Win32 Thread Information Block (TIB)
Thread Specific Storage for C/C++ (Paper)
Thread Local Storage Part 1 (5 Parts) 
R4ndom's Tutorial #23: TLS Callbacks

