Saturday 7 June 2014

Data Structures - Red and Black Trees

Graph Theory Related Concepts:

Before explaining the concept of Red and Black Trees and some of the algorithm analysis related to Red and Black Trees. It's best to have a understanding of some basic Graph Theory, such as what is a tree and the colouring nodes (or vertices). Graph Theory is my favorite area of Mathematics, and has many applications in Computer Science and other subjects. Graph Theory is a very visual form of Mathematics, much like Geometry, and for those who enjoy those types of Mathematics may also enjoy Graph Theory.

A graph G is a ordered pair (V,E), with V being the graph's set of vertices and E being the the graph's set of edges which connect each vertex. Most graphs will have this basic structure, and it's the connections within the vertices which give graph their interesting properties. Hypergraphs are very interesting type of graph in my opinion, and sit between the boundary of Set Theory and Graph Theory. I may talk about more in the future.


Hypergraph
I'm going to skip most of the introductory topics of Graph Theory, which doesn't really apply to within this context.  A tree is a connected undirected graph (no arrows with the edges) with no simple cycles.

Tree T(9,8)
 Vertices within Trees are called Nodes, and each node has been labelled with a corresponding integer. We could label these nodes with any labels we so wish, such as locations within a Computer Network or Cities on a map (some applications of Graph Theory). A Red and Black Tree has the same structure, however, the nodes are coloured with Red or Black, hence the name for a Red and Black Trees. Before we continue, it's best to understand the terms Simple, Cycle and Connected. Graph Theory is very intuitive in terms of definitions and so it should be very easy to understand.

 A graph is simple if it contains no multiple edges between any two adjacent vertices, or loops (edges which loop onto the same vertex). A graph contains a cycle if there exists a path between at least two vertices within the graph, and the starting and ending vertex are the same. A cycle has no repeated edges or loops. A graph is connected if there exists a path between any two vertices. There is much more to the definitions which I have just given, however, the definitions are suffice for the topic which I'm going to explore.

There is one last Graph Theoretic definition which we must understand before we should learn about Red and Black Trees, and that is the concept of Chromatic Graph Theory, which looks at the idea of giving vertices, edges and faces within a graph distinct colours or groups of colours with certain rules to other vertices and edges within the same graph. A very famous problem was the Four-Colour Theorm.



With Red-Black Trees, the graphs use Vertex Colouring, which is as the name suggests, the colouring of vertices with a certain colour. If no two vertices connected by the same edge i.e any two vertices which are adjacent; do not have the same vertex colour, then the colouring within the graph is proper vertex colouring, which is simply called colouring. However, please bear in mind, that in this context, colouring simply refers to giving each node within the tree a colour: Red or Black. The Chromatic Polynomial can be used to calculate the possible number of proper vertex colourings for a graph.

Now, we understand the basic concepts for the theory of Red-Black Trees, which can examine the properties and algorithm mechanism of this data structure.

Red-Black Trees

There are several key properties to remember when examining Red-Black Tree data structures, since these properties will constitute which nodes are rotated when new data is inserted into the tree.

Properties: 
  1. The root is black. This is the uppermost node. The rest of the nodes are considered Children of the root.
  2. Every red node must have two black child nodes.
  3. A node is either red or black.
  4. All leaves (NIL) are black. A leaf is a vertex or node which has the degree of 1 (one edge connected to the vertex).
  5. Every simple path from any given node through it's subsequent leaves contains the same number of black nodes.
 The tree will be rotated and nodes will be changed to ensure that these properties are maintained. The number of black nodes on the path from the root to a leaf is known as the black height of the tree.

Insertion:

For the insertion of new data into the tree, the complexity of this operation is O(log n) for both Average and Worst Case situations. To insert new data into the tree, we must check for a insertion point, which will one of the NIL nodes. Once the insertion point has been found (sentential node), then the new node will be coloured red and the properties of the tree will be checked. The parent node of the newly inserted node will be checked, and recoloured if needed. Rotations on the tree may be performed to validate the other properties of the tree.

Deletion:

The complexity of deletion is again O(log n) for both cases. The deletion of nodes follows a similar process as before.

Space:

The data structure uses very little memory, and has a space complexity of O(n) in both Average and Worst Case.

References:

Sorting and Searching Algorithms - The Cookbook
Red-Black Tree Wikipedia
Graph Theory (Graduate Texts in Mathematics) - Reinhard Diestel 



No comments:

Post a Comment