CodingBison - Binary Search Trees

A binary search tree is a popular data structure that is well-optimized for various tasks like lookup, insertion, deletion, and traversal. However, unlike hash-tables, where the lookup can be typically of the order of O(1), a binary search tree provides a best lookup of O(logN). O(logN) is still a great lookup time, if we consider the overall purpose for which a binary search tree is designed. A binary search tree guarantees that both the insertion/deletion time and the lookup time would be of the order of O(logN); of course, the assumption is that the binary search tree is fairly balanced.

A binary search tree stores each input as a node. Each node in the tree (excluding the leaf nodes) can have a maximum of two child nodes that are represented by, let us say, "left" and "right" pointers. A binary search tree also needs to have a pointer to its parent -- this is needed for operations where we need to navigate upwards in the tree. In the following figure, for node 20, the "parent" pointer points to node 40, the "left" pointer points to 15, and the "right" pointer points to node 23.

                      40                                  40
                     /  \                                 | 
                    /    \                                | (parent pointer)
                   20    44                              20
                  /  \     \             (left pointer) /  \ (right pointer)
                 /    \     \                          /    \
                15   23     48                        15    23
                     /      / \    
                    /      /   \ 
                   21    46    100

                             Figure: Binary Search Tree

Binary search tree is also sorted -- the right node on the tree is always greater than its parent node and the left node. Hence, a left node is less than its parent node and parent's right node. For example, in the above Figure, nodes sitting on the left sub-tree of the root node (with value 40) are all less than 40. Likewise, nodes sitting on the right sub-tree of the root node are all greater than 40.

Basic Operations

Search

Compared to a linked list, searching becomes faster with a binary search tree since it is by definition sorted. This is because at any point, we have the knowledge that the nodes present on the right side have a higher key than those of nodes present on the left. And so, at each node, we avoid the other potentially "half" of the elements that sit on the other sub-tree. In other words, if we find that the value that we are looking for is greater than the root, then we focus our search on the right sub-tree and avoid all the elements sitting in the left sub-tree. Elimination of half of the nodes at each step reduces the complexity to O(logN) as compared to O(N) complexity for a linear linked list. Put differently, searching for a binary search tree, instead of traversing the entire tree, we only traverse the depth of the tree and that is of the order of logN.

We demonstrate this using two examples in the figure below. In the first example, we search for node 47; since 47 is not present in the tree, we end up traversing the entire depth before we conclude that 47 is not present. The search path taken is: [40,44,48,46]; in the figure, we show the search path taken using double lines. In the second example, we search for a node with value 23. Since 23 is present in the tree and its depth is only two, we are able to locate the node in 3 searches. The search path taken is: [40,20,23].

                          40                                 40               
                         /  \\                             //  \               
                        /    \\                           //    \             
                       20     44                         20      44          
                      / \      \\                       / \\      \         
                     /   \      \\                     /   \\      \        
                    15   23      48                   15    23      48     
                         /      // \                       / \      / \   
                        /      //   \                     /   \    /   \
                       21     46   100                   21   28  46   100

                 Searching 47: Not Found           Searching 23: Found 

                             Figure: Searching a Binary Search Tree

Insertion

Inserting into this tree is a matter of finding the location where one can insert the current value -- the goal being that the new location should still respect the binary search tree logic -- nodes on the right should be greater than the nodes on the left. Please note that once we insert a new element in the tree, we make its parent pointer point to the parent node for the current location. In the following figure, we show path for insertion of a node with value 28. We show the path for locating the location using double lines. We start at node 40, since 28 is less than 40, we go to the left value of 40, which is 20. Since 28 is greater than 20, we go to the right value of 20, which is 23. Since 28 is still greater than 23, we go the right of 23. But, right value of 23 is NULL, so, we insert 28 as the right child of node with value 23 and make 23 the parent of 28.

                          40                            40               
                         /  \                         //   \               
                        /    \      Inserting 28     //     \             
                       20     44     --------->     20      44          
                      / \      \                   / \\       \         
                     /   \      \                 /   \\       \        
                    15   23      48              15    23       48     
                         /      / \                   / \\     / \   
                        /      /   \                 /   \\   /   \
                       21     46   100              21    28 46   100

                             Figure: Binary Search Tree Insertion

Deletion

Deleting a node from the binary search tree is slightly involved. There are three cases that one needs to consider for deletion. In the first case, we are deleting a node that does not have any child node -- this is the simplest case and we simply need to make current pointer from the parent node to the node being deleted (left or right) NULL. In the second case, we are deleting a node that has one child node -- in this case, we essentially have to splice the node out and make the parent of its child node, its own parent node. In the third case, we have a node that has both children. For such cases, we need to simply replace this node with its successor node.

A successor is the node that has the smallest value which is larger than the value of the given node. If a node has right sub-tree, then we find the smallest element in the right-subtree and that node becomes the successor of the node. If the node has no right sub-tree, then we travel upstream towards the root and find the smallest node that has a value higher than that of the current node. For example, for 101, finding a successor means finding the smallest element in its right sub-tree; this successor node would be 120. For node 97, there is no right sub-tree, so finding its successor means traveling upwards towards the root. 98 is the first node which has a value greater than that of 97; hence, 98 is the successor of node 97. Lastly, since 200 is the largest value, there would not be any successor to the node with value 200!

As we will see in the next function, we need to locate a successor node when deleting an element from the tree. Since we are going to use successor to handle delete when a node has both right an left sub-trees, our implementation retrieves successor node from the right subtree, which would be the minimum node on the right sub-tree. Of course, as an alternative, one can also select the maximum node on the left sub-tree too.

                                              101 
                                             /  \
                                            /    \ 
                                           98    120
                                          /        \
                                         /          \
                                        88          140
                                       / \            \
                                      /   \            \ 
                                     77   97          200 

                             Figure: Finding Successor in a Binary Search Tree

Now that we understand what successor nodes are, let us understand the three deletion cases using the following figure. In the first case, we delete a leaf node; a leaf node is a node that has no left or right child node. This case is simple as we simply remove the node from the tree and make its parent left or right fields (whichever holds) to NULL. In the figure below, deletion of node with value 200 falls in this case. In the second case, we delete a node that has only one child. In this case, we delete the node and make the parent's left or right fields (whichever holds) point to the child node of the current node. In the figure below, deletion of node with value 120 falls in this case. In the third case, we delete a leaf node that has both left and right children. In this case, we find the successor for the node, which would typically lie in the right sub-tree, delete the successor and place its value in the current node. Thus, one of the utilities of finding a successor node is when we try to delete a node that has both left and right nodes. In the figure below, deletion of node with value 88 falls in this case.

                  101                                101 
                 /  \                                /  \
                /    \        Deleting 200          /    \ 
               98    120       ---------->        98     120
              /        \                          /        \
             /          \                        /          \
            88          140                     88          140
           / \            \                    / \            
          /   \            \                  /   \           
         77   97          200                77   97         

                 Case 1: Deleting a leaf node (value 200) 

                  101                                101 
                 /  \                                /  \
                /    \        Deleting 120          /    \ 
               98    120       ---------->         98    140
              /        \                          /        \
             /          \                        /          \
            88          140                     88          200
           / \            \                    / \            
          /   \            \                  /   \           
         77   97          200                77   97         

                 Case 2: Deleting a node with only one child (value 120) 

                  101                                101 
                 /  \                                /  \
                /    \        Deleting 88           /    \ 
               98    120       ---------->         98    120
              /        \                          /        \
             /          \                        /          \
            88          140                     97          140
           / \            \                    /              \              
          /   \            \                  /                \           
         77   97          200                77                200         

                 Case 3: Deleting a node with both children (value 88) 

                             Figure: Deletion in a Binary Search Tree

Traversal

For a binary tree, traversal can be done in three ways: (a) in-order, (b) pre-order, and (c) post-order. In the "in-order" style, we start with the left node first, parent node second, and the right node last. In the "pre-order" style, we start with the parent node first, left node second, and the right node last. In the "post-order" style, we begin with the left node first, the right node second, and the parent node last. Thus, all of these styles are with-respect-to the parent. In the "in-order", the parent sits in the middle, in "pre-order", the parent sits on the left, and in "post-order", the parent sits on the right. Since we go through each node of the tree, the traversal cost O(N) in terms of time-complexity.

Implementation

Let us look at the simple version of Binary Search Tree. In the implementation of a binary search tree, the first data-structure is the "bst_node_t" and this represents a node in the binary search tree. This data structure contains pointers for left node, right node, and value. The key member of this structure is the "value" field that holds the data for a tree node; In this implementation, the "value" field holds an integer.

Before we begin, these are the functions defined in the implementation of binary search tree. The function binary_search_tree_insert() adds a node to the tree. The function binary_search_tree_search() searches for a node in the tree. Next, binary_search_tree_delete() finds the node that has the passed value and deletes it from the tree. Lastly, binary_search_tree_print() traverses (in-order traversal) and prints values of each node.

We bring all of the above APIs together by triggering them using a main() function. The main function calls binary search tree APIs that are defined already in the main_bst.c and uses them for insertion, deletion, search and displaying the elements of the binary search tree. It uses an array to get the list of elements that it needs, to add to the binary search tree.

let us call the below file as main_bst.c

 #include <stdio.h>
 #include <stdlib.h>
 #include <stdbool.h>

 typedef struct bst_node_ {
     int data;
     struct bst_node_ *left;
     struct bst_node_ *right;
 } bst_node_t;

 /* Root of the binary search tree */
 bst_node_t *bst_root = NULL;
 /*
  * Search for the element in a binary search tree. 
  * Return TRUE if found else return FALSE
  */
 bool binary_search_tree_search (bst_node_t *parent, int data) {
     bst_node_t *temp = parent;
     if (temp == NULL) {
         return false; 
     } else {
         if (data  == temp->data) {
             return true;
         } else if ( data < temp->data) {
             return binary_search_tree_search(temp->left, data);
         } else {
             return binary_search_tree_search(temp->right, data);
         }
     }
 }

 /* Insert the node in a binary search tree */
 void binary_search_tree_insert (bst_node_t **parent, int data) {
     bst_node_t *temp = *parent;

     if (temp == NULL) {
         temp = (bst_node_t *) malloc(sizeof(bst_node_t));
         temp->data = data;
         temp->left = temp->right = NULL;
         *parent = temp;
         return;
     } else {
         if (data < temp->data) {
             binary_search_tree_insert(&(temp->left), data);
         } else {
             binary_search_tree_insert(&(temp->right), data);
         }
     }
 }

 /* In-order traversal of a binary search tree */
 void binary_search_tree_print (bst_node_t *node, bst_node_t *parent) {
     if (node == NULL) {
         return;
     }

     binary_search_tree_print(node->left, node);
     if (parent) {
         printf("%d(p: %d) ", node->data, parent->data);
     } else {
         printf(" %d(Root) ", node->data);
     }
     binary_search_tree_print(node->right, node);
 }

 /* Returns the min node on the right sub array */
 bst_node_t* binary_search_tree_successor_on_right (bst_node_t *node) {
     while (node->left != NULL) {
         node = node->left;
     }
     return node;
 }

 /* Delete a node from a binary search tree */
 bst_node_t* binary_search_tree_delete (bst_node_t *parent, int data) {
     bst_node_t *temp_root_swap = NULL, *right_min = NULL;

     if (parent == NULL) {
         printf("Parent node is NULL, tree is empty\n");
         return parent;
     } else {
         if (parent->data < data) {
             parent->right = binary_search_tree_delete(parent->right, data);
         } else if (parent->data > data) {
             parent->left = binary_search_tree_delete(parent->left, data);
         } else {
             /* When a node has both the children */
             if (parent->right != NULL && parent->left != NULL) {
                 right_min = binary_search_tree_successor_on_right(parent->right);
                 parent->data  = right_min->data;
                 parent->right = binary_search_tree_delete(right_min, right_min->data);
             } else {
                 temp_root_swap = parent;
                 if (parent->left == NULL) {
                     parent = parent->right;
                 } else if (parent->right == NULL) {
                     parent = parent->left;
                 }
                 free(temp_root_swap);
                 temp_root_swap = NULL;
             }
         }
     }
     return parent;
 }

 int main (int argc, char *argv[]) {
     bst_node_t *search_node = NULL;
     int input_array[] = {680, 101, 237, 980, 880};
     int len = sizeof(input_array)/sizeof(input_array[0]);
     int i;

     bool status;
     for (i = 0; i < len; i++) {
         binary_search_tree_insert(&bst_root, input_array[i]);
     }

     /* Traverse and print the elements in the tree */
     binary_search_tree_print(bst_root, NULL);
     printf("\n");

     status = binary_search_tree_search(bst_root, 101);
     if (status == false)
         printf("Node '101' not found in the tree \n");
     else {
         printf("Node '101' found in the tree \n");
     }

     printf("Delete the node with value '101' \n");
     (void) binary_search_tree_delete(bst_root, 101);

     status = binary_search_tree_search(bst_root, 101);
     if (status == false)
         printf("Node '101' not found in the tree \n");
     else {
         printf("Node '101' found in the tree \n");
     }

     binary_search_tree_print(bst_root, NULL);
     printf("\n");
     return 0;
 }

Insertion, searching, traversal APIs are simple and very much self explanatory. Deletion of a node from a binary search tree is little tricky. As explained earlier, we need to handle 3 cases: (a) deletion of node with both left and right child, (b) deletion of node with one child, and (c) deletion of node with no children. Please note that the binary_search_tree_print() function prints each node and for each node, it also prints its parent. The root node would have no parent and so the print routine simply appends the "Root" text to the root node.

Here is the output:

 user@codingbison $ gcc main_bst.c -o main_bst
 user@codingbison $ ./main_bst 
 101(p: 680) 237(p: 101)  680(Root) 880(p: 980) 980(p: 680) 
 Node '101' found in the tree 
 Delete the node with value '101' 
 Node '101' not found in the tree 
 237(p: 680)  680(Root) 880(p: 980) 980(p: 680) 
 user@codingbison $

In the above output, we attempt to insert elements {680, 101, 237, 980, 880} and then prints the binary search tree. From the output, node with value 680 sits at the root of the tree (its parent is NULL). All nodes are (direct/indirect) child nodes of this node.

As good as binary search tree is, it suffer from one major drawback -- it can get unbalanced. If we start with a root that is an "average" value among the data set, then as elements get added, they can continue to sit on the either side of the node. However, if we were to start with the extreme case of elements that are already sorted, then this might pose a problem. In this case, we can end with all elements added along the same line (path), much like a linked list! If we were to do a search for the last element (the largest, let us say) in this unbalanced tree, we would have to search O(N) times instead of the expected O(logN) times. It is for this reason that we need AVL trees, where the tree is periodically balanced (when needed) to ensure that we stay close to O(logN) value instead of O(N).

Advanced Implementation

The earlier implementation is a good start for understanding binary search trees. However, the implementation is not very generic and as a result, we may not be able to use it in a production software, the way it is. Accordingly, we modify the above implementation to make it more generic so that it can be used as a common implementation for various applications.

To make things generic, one thing would be to split the above implementation into three files: (a) a header file that should publish APIs, (b) an implementation file for the APIs, and (c) another C file that contains the main() routine that would use the binary search tree APIs. Another thing is that the value member in the "bst_node_t" data-structure holds an integer. Next, to make things generic, we can use a "void *" type so that we can store any type of data. Lastly, we should probably also add another structure that can encapsulate the binary search tree root.

If you are curious, the complete program for an advanced implementation is available: Advanced Implementation of Binary Search Tree.