home page -> teaching -> data structures and algorithms -> Binary search trees

Binary search trees

Short overview

Sorted vectors allow search in O(log n), but insert is O(n) (because of moving elements after the insert position). Linked lists allow insert in O(1) at any position, but lookup takes O(n) because there is no way to go to an element other than by going sequentially from the start.

Binary search trees allow going along the binary search directions.

So, a binary search tree contains keys in all nodes and has property that, for each node, the left subtree contains only keys smaller than the key in the parent node, and the right subtree contains only keys larger than the key in the parent node.

Balanced trees

Goal: to avoid trees to degenerate to linked lists. In other words, to keep the height of the tree in O(n), where n is the number of nodes.

AVL trees

An AVL tree is a binary search tree where, for each node, the difference between the heights of the left and right of sub-trees is at most one.

Minimum number of nodes for a given height - denote it by N(h).

We have:

We prove by induction that N(h) ≥ ᵩh. For 0 and 1, this is immediate: N(0)=1= ᵩ0 and N(1)=2>ᵩ1≃1.6

N(h+1)=1+N(h)+N(h-1)>ᵩh+ᵩh-1= ᵩh(1+1/ᵩ) = ᵩhᵩ = ᵩh+1

Insertions and re-balancing in AVL trees

TBD

Red-black trees

A red-black tree has each node labeled "red" or black". Important: the tree is considered as having "null" nodes, without keys, as leaves; in practice, those nodes are not actually represented.

Restrictions:

Insertions and re-balancing in red-black trees

TBD

Deleting nodes

There are 3 cases:

  1. The node to be deleted has no children. This is easy - no "orphan" child to be re-linked into the tree.
  2. The node has only one child. The child is then linked to its grand-parent (the parent of the deleted node) in place of the deleted node.
  3. The node has two children. This is the most complex case. The node that immediately preceeds or immediately follows the deleted node must take the place of the deleted node, and adopt its children. This node, that takes the place of the deleted node, can have at most 1 child and is handled as in case 1 or 2 above.

Re-balancing AVL trees

Re-balancing red-black trees

Parsing the nodes

This section deals with how an iterator is represented. Possibilities:

  1. The iterator holds the entire path from the root to the current node.
  2. The iterator is a single pointer to a node, and each node holds a pointer to its parent (in addition to the two children).
  3. The iterator is a pointer to a node, and nodes that have no left or right child hold instead pointers to their predecessors, respectively successors, in the sorting order.

The first approach makes the iterator a full-blown collection (vector or linked list). This is undesirable because it makes parsing slow.

The second approach increases the size of a node by adding the pointer to parent. Parsing is done as follows:

Input-output: p - pointer to the current node; at exit, it will point to the next node
Algorithm:
    if p->right != null:
        p = p->right
        while p->left != null:
            p = p->left
    else:
        while p->parent != null and p == p->parent->right:
            p = p->parent
        if p->parent != null:
            p = p->parent
        else:
            p = &end

The third approach requires only two bits per node in order to mark if the left and right links are to children or to previous/next node. Those links have to be maintained on insertions and deletions of nodes. Parsing is done as follows:

Input-output: p - pointer to the current node; at exit, it will point to the next node
Algorithm:
    if p->hasRight:
        p = p->right
        while p->hasLeft:
            p = p->left
    else:
        p = p->right
Radu-Lucian LUPŞA
2016-03-29