home page -> teaching -> data structures and algorithms -> B-trees

B-trees

Overview

B-trees are search trees optimised for being stored on a disk. Since files on a disk are read in blocks of larger size (512 bytes to 2 or 4 kbytes), and since reading is the most expensive operation, comparing the searched key against all the keys in a block is only marginally more expensive than comparing agains a single key.

B-trees consist in nodes containing at most n keys and having a number of descendents equal to the number of keys plus one.

Each sub-tree of a node contains keys between two consecutive keys in the parent (or smaller than the smallest key, or larger than the largest key of the parent).

Search

Searching for a key is done as follows:

  1. start from the root,
  2. do a binary search for the key in the current node,
  3. if the key is found, return,
  4. if the key is not found, go to the child corresponding to the interval where the searched key is between current node keys
  5. if no such child exists, return (searched key does not exist)

Balance

In a B-tree, all leaves are on the same level. This means that, if a node is not on the last level of the tree and the node has k keys, then that node has k+1 children. On the other hand, all nodes on the last level are leaves (have no children at all).

In addition, each node, except for the root, must have at least n/2 keys (that is, at least half of the node capacity).

Re-balancing after insertion

Inserting is done in a leaf.

If the leaf contained at most n-1 keys, the new key has room and no further action is needed.

If the leaf contained n keys, the following operations are done:

  1. The n+1 keys (the old n keys plus the inserted one) are put in increasing order and split n/2+1+n/2;
  2. The old node is split into two, each taking n/2 keys;
  3. The middle key is inserted in the parent node. If the parent node is already full, this leads to splitting the parent node through the same procedure. If splitting reaches the root, the root itself is split in two, and a new root, with a single key, is added above (this is the way the height increases).

Deleting keys and re-balancing

Like for binary search trees, deleting can be directly performed only on a key in a leaf node. If a key in an inner node must be deleted, it is replaced in that node by the next higher (or next lower) key (which is in a leaf node).

If the number of keys in a leaf falls below n/2, the next (or previous) key from the parent is moved into the leaf and the smallest (largest) key from the next (previous) leaf is moved in its place.

If the number of nodes in the sister leaf was also n/2, then the two leaves are joined together, with their dividing key from the parent. Then, the same algorithm is applied to the parent node, and so on, until either there are enough keys at some level, or the root itself dissapears and is replaced by its child.

Radu-Lucian LUPŞA
2016-04-04