home page -> teaching -> data structures and algorithms -> hash tables

Hash tables

Overview

TBD

Direct address tables

TBD

Hash tables

TBD

Hash functions

TBD

Collision resolution by chaining

TBD

Hash tables in various programming languages

TBD

Open addressing

Idea

When a collision occurs on inserting an element, we attempt to put the element in the next cell (at the next position in the table), or, more generally, in another pre-determined cell.

In the most general case, if a key k is to be inserted, the following sequence of indexes is tried, until the first empty cell is found: g(k,0), g(k, 1), g(k, 2), etc.

On lookup, the same sequence is tried, until either:

Choices for the cell sequence

Performance

Assuming that the hash distributes uniformly the input keys on the cells, the probability that cell g(k, i) is occupied is the same for all keys and all positions in the probing sequence, and it is equal to the load factor α.

Now, the probability to have, for a fixed key to be inserted, at least i cells probed is αi-1 (it is equal to the probability that the first i-1 cells are occupied. The average number of cells to be probed is then α0 + α1 + α2 + ... = 1/(1-α).

Issues on removing keys

Suppose a key k was inserted, not in the first probed cell - which was occupied - but in the second cell. Then, suppose the key occupying the first cell above is deleled. If we just set the cell as empty, then a subsequent attempt to search for the key k will fail, because the first probed cell is now empty, and the second cell is never probed.

So, when a key is removed from the hash table, the cell must have a special marking, showing that a key existed there but was removed. Such a cell can be used for inserting a new element in its place, but must be treated as an occupied cell during the search for a key.

Radu-Lucian LUPŞA
2016-05-10