The hash table data structure (aka dictionary, hash map, associative array) is a key-value mapping backed by a resizable array data structure
Attributes
A hash table uses a load factor to decide when to grow the internal array's capacity
A hash function is used to generate a hash code, which is then converted into an array index, so that entries are distributed evenly across the array buckets
Hash codes generated by the hash function may collide (different keys can produce the same code). Separate chaining through a linked list or red-black tree data structure is the most common way to resolve this
Public operations
Search, aka Get: retrieve the value by key
Add, aka Put: add a key-value entry into the hash table
Delete, aka Remove: remove an entry from the hash table by key
Internal operations
Hashing: properly distribute entries across the hash table
Resize: automatically grow the hash table's capacity
Rehash: re-execute the hashing operation for every entry in the buckets after a resize
Time complexity
Hash table offers constant-time O(1) average-case and linear-time O(n) worst-case performance for the search, add and delete operations
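As a quick sketch, these public operations can be expressed as a small generic interface. The names below are illustrative, not a standard library type; the MyHashtable class in the implementation example later follows the same shape
// Sketch of the public API described above (illustrative names only)
public interface SimpleHashtable<K, V> {
    V get(K key);          // Search: retrieve the value by key, or null if absent
    V put(K key, V value); // Add: insert a key-value entry
    V remove(K key);       // Delete: remove the entry by key and return its value
    int size();            // Number of entries currently stored
}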
Let's walk through this tutorial to explore them in more detail
The load factor
A hash table is backed by an array, and it uses the load factor to decide when to grow the array's capacity
The load factor is defined as follows
loadFactor = numberOfEntries / arrayCapacity
arrayCapacity, aka the number of buckets
As a general rule, loadFactor = 0.75
offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost, and vice versa
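As a minimal sketch of this rule, the check a hash table performs after each insertion looks like the snippet below. The helper name needsResize is hypothetical, for illustration only
// Hypothetical helper illustrating the load-factor check after an insertion
private static final float LOAD_FACTOR = 0.75f;

private boolean needsResize(int numberOfEntries, int arrayCapacity) {
    // Grow the backing array once the table is more than 75% full
    return numberOfEntries > LOAD_FACTOR * arrayCapacity;
}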
Hashing operation
Hashing is the internal operation that distributes entries across the array
The hash table uses a hash function to map the key object to a hash code (an integer)
hashCode = hashFunction(keyObject)
The hash code is reduced to an array index via the modulo operator %
index = hashCode % arrayCapacity
Then the computed index is used to assign the value object to the array
array[index] = valueObject
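As a sketch, the two steps above look like this in Java (the helper name indexFor is hypothetical). Math.floorMod is used because Object.hashCode() may return a negative number, and a plain % would then produce a negative index
// Hypothetical helper illustrating the hashing steps above
private int indexFor(Object keyObject, int arrayCapacity) {
    int hashCode = keyObject.hashCode();           // hashCode = hashFunction(keyObject)
    return Math.floorMod(hashCode, arrayCapacity); // index = hashCode % arrayCapacity, kept non-negative
}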
Hash collision
There is no perfect hash function: different keys may end up with the same hash code, which is called a hash collision. The fewer duplicate hash codes, the better the performance
There are several ways to resolve hash collisions. The most common one is separate chaining, which stores the colliding entries in a linked list (or a red-black tree) per bucket, as sketched below
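As a sketch, a lookup with separate chaining walks the linked list stored in the bucket at the computed index; the Entry node here mirrors the one used in the implementation example below
// Sketch of a chained lookup: walk the bucket's linked list and compare keys
Entry<K, V> current = entries[index];
while (current != null) {
    if (Objects.equals(current.key, key)) {
        return current.value; // Same bucket, matching key
    }
    current = current.next;   // Same bucket, different key: follow the chain
}
return null;                  // Key is not present in this bucket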
Resize and rehash operations
A resize happens when the number of entries in the hash table is greater than the product of the load factor and the array capacity
numberOfEntries > loadFactor * arrayCapacity
The hashing operation has to be executed again for every entry in the buckets right after the resize operation; this is the rehash operation
Resize and rehash operations are costly in terms of performance, so choosing a suitable initial capacity when creating a new hash table is important to minimize their impact
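For example, java.util.HashMap accepts an initial capacity (and optionally a load factor) at construction time, and the MyHashtable class below exposes a capacity constructor for the same reason
// Assumes java.util.HashMap and java.util.Map are imported
// Sizing the table up front avoids intermediate resize and rehash cycles
Map<String, Integer> byCapacity = new HashMap<>(256);    // initial capacity only
Map<String, Integer> byBoth = new HashMap<>(256, 0.75f); // initial capacity and load factor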
Implementation example
import java.util.Arrays;
import java.util.Objects;
public class MyHashtable<K, V> {
private static final int INITIAL_CAPACITY = 16;
private static final float LOAD_FACTOR = 0.75f;
private int size = 0;
Entry<K,V>[] entries;
@SuppressWarnings({"rawtypes", "unchecked"})
public MyHashtable() {
entries = (Entry<K, V>[]) new Entry[INITIAL_CAPACITY];
}
@SuppressWarnings({"rawtypes", "unchecked"})
public MyHashtable(int capacity) {
entries = (Entry<K, V>[]) new Entry[capacity];
}
public V put(K key, V value) {
int index = hashFunction(key);
Entry<K, V> headEntry = entries[index];
Entry<K, V> currentEntry = headEntry;
// Find key in the chain
while (Objects.nonNull(currentEntry)) {
if (Objects.equals(currentEntry.key, key)) {
// Key already exists: keep the existing value and return it
return currentEntry.value;
}
currentEntry = currentEntry.next;
}
// Add first to the chain
Entry<K, V> newEntry = new Entry<>(key, value);
newEntry.next = headEntry;
entries[index] = newEntry;
size++;
resize();
return null;
}
public V get(K key) {
int index = hashFunction(key);
Entry<K, V> currentEntry = entries[index];
// Find key in the chain
while (Objects.nonNull(currentEntry)) {
if (Objects.equals(currentEntry.key, key)) {
return currentEntry.value;
}
currentEntry = currentEntry.next;
}
return null;
}
public V remove(K key) {
int index = hashFunction(key);
Entry<K, V> currentEntry = entries[index];
Entry<K, V> previousEntry = null;
// Find key in the chain
while (Objects.nonNull(currentEntry)) {
if (Objects.equals(currentEntry.key, key)) {
break;
}
previousEntry = currentEntry;
currentEntry = currentEntry.next;
}
if (Objects.isNull(currentEntry)) {
return null;
}
// Remove
if (Objects.isNull(previousEntry)) {
entries[index] = currentEntry.next;
} else {
previousEntry.next = currentEntry.next;
}
size--;
return currentEntry.value;
}
public int size() {
return size;
}
@SuppressWarnings("unchecked")
private void resize() {
if (size <= LOAD_FACTOR * entries.length) return;
Entry<K,V>[] currentEntries = entries;
entries = (Entry<K, V>[]) new Entry[entries.length * 2];
rehash(currentEntries);
}
private void rehash(Entry<K,V>[] oldEntries) {
size = 0;
for (Entry<K, V> entry : oldEntries) {
while (Objects.nonNull(entry)) {
put(entry.key, entry.value);
entry = entry.next;
}
}
}
private int hashFunction(K key) {
int hashCode = key.hashCode();
// floorMod keeps the index non-negative even when hashCode() returns a negative value
return Math.floorMod(hashCode, entries.length);
}
public static class Entry<K, V> {
K key;
V value;
Entry<K, V> next;
Entry(K key, V value) {
this.key = key;
this.value = value;
}
}
public static void main(String[] args) {
MyHashtable<String, Integer> myHashtable = new MyHashtable<>(2);
myHashtable.put("k1", 1);
myHashtable.put("k2", 2);
myHashtable.put("k2", 2);
myHashtable.put("k3", 3);
Arrays.stream(myHashtable.entries)
.forEach(e -> {
if (Objects.nonNull(e)) {
System.out.printf("%s=%d%s", e.key, e.value, System.lineSeparator());
}
});
System.out.println(myHashtable.size());
System.out.println(myHashtable.get("k2"));
System.out.println(myHashtable.remove("k1"));
System.out.println(myHashtable.get("k1"));
System.out.println(myHashtable.size());
}
}
Output
k3=3
k1=1
k2=2
3
2
1
null
2
Applications
Hash tables can be used to implement associative arrays, caches, database indexes and the set data structure
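For instance, Java's standard library builds several of these directly on hash tables: HashMap for associative arrays and simple in-memory caches, HashSet for sets
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class HashtableApplications {
    public static void main(String[] args) {
        // Associative array / simple in-memory cache: average O(1) lookups by key
        Map<String, String> cache = new HashMap<>();
        cache.put("user:42", "Alice");
        System.out.println(cache.get("user:42")); // Alice

        // Set: membership tests backed by a hash table
        Set<String> visited = new HashSet<>();
        visited.add("/home");
        System.out.println(visited.contains("/home")); // true
    }
}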