# HelloKoding

Practical coding guides

# Hash Table Data Structure

Hash table data structure (aka dictionary, hash map, associate array) is a key-value pairs mapping backed by a resizeable array data structure

## Attributes

• Hash table uses a load factor to decide when to grow the internal array capacity
• A hash function is used to generate a hash code and array index to properly distribute entries across the array buckets
• Hash codes generated by the hash function may be duplicated. Separate chaining through a linked list or red-black tree data structure is the most common solution to resolve the problem

## Public operations

• Search, aka Get: retrieve the value by key
• Add, aka Put: add a key-value entry into the hash table
• Delete, aka Remove: remove an entry from the hash table by key

## Internal operations

• Hashing: properly distribute entries across the hash table
• Resize: auto grow hash table capacity
• Rehash: re-execute hashing operation for every entry in the buckets

## Time complexity

• Hash table offers constant-time O(1) on average and linear-time O(n) in worst case performance for search, add and delete operations

Hash table is backed by an array and it uses the load factor to decide when to grow the array capacity

Load factor is defined as below

``````loadFactor = numberOfEntries / arrayCapacity

arrayCapacity, aka the number of buckets``````

As a general rule, `loadFactor = 0.75` offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup costs and vice versa

## Hashing operation

Hashing is the hash table internal operation which distributes entries across an array

Hash table uses a hash function to compute the key object to a hash code number

``hashCode = hashFunction(keyObject)``

Hash code is reduced to array index via modulo operator %

``index = hashCode % arrayCapacity``

Then the computed index is used to assign value object to the array

``array[index] = valueObject``

## Hash collision

There’s no perfect hash function. The hash codes computed by the hash function may be duplicate. However, the less duplicate hash codes, the better performance gain

There’re some solutions to resolve the hash collision. Use a linked list or red-black tree data structure to save duplicate hash code entries in separate chaining is the most common way

## Resize and rehash operations

Resize happens when the number of entries in the hash table is greater than the product of load factor and the array capacity

``numberOfEntries > loadFactor * arrayCapacity``

Hashing operation has to be executed again for every entry in the buckets right after the execution of resize operation

Resize and rehash operations are costly in term of performance, so choosing the initial capacity when creating a new Hash table is important to minimize the performance impact

## Implementation example

MyHashtable.java

``````package com.hellokoding.datastructure;

import java.util.Arrays;
import java.util.Objects;

public class MyHashtable<K, V> {
private static final int INITIAL_CAPACITY = 16;
private static final float LOAD_FACTOR = 0.75f;
private int size = 0;

Entry<K,V>[] entries;

@SuppressWarnings({"rawtypes", "unchecked"})
public MyHashtable() {
entries = (Entry<K, V>[]) new Entry[INITIAL_CAPACITY];
}

@SuppressWarnings({"rawtypes", "unchecked"})
public MyHashtable(int capacity) {
entries = (Entry<K, V>[]) new Entry[capacity];
}

public V put(K key, V value) {
int index = hashFunction(key);

// Find key in the chain
while (Objects.nonNull(currentEntry)) {
if (Objects.equals(currentEntry.key, key)) {
// Ignore duplicate key
return currentEntry.value;
}

currentEntry = currentEntry.next;
}

// Add first to the chain
Entry<K, V> newEntry = new Entry<>(key, value);
entries[index] = newEntry;

size++;

resize();

return null;
}

public V get(K key) {
int index = hashFunction(key);
Entry<K, V> currentEntry = entries[index];

// Find key in the chain
while (Objects.nonNull(currentEntry)) {
if (Objects.equals(currentEntry.key, key)) {
return currentEntry.value;
}

currentEntry = currentEntry.next;
}

return null;
}

public V remove(K key) {
int index = hashFunction(key);
Entry<K, V> currentEntry = entries[index];
Entry<K, V> previousEntry = null;

// Find key in the chain
while (Objects.nonNull(currentEntry)) {
if (Objects.equals(currentEntry.key, key)) {
break;
}

previousEntry = currentEntry;
currentEntry = currentEntry.next;
}

if (Objects.isNull(currentEntry)) {
return null;
}

// Remove
if (Objects.isNull(previousEntry)) {
entries[index] = currentEntry.next;
} else {
previousEntry.next = currentEntry.next;
}

size--;

return currentEntry.value;
}

public int size() {
return size;
}

private void resize() {
if (size <= LOAD_FACTOR * entries.length) return;

Entry<K,V>[] currentEntries = entries;
entries = (Entry<K, V>[]) new Entry[entries.length*2] ;

rehash(currentEntries);
}

private void rehash(Entry<K,V>[] entries) {
size = 0;

for (Entry<K, V> entry : entries) {
while (Objects.nonNull(entry)) {
put(entry.key, entry.value);
entry = entry.next;
}
}
}

private int hashFunction(K key) {
int hashCode = key.hashCode();
int index = hashCode % entries.length;

return index;
}

public static class Entry<K, V> {
K key;
V value;

Entry<K, V> next;

Entry(K  key, V value) {
this.key = key;
this.value = value;
}
}

public static void main(String[] args) {
MyHashtable<String, Integer> myHashtable = new MyHashtable<>(2);
myHashtable.put("k1", 1);
myHashtable.put("k2", 2);
myHashtable.put("k2", 2);
myHashtable.put("k3", 3);

Arrays.stream(myHashtable.entries)
.forEach(e -> {
if (Objects.nonNull(e)) {
System.out.printf("%s=%d%s", e.key, e.value, System.lineSeparator());
}
});

System.out.println(myHashtable.size());

System.out.println(myHashtable.get("k2"));

System.out.println(myHashtable.remove("k1"));

System.out.println(myHashtable.get("k1"));

System.out.println(myHashtable.size());
}
}
``````

Output

``````k3=3
k1=1
k2=2
3
2
1
null
2``````

## Applications

Hash table can be used to implement associative arrays, caches, database indexing and sets data structure