Table of Contents
Introduction
A hash table in C/C++ is a data structure that maps keys to values. A hash table uses a _hash function_ to compute indexes for a key. You can store the value at the appropriate location based on the hash table index.
<!–
–>
The benefit of using a hash table is its very fast access time. Typically, the time complexity (amortized time complexity) is a constant O(1) access time.
If two different keys get the same index, you will need to use other data structures (buckets) to account for these collisions. If you choose a very good hash function, the likelihood of a collision can be negligible.
The C++ STL (Standard Template Library) has the std::unordered_map() data structure.
In this article, you will construct a hash table from scratch comprised of:
- A hash function to map keys to values.
- A hash table data structure that supports
insert,search, anddeleteoperations. - A data structure to account for a collision of keys.
Choosing a Hash Function
The first step is to choose a reasonably good hash function that has a low chance of collision. However, for the purposes of this tutorial, a poor hash function will be applied to better illustrate hash collisions. This limited example will also only utilize strings (or character arrays in C).
[label HashTable.cpp]
#define CAPACITY 50000 // Size of the HashTable.
unsigned long hash_function(char* str)
{
unsigned long i = 0;
for (int j = 0; str[j]; j++)
i += str[j];
return i % CAPACITY;
}
Run this code and test different strings for potential collisions. For example, the strings Hel and Cau will collide since they have the same ASCII value.
This code must return a number within the bounds of the capacity of the hash table. Otherwise, it may access an unbound memory location, leading to an error.
Defining the Hash Table Data Structures
A hash table is an array of items, which are { key: value } pairs.
First, define the item structure:
[label HashTable.cpp]
// Defines the HashTable item.
typedef struct Ht_item
{
char* key;
char* value;
} Ht_item;
Now, the hash table has an array of pointers that point to Ht_item, so it is a _double-pointer_.
[label HashTable.cpp]
// Defines the HashTable.
typedef struct HashTable
{
// Contains an array of pointers to items.
Ht_item** items;
int size;
int count;
} HashTable;
Your hash table will need to return the number of elements in the hash table using count and size of the hash table using size.
Creating the Hash Table and Hash Table Items
Next, create functions for allocating memory and creating items.
Create items by allocating memory for a key and value, and return a pointer to the item:
[label HashTable.cpp]
Ht_item* create_item(char* key, char* value)
{
// Creates a pointer to a new HashTable item.
Ht_item* item = (Ht_item*) malloc(sizeof(Ht_item));
item->key = (char*) malloc(strlen(key) + 1);
item->value = (char*) malloc(strlen(value) + 1);
strcpy(item->key, key);
strcpy(item->value, value);
return item;
}
Create the table by allocating memory and setting size, count, and items:
[label HashTable.cpp]
HashTable* create_table(int size)
{
// Creates a new HashTable.
HashTable* table = (HashTable*) malloc(sizeof(HashTable));
table->size = size;
table->count = 0;
table->items = (Ht_item**) calloc(table->size, sizeof(Ht_item*));
for (int i = 0; i < table->size; i++)
table->items[i] = NULL;
return table;
}
The preceding example allocates memory for the wrapper structure HashTable and sets all the items to NULL.
Freeing up memory is a C/C++ best practice. Free up memory that you've allocated on the heap with malloc() and calloc().
Write functions that free up a table item and the whole table.
[label HashTable.cpp]
void free_item(Ht_item* item)
{
// Frees an item.
free(item->key);
free(item->value);
free(item);
}
void free_table(HashTable* table)
{
// Frees the table.
for (int i = 0; i < table->size; i++)
{
Ht_item* item = table->items[i];
if (item != NULL)
free_item(item);
}
free(table->items);
free(table);
}
Add a print_table() to display the index, key, and value for each item:
[label HashTable.cpp]
void print_table(HashTable* table)
{
printf("\nHash Table\n-------------------\n");
for (int i = 0; i < table->size; i++)
{
if (table->items[i])
{
printf("Index:%d, Key:%s, Value:%s\n", i, table->items[i] -> key, table->items[i]->value);
}
}
printf("-------------------\n\n");
}
This concludes the basic functionality of your custom hash table. You will now write insert, search, and delete functions.
Inserting into the Hash Table
Create a function, ht_insert(), that performs insertions.
The function takes a HashTable pointer, a key, and a value as parameters:
void ht_insert(HashTable* table, char* key, char* value)
{
...
}
Now, there are certain steps involved in the ht_insert() function.
- Create the item based on the
{ key: value }pair. - Compute the index based on the hash function.
- Check if the index is already occupied or not, by comparing the
key. - If it is not occupied, you can directly insert it into
index. - Otherwise, it is a collision, and you will need to handle it.
This tutorial will address handling collisions after the initial model has been created.
First, create the item:
create_item(key, value)
Then, compute the index:
int index = hash_function(key);
When inserting the key for the first time, the item must be a NULL:
[label HashTable.cpp]
// Creates the item.
Ht_item* item = create_item(key, value);
// Computes the index.
int index = hash_function(key);
Ht_item* current_item = table->items[index];
if (current_item == NULL)
{
// Key does not exist.
if (table->count == table->size)
{
// HashTable is full.
printf("Insert Error: Hash Table is full\n");
free_item(item);
return;
}
// Insert directly.
table->items[index] = item;
table->count++;
}
Consider the scenario where the { key: value } pair already exists because the same item has been inserted into the hash table. To address this, the code must update the item value to the new one:
[label HashTable.cpp]
if (current_item == NULL)
{
...
}
<^>else {<^>
// Scenario 1: Update the value.
<^>if (strcmp(current_item->key, key) == 0)<^>
<^>{<^>
<^>strcpy(table->items[index] -> value, value);<^>
<^>return;<^>
<^>}<^>
<^>}<^>
Consider the scenario where a collision has to be handled. To address this, a placeholder has been added:
[label HashTable.cpp]
<^>void handle_collision(HashTable* table, Ht_item* item)<^>
<^>{<^>
<^>}<^>
void ht_insert(HashTable* table, char* key, char* value)
{
...
if (current_item == NULL)
{
...
}
else {
// Scenario 1: Update the value.
if (strcmp(current_item->key, key) == 0)
{
...
}
<^>else {<^>
// Scenario 2: Handle the collision.
<^>handle_collision(table, item);<^>
<^>return;<^>
<^>}<^>
}
}
Now, your ht_insert() function is complete.
Searching for Items in the Hash Table
Create a function, ht_search(), that checks if the key exists, and returns the corresponding value if it does.
The function takes a HashTable pointer and a key as parameters:
char* ht_search(HashTable* table, char* key)
{
...
}
Search for an item with the key in the HashTable. If the item cannot be found in the HashTable, NULL is returned.
[label HashTable.cpp]
char* ht_search(HashTable* table, char* key)
{
// Searches for the key in the HashTable.
// Returns NULL if it doesn't exist.
int index = hash_function(key);
Ht_item* item = table->items[index];
// Provide only non-NULL values.
if (item != NULL)
{
if (strcmp(item->key, key) == 0)
return item->value;
}
return NULL;
}
Add a print_search() to display the item that matches the key:
[label HashTable.cpp]
void print_search(HashTable* table, char* key)
{
char* val;
if ((val = ht_search(table, key)) == NULL)
{
printf("Key:%s does not exist\n", key);
return;
}
else {
printf("Key:%s, Value:%s\n", key, val);
}
}
Now, your ht_search() function is complete.
Handling Collisions
There are different ways to resolve a collision. This tutorial will rely upon a method called Separate Chaining, which aims to create independent chains for all items that have the same hash index. The implementation in this tutorial will create these chains using linked lists.
Whenever there is a collision, additional items that collide on the same index are added to an _overflow bucket list_. Thus, you will not have to delete any existing records on the hash table.
Due to linked lists having O(n) time complexity for insertion, searching, and deletion, in case of a collision, you will have a worst-case access time of O(n) as well. The advantage of this method is that it is a good choice if your hash table has a low capacity.
<!–
–>
Implement the overflow bucket list:
[label HashTable.cpp]
// Defines the LinkedList.
typedef struct LinkedList {
Ht_item* item;
struct LinkedList* next;
} LinkedList;;
LinkedList* allocate_list()
{
// Allocates memory for a LinkedList pointer.
LinkedList* list = (LinkedList*) malloc(sizeof(LinkedList));
return list;
}
LinkedList* linkedlist_insert(LinkedList* list, Ht_item* item)
{
// Inserts the item onto the LinkedList.
if (!list)
{
LinkedList* head = allocate_list();
head->item = item;
head->next = NULL;
list = head;
return list;
}
else if (list->next == NULL)
{
LinkedList* node = allocate_list();
node->item = item;
node->next = NULL;
list->next = node;
return list;
}
LinkedList* temp = list;
while (temp->next->next)
{
temp = temp->next;
}
LinkedList* node = allocate_list();
node->item = item;
node->next = NULL;
temp->next = node;
return list;
}
Ht_item* linkedlist_remove(LinkedList* list)
{
// Removes the head from the LinkedList.
// Returns the item of the popped element.
if (!list)
return NULL;
if (!list->next)
return NULL;
LinkedList* node = list->next;
LinkedList* temp = list;
temp->next = NULL;
list = node;
Ht_item* it = NULL;
memcpy(temp->item, it, sizeof(Ht_item));
free(temp->item->key);
free(temp->item->value);
free(temp->item);
free(temp);
return it;
}
void free_linkedlist(LinkedList* list)
{
LinkedList* temp = list;
while (list)
{
temp = list;
list = list->next;
free(temp->item->key);
free(temp->item->value);
free(temp->item);
free(temp);
}
}
Now, add these overflow bucket lists to your HashTable. Every item will have a chain, so the whole table is an array of LinkedList pointers.
[label HashTable.cpp]
typedef struct HashTable HashTable;
// Defines the HashTable.
struct HashTable
{
// Contains an array of pointers to items.
Ht_item** items;
<^>LinkedList** overflow_buckets;<^>
int size;
int count;
};
Now that overflow_buckets have been defined, add functions to create and delete them. You will also need to account for them in the create_table() and free_table() functions.
[label HashTable.cpp]
<^>LinkedList** create_overflow_buckets(HashTable* table)<^>
<^>{<^>
// Create the overflow buckets; an array of LinkedLists.
<^>LinkedList** buckets = (LinkedList**) calloc(table->size, sizeof(LinkedList*));<^>
<^>for (int i = 0; i < table->size; i++)<^>
<^>buckets[i] = NULL;<^>
<^>return buckets;<^>
<^>}<^>
<^>void free_overflow_buckets(HashTable* table)<^>
<^>{<^>
// Free all the overflow bucket lists.
<^>LinkedList** buckets = table->overflow_buckets;<^>
<^>for (int i = 0; i < table->size; i++)<^>
<^>free_linkedlist(buckets[i]);<^>
<^>free(buckets);<^>
<^>}<^>
HashTable* create_table(int size)
{
// Creates a new HashTable.
HashTable* table = (HashTable*) malloc(sizeof(HashTable));
table->size = size;
table->count = 0;
table->items = (Ht_item**) calloc(table->size, sizeof(Ht_item*));
for (int i = 0; i < table->size; i++)
table->items[i] = NULL;
<^>table->overflow_buckets = create_overflow_buckets(table);<^>
return table;
}
void free_table(HashTable* table)
{
// Frees the table.
for (int i = 0; i < table->size; i++)
{
Ht_item* item = table->items[i];
if (item != NULL)
free_item(item);
}
// Free the overflow bucket lists and its items.
<^>free_overflow_buckets(table);<^>
free(table->items);
free(table);
}
If the overflow bucket list for the item does not exist, create a list and add the item to it.
Update handle_collision() for insertions:
[label HashTable.cpp]
void handle_collision(HashTable* table, <^>unsigned long index,<^> Ht_item* item)
{
<^>LinkedList* head = table->overflow_buckets[index];<^>
<^>if (head == NULL)<^>
<^>{<^>
// Creates the list.
<^>head = allocate_list();<^>
<^>head->item = item;<^>
<^>table->overflow_buckets[index] = head;<^>
<^>return;<^>
<^>}<^>
<^>else {<^>
// Insert to the list.
<^>table->overflow_buckets[index] = linkedlist_insert(head, item);<^>
<^>return;<^>
<^>}<^>
}
And the call:
[label HashTable.cpp]
void ht_insert(HashTable* table, char* key, char* value)
{
...
if (current_item == NULL)
{
...
}
else {
// Scenario 1: Update the value.
if (strcmp(current_item->key, key) == 0)
{
...
}
else {
// Scenario 2: Handle the collision.
handle_collision(table, <^>index,<^> item);
return;
}
}
}
Now, update the search method to use overflow buckets:
[label HashTable.cpp]
char* ht_search(HashTable* table, char* key)
{
// Searches for the key in the HashTable.
// Returns NULL if it doesn't exist.
int index = hash_function(key);
Ht_item* item = table->items[index];
<^>LinkedList* head = table->overflow_buckets[index];<^>
// Provide only non-NULL values.
if (item != NULL)
{
if (strcmp(item->key, key) == 0)
return item->value;
<^>if (head == NULL)<^>
<^>return NULL;<^>
<^>item = head->item;<^>
<^>head = head->next;<^>
}
return NULL;
}
Finally, collisions are now handled in insert() and search()!
Deleting from the Hash Table
Let's now finally look at the delete function:
void ht_delete(HashTable* table, char* key)
{
...
}
Again, the method is similar to insertion.
- Compute the hash index and get the item.
- If it is
NULL, don't do anything. - Otherwise, after comparing keys, if there is no collision chain for that index, remove the item from the table.
- If a collision chain exists, remove that element and shift the links accordingly.
[label HashTable.cpp]
void ht_delete(HashTable* table, char* key)
{
// Deletes an item from the table.
int index = hash_function(key);
Ht_item* item = table->items[index];
LinkedList* head = table->overflow_buckets[index];
if (item == NULL)
{
// Does not exist.
return;
}
else {
if (head == NULL && strcmp(item->key, key) == 0)
{
// Collision chain does not exist.
// Remove the item.
// Set table index to NULL.
table->items[index] = NULL;
free_item(item);
table->count--;
return;
}
else if (head != NULL)
{
// Collision chain exists.
if (strcmp(item->key, key) == 0)
{
// Remove this item.
// Set the head of the list as the new item.
free_item(item);
LinkedList* node = head;
head = head->next;
node->next = NULL;
table->items[index] = create_item(node->item->key, node->item->value);
free_linkedlist(node);
table->overflow_buckets[index] = head;
return;
}
LinkedList* curr = head;
LinkedList* prev = NULL;
while (curr)
{
if (strcmp(curr->item->key, key) == 0)
{
if (prev == NULL)
{
// First element of the chain.
// Remove the chain.
free_linkedlist(head);
table->overflow_buckets[index] = NULL;
return;
}
else
{
// This is somewhere in the chain.
prev->next = curr->next;
curr->next = NULL;
free_linkedlist(curr);
table->overflow_buckets[index] = head;
return;
}
}
curr = curr->next;
prev = curr;
}
}
}
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define CAPACITY 50000 // Size of the HashTable.
unsigned long hash_function(char *str)
{
unsigned long i = 0;
for (int j = 0; str[j]; j++)
i += str[j];
return i % CAPACITY;
}
// Defines the HashTable item.
typedef struct Ht_item
{
char *key;
char *value;
} Ht_item;
// Defines the LinkedList.
typedef struct LinkedList
{
Ht_item *item;
LinkedList *next;
} LinkedList;
// Defines the HashTable.
typedef struct HashTable
{
// Contains an array of pointers to items.
Ht_item **items;
LinkedList **overflow_buckets;
int size;
int count;
} HashTable;
LinkedList *allocate_list()
{
// Allocates memory for a LinkedList pointer.
LinkedList *list = (LinkedList *)malloc(sizeof(LinkedList));
return list;
}
LinkedList *linkedlist_insert(LinkedList *list, Ht_item *item)
{
// Inserts the item onto the LinkedList.
if (!list)
{
LinkedList *head = allocate_list();
head->item = item;
head->next = NULL;
list = head;
return list;
}
else if (list->next == NULL)
{
LinkedList *node = allocate_list();
node->item = item;
node->next = NULL;
list->next = node;
return list;
}
LinkedList *temp = list;
while (temp->next->next)
{
temp = temp->next;
}
LinkedList *node = allocate_list();
node->item = item;
node->next = NULL;
temp->next = node;
return list;
}
Ht_item *linkedlist_remove(LinkedList *list)
{
// Removes the head from the LinkedList.
// Returns the item of the popped element.
if (!list)
return NULL;
if (!list->next)
return NULL;
LinkedList *node = list->next;
LinkedList *temp = list;
temp->next = NULL;
list = node;
Ht_item *it = NULL;
memcpy(temp->item, it, sizeof(Ht_item));
free(temp->item->key);
free(temp->item->value);
free(temp->item);
free(temp);
return it;
}
void free_linkedlist(LinkedList *list)
{
LinkedList *temp = list;
while (list)
{
temp = list;
list = list->next;
free(temp->item->key);
free(temp->item->value);
free(temp->item);
free(temp);
}
}
LinkedList **create_overflow_buckets(HashTable *table)
{
// Create the overflow buckets; an array of LinkedLists.
LinkedList **buckets = (LinkedList **)calloc(table->size, sizeof(LinkedList *));
for (int i = 0; i < table->size; i++)
buckets[i] = NULL;
return buckets;
}
void free_overflow_buckets(HashTable *table)
{
// Free all the overflow bucket lists.
LinkedList **buckets = table->overflow_buckets;
for (int i = 0; i < table->size; i++)
free_linkedlist(buckets[i]);
free(buckets);
}
Ht_item *create_item(char *key, char *value)
{
// Creates a pointer to a new HashTable item.
Ht_item *item = (Ht_item *)malloc(sizeof(Ht_item));
item->key = (char *)malloc(strlen(key) + 1);
item->value = (char *)malloc(strlen(value) + 1);
strcpy(item->key, key);
strcpy(item->value, value);
return item;
}
HashTable *create_table(int size)
{
// Creates a new HashTable.
HashTable *table = (HashTable *)malloc(sizeof(HashTable));
table->size = size;
table->count = 0;
table->items = (Ht_item **)calloc(table->size, sizeof(Ht_item *));
for (int i = 0; i < table->size; i++)
table->items[i] = NULL;
table->overflow_buckets = create_overflow_buckets(table);
return table;
}
void free_item(Ht_item *item)
{
// Frees an item.
free(item->key);
free(item->value);
free(item);
}
void free_table(HashTable *table)
{
// Frees the table.
for (int i = 0; i < table->size; i++)
{
Ht_item *item = table->items[i];
if (item != NULL)
free_item(item);
}
// Free the overflow bucket lists and its items.
free_overflow_buckets(table);
free(table->items);
free(table);
}
void handle_collision(HashTable *table, unsigned long index, Ht_item *item)
{
LinkedList *head = table->overflow_buckets[index];
if (head == NULL)
{
// Creates the list.
head = allocate_list();
head->item = item;
table->overflow_buckets[index] = head;
return;
}
else
{
// Insert to the list.
table->overflow_buckets[index] = linkedlist_insert(head, item);
return;
}
}
void ht_insert(HashTable *table, char *key, char *value)
{
// Creates the item.
Ht_item *item = create_item(key, value);
// Computes the index.
int index = hash_function(key);
Ht_item *current_item = table->items[index];
if (current_item == NULL)
{
// Key does not exist.
if (table->count == table->size)
{
// HashTable is full.
printf("Insert Error: Hash Table is full\n");
free_item(item);
return;
}
// Insert directly.
table->items[index] = item;
table->count++;
}
else
{
// Scenario 1: Update the value.
if (strcmp(current_item->key, key) == 0)
{
strcpy(table->items[index]->value, value);
return;
}
else
{
// Scenario 2: Handle the collision.
handle_collision(table, index, item);
return;
}
}
}
char *ht_search(HashTable *table, char *key)
{
// Searches for the key in the HashTable.
// Returns NULL if it doesn't exist.
int index = hash_function(key);
Ht_item *item = table->items[index];
LinkedList *head = table->overflow_buckets[index];
// Provide only non-NULL values.
if (item != NULL)
{
if (strcmp(item->key, key) == 0)
return item->value;
if (head == NULL)
return NULL;
item = head->item;
head = head->next;
}
return NULL;
}
void ht_delete(HashTable *table, char *key)
{
// Deletes an item from the table.
int index = hash_function(key);
Ht_item *item = table->items[index];
LinkedList *head = table->overflow_buckets[index];
if (item == NULL)
{
// Does not exist.
return;
}
else
{
if (head == NULL && strcmp(item->key, key) == 0)
{
// Collision chain does not exist.
// Remove the item.
// Set table index to NULL.
table->items[index] = NULL;
free_item(item);
table->count--;
return;
}
else if (head != NULL)
{
// Collision chain exists.
if (strcmp(item->key, key) == 0)
{
// Remove this item.
// Set the head of the list as the new item.
free_item(item);
LinkedList *node = head;
head = head->next;
node->next = NULL;
table->items[index] = create_item(node->item->key, node->item->value);
free_linkedlist(node);
table->overflow_buckets[index] = head;
return;
}
LinkedList *curr = head;
LinkedList *prev = NULL;
while (curr)
{
if (strcmp(curr->item->key, key) == 0)
{
if (prev == NULL)
{
// First element of the chain.
// Remove the chain.
free_linkedlist(head);
table->overflow_buckets[index] = NULL;
return;
}
else
{
// This is somewhere in the chain.
prev->next = curr->next;
curr->next = NULL;
free_linkedlist(curr);
table->overflow_buckets[index] = head;
return;
}
}
curr = curr->next;
prev = curr;
}
}
}
}
void print_search(HashTable *table, char *key)
{
char *val;
if ((val = ht_search(table, key)) == NULL)
{
printf("Key:%s does not exist\n", key);
return;
}
else
{
printf("Key:%s, Value:%s\n", key, val);
}
}
void print_table(HashTable *table)
{
printf("\nHash Table\n-------------------\n");
for (int i = 0; i < table -> size; i++)
{
if (table -> items[i])
{
printf("Index:%d, Key:%s, Value:%s\n", i, table -> items[i] -> key, table -> items[i] -> value);
}
}
printf("-------------------\n\n");
}
int main()
{
HashTable *ht = create_table(CAPACITY);
ht_insert(ht, (char *)"1", (char *)"First address");
ht_insert(ht, (char *)"2", (char *)"Second address");
ht_insert(ht, (char *)"Hel", (char *)"Third address");
ht_insert(ht, (char *)"Cau", (char *)"Fourth address");
print_search(ht, (char *)"1");
print_search(ht, (char *)"2");
print_search(ht, (char *)"3");
print_search(ht, (char *)"Hel");
print_search(ht, (char *)"Cau"); // Collision!
print_table(ht);
ht_delete(ht, (char *)"1");
ht_delete(ht, (char *)"Cau");
print_table(ht);
free_table(ht);
return 0;
}
This code will produce the following output:
[secondary_label Output]
Key:1, Value:First address
Key:2, Value:Second address
Key:3 does not exist
Key:Hel, Value:Third address
Key:Cau does not exist
Hash Table
-------------------
Index:49, Key:1, Value:First address
Index:50, Key:2, Value:Second address
Index:281, Key:Hel, Value:Third address
-------------------
Hash Table
-------------------
Index:50, Key:2, Value:Second address
Index:281, Key:Hel, Value:Third address
-------------------
ht_insert(), ht_search(), and ht_delete behave as expected.
Conclusion
In this article, you implemented a hash table from scratch in C/C++.
Experiment with other collision handling algorithms and different hash functions. Continue your learning with more C++ tutorials
<!–
Download the Code
You can get the code through a GitHub Gist that I've uploaded. Feel free to ask any questions or provide suggestions in the comment section below! –>