Introduction to Hash Table

Introduction to Hash Table

10 mins read281 Views Comment
Updated on Mar 8, 2024 15:28 IST

Have you ever wondered how a data structure can efficiently store and retrieve data with minimal time complexity? A hash table is a data structure that accomplishes this by using a hash function to convert keys into indices of an array. This process allows for fast data lookup, insertion, and deletion, making hash tables incredibly efficient for certain types of data operations. Let's understand more!

2023_04_MicrosoftTeams-image-24.jpg

A hash table is a data structure that uses key-value pairs to store data associatively. The keys are mapped to their respective indices in an array using a hash function, and the values are then placed in the array. The hash function accepts a key as input and returns an index in an array where the value can be located. Hash tables are frequently used to build lookup tables, associative arrays, and dynamic sets because they offer constant-time average-case lookup, insertion, and deletion operations.

Table of Content

Recommended online courses

Best-suited Data Structures and Algorithms courses for you

Learn Data Structures and Algorithms with these high-rated online courses

– / –
4 months
– / –
16 weeks
Free
– / –
– / –
– / –
– / –
6 months
– / –
150 hours
– / –
4 months
– / –
8 weeks
– / –
12 weeks
β‚Ή4.24 K
6 weeks

What is a Hash Table?

A hash table is a type of data structure used to store information. The information is composed mostly of keys and value. An associative array can be used to implement a hash table. The effectiveness of the hash function used for mapping determines how efficient the mapping process is. A technique called hashing turns a range of key values into a range of array indexes. The modulo operator will be used to obtain a range of key values.

Prerequisites for Learning Hash Table

  1. Basic data structures are crucial because they are the foundation for more complex data structures like hash tables. These structures include arrays, linked lists, and trees.
  2. You should have a basic understanding of algorithms and be able to evaluate their time and space complexity. You will better understand hash table performance characteristics and how to select a suitable hash function as a result.
  3. Since hash tables are frequently implemented using programming languages like C++, Java, or Python, you should be familiar with them.
  4. You should have a fundamental understanding of hash functions and how they operate. A mathematical operation known as a hash function transforms a key into an array index. Using a decent hash function is crucial for the hash table to perform well.
  5. You should be aware of collisions when two keys are assigned to the same array index. Implementing hash tables requires an understanding of how to handle collisions.

Ultimately, a firm grasp of these subjects will provide a solid foundation for understanding hash tables and how they work.

Basic Operations in Hash Table

The fundamental principle operations of a hash table are listed below.

  • Insertion: This operation adds A fresh key-value pair to the hash table. To identify where the value should be kept in the hash table, the key is hashed. A collision happens if another key-value pair already hold the position, and the hash table must deal with it.
  • Search: This operation entails finding a key in the hash table and obtaining the value that goes with it. The key is hashed to identify where the value should be kept in the hash table. The hash table must determine whether the key matches the sought key if another key-value combination already holds the place and if not, it must deal with the collision.
  • Deletion: In this operation, a key-value pair is taken out of the hash table. To identify where the value should be kept in the hash table, the key is hashed. The key-value pair to be erased is eliminated from the hash table if it occupies the location.

 
#include <iostream>
#include <list>
#include <string>
#include <algorithm>
using namespace std;
// Define a hash function to map keys to indices in the array
int hashFunction(const string& key, int size) {
int hash = 0;
for (char ch : key) {
hash = (31 * hash + ch) % size;
}
return hash;
}
// Define a class for the hash table
class HashTable {
public:
// Constructor to initialize the table with a given size
HashTable(int size) : size_(size) {
table_ = new list<pair<string, int>>[size_];
}
// Destructor to free the memory allocated for the table
~HashTable() {
delete[] table_;
}
// Insert a new key-value pair into the table
void insert(const string& key, int value) {
int index = hashFunction(key, size_);
table_[index].push_back(make_pair(key, value));
}
// Search for the value associated with a given key in the table
int search(const string& key) {
int index = hashFunction(key, size_);
for (const auto& p : table_[index]) {
if (p.first == key) {
return p.second;
}
}
return -1; // Key not found
}
// Delete the key-value pair associated with a given key from the table
void remove(const string& key) {
int index = hashFunction(key, size_);
auto& chain = table_[index];
for (auto it = chain.begin(); it != chain.end(); ++it) {
if (it->first == key) {
chain.erase(it);
break;
}
}
}
private:
int size_;
list<pair<string, int>>* table_;
};
// Test the hash table implementation
int main() {
HashTable table(10);
table.insert("mango", 1);
table.insert("apple", 2);
table.insert("orange", 3);
cout << "Value of mango: " << table.search("mango") << endl;
cout << "Value of apple: " << table.search("apple") << endl;
cout << "Value of orange: " << table.search("orange") << endl;
table.remove("apple");
cout << "Value of apple after removal: " << table.search("apple") << endl;
return 0;
}
Copy code

Output

Value of mango: 1
Value of apple: 2
Value of orange: 3
Value of apple after removal: -1

Understanding Hashing

One search method that requires a consistent amount of time is hashing. Hashing has a time complexity of O. (1). Until now, we have read about linear and binary search, two different search methods. The worst time complexity for binary and linear searches is O(logn). Both searching methods depend on the number of components, but we prefer a method that uses a fixed amount of time. Hence, a hashing approach that offers a constant time emerged.

Methods of Calculating a Hash Function

1. Division method

The hash function in the division technique can be defined as:

h(k) = k%m;

where m is the hash table’s size and k is a key value.

For instance, suppose the hash table’s size is 10, and the key value is 15. When key 15 is subjected to the hash function, the following index results:

h(15) = 15 = 5

The value(5) is kept at index 5.

2. Folding method:

A key is divided into smaller bits using the folding mechanism of a hash function, and the resulting hash value is then added. Systems that handle large keys, such as social security or phone numbers, frequently employ the folding method. Here is an illustration of how the folding method functions:

Take the key β€œ1234567890”, for example. By splitting the key into two or more pieces of equal size and adding them together, we can generate a hash value using the folding method.

In this example, the key will be divided into two equal portions with five digits each. These are β€œ12345” and β€œ67890,” respectively. Afterwards, we combine these two parts.

                   12345 + 67890 = 80235

3. Mid-Square method

The key is squared when using the mid-square hash function, which then chooses a chunk of the middle digits of the result as the hash value. Here is an illustration of how the mid-square technique functions:

Assume we possess the key β€œ42”. We’ll square the key first, then use the mid-square method to generate a hash value:

42^2 = 1764

The hash value is then chosen as a subset of the middle digits of the outcome. Take the two middle digits, for instance:

17|64

                   In this case, the hash value would be 64.

Advantages of Using Hash Function

  1. Quick access: Based on their keys, hash tables give users quick access to items. This is because hash tables, which enable constant-time access to elements, use a hash function to map keys to indices in an array or a linked list.
  2. Hash tables are effective for searches and insertions because their average case time complexity is O(1). Because of this, they are effective data structures for scenarios in which many actions on a dataset are required.
  3. Space efficiency: Unlike arrays or linked lists, which consume space proportionally to their size, hash tables only utilize space proportional to the number of elements they contain.
  4. Flexibility: Any type of data, including texts, integers, and objects, can be stored using hash tables. They can also change size dynamically, making good use of the memory.
  5. Resolution of collisions: When numerous keys map to the same index in a hash table, collisions can be handled efficiently using independent chaining or open addressing.

Disadvantages of Using Hash Function

Although using hash tables offers numerous benefits, there are some drawbacks as well:

  1. Collisions: When two or more keys are hashed to the same index in the table, a collision takes place. Collisions are unavoidable in a hash table with a significant number of keys. Collisions can be resolved using methods like chaining or open addressing, although doing so can complicate implementation and impact the speed of the hash table.
  2. Memory requirements: Hash tables can use a lot of memory, especially if the table needs to hold many keys or if the keys themselves are big. Moreover, some collision resolution methods, like open addressing, may need extra memory to store supplemental data structures.
  3. Performance loss: If the load factor (the ratio of the number of keys to the size of the table) rises too high, hash tables may experience performance loss. When the load factor is large, collisions happen more frequently, and hash table operations can take much longer to complete.
  4. Unordered data structure: Hash tables are naturally unordered data structures. Therefore the order of the keys within the table may not correspond to the order in which they were introduced. Even though this isn’t usually a problem, it can be not very pleasant when the keys’ order is crucial.
  5. Hash function selection: Choosing a good hash function is crucial for a hash table’s performance because it translates keys into table indices. Hash functions that are poorly constructed may experience a lot of collisions and need better performance.

Applications of Hash Table

In computer science, hash functions are used for a variety of purposes, including:

  • Hash table: Hash tables are one of the most popular uses for hash functions. Hash functions are used to map keys to indices in an array, allowing for quick key-value pair retrieval, insertion, and deletion.
  • Cryptography: Modern cryptography relies heavily on hash functions. Using cryptographic hash functions, a fixed-length message digest (or hash) is produced from variable-length input data. Data integrity and digital signature authenticity are both checked using these digests.
  • Security: Hash functions are frequently used to store passwords securely. Passwords are hashed using a salted hash function rather than being stored in plaintext. Even if an attacker manages to access the hash, it becomes more challenging for them to recover the original password.
  • Data fingerprinting: Hash functions can be used to create distinctive data fingerprints. These fingerprints can be used to find duplicate data or to check the consistency of transferred or stored data.
  • Indexing and searching: Hash functions can be used to index and search big datasets. It is feasible to easily search for and retrieve objects that meet a particular set of requirements by creating hashes for each item in the collection.
  • Load balancing: Hash functions can be used for load balancing in distributed systems to divide the workload among many servers. It is possible to spread workload equitably and prevent overtaxing any particular server by hashing incoming requests and utilizing the hash to identify which server should handle the request.

Conclusion

Hash tables are a crucial data structure in computer science because they make data storage, retrieval, and modification quick and easy. They enable constant-time access to key-value pairs by using hash functions to map keys to indices in an array. Database management systems, search engines, and data processing pipelines are just a few of the many uses for hash tables. Hash tables are still frequently used and appreciated for their effectiveness and usability despite several drawbacks, such as the chance for collisions and the requirement to resize the table when it fills up too much.

Contributed by-Kirti

About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio