Hash Table
Last updated
Last updated
💡 A hash table (aka hash map) is a data structure designed for efficient storage and retrieval of key-value pairs. A hash table uses a hash function that takes a key (which can be any data type) to compute an index by converting it into a unique number which is called a hash code.
The hash code will be allocated into an array of buckets or slots, from which the desired value can be found. Basically, during lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored in the hash table.
Under the hood, a hash table uses a dynamic array of linked lists to efficiently store key/value
pairs.
When inserting a key/value
pair, a hash function first maps the key, which is typically a string (or any data that can be hashed, depending on the implementation of the hash table), to an integer value and, by extension, to an index in the underlying dynamic array.
Then, the value associated with the key is added to the linked list stored at that index in the dynamic array, and a reference to the key is also stored with the value.
💡 In a hash table, a new index is processed using the keys. And, the element corresponding to that key is stored in the index. This process is called hashing.
Hash tables rely on highly optimized hash functions to minimize the number of collisions that occur when storing values: cases where 2 keys map to the same index.
Techniques like separate chaining or open addressing are used to handle collisions effectively.
Chaining: Each bucket points to a list of entries that hash to the same index.
Open Addressing: All elements are stored within the array itself. When a collision occurs, the algorithm searches for the next available slot according to a probing sequence (linear probing, quadratic probing, or double hashing).
Here are some key characteristics of hash tables:
Super fast search: Finding a value through a hash table is much faster than traditional methods like iterating through a list, especially for large datasets.
Dynamic size: Hash tables can grow and shrink easily as needed.
Efficient insertion and deletion: Adding or removing data is also quick and convenient.
However, there are also some trade-offs with hash tables:
Hash collisions: Different keys can sometimes generate the same index (collision). Resolving these collisions adds complexity and can impact performance.
Not the best for ordered data: Hash tables don't inherently maintain the order of data.
The hash function includes 4 main components:
Keys: Unique identifiers for data you want to store.
Values: The actual data associated with each key.
Hash Function: A mathematical formula that converts a key into a unique index.
Array: Stores the key-value
pairs in this index.
Input Processing: The hash function takes the key as input. The key can be of any data type (e.g., integers, strings).
Conversion to Integer: The hash function applies a mathematical formula to the key. This formula is designed to spread out the keys evenly across a fixed-size range of numbers (hash codes). The goal is to convert the keys to an integer.
Compression to Index Range: The integer value is then compressed into the range of indices available in the hash table’s array. This is often done using the modulus operation (%
), which ensures that the result is within the bounds of the array size.
Insertion
Deletion
Search
Access
Factors Affecting Performance:
Good hash function: A well-designed hash function minimizes collisions, ensuring better performance.
Load factor: The ratio of keys to array size affects collision likelihood. Keeping a low load factor is generally beneficial.
Collision resolution: The chosen strategy for handling collisions also impacts performance.
Common use cases of hash table include:
Databases: Hash tables are essential for indexing data in databases, enabling efficient searches.
Caches: They're used to store frequently accessed data in memory for faster retrieval.
Web browsers: Browsers use hash tables to store cookies and manage web sessions.
Networking: Routing protocols sometimes use hash tables to efficiently direct data packets.