Hashing
O(1) Access with Hash Tables
Hashing provides a way to map data of any size to a fixed-size value, enabling average O(1) time complexity for lookups, insertions, and deletions. It's one of the most important tools for optimizing algorithms.
1. The Core Idea
Imagine you have a huge library but no card catalog. To find a book, you'd have to search shelf by shelf (an O(n) operation). Hashing is like creating a magic catalog. You give it a book title (a key), and a hash function instantly tells you which shelf (a bucket or slot) to look on.
- Hash Function: A function that takes a key and computes an index into an array. A good hash function should be fast and distribute keys uniformly across the array. For integers, a simple hash function is
key % array_size
. - Hash Table: The underlying array that stores the data. Also known as a HashMap, Dictionary, or Associative Array.
- Complexity: On average, insertion, deletion, and search are all O(1) operations. This is what makes hash tables so powerful.
2. Hash Collisions
What if the hash function tells two different book titles to go on the same shelf? This is a collision. Since the number of possible keys is often much larger than the number of buckets, collisions are inevitable. How we handle them is crucial.
Separate Chaining
This is a very common strategy. Instead of storing a single item in each bucket, we store a linked list of all the items that hashed to that index. When a collision occurs, we just add the new item to the list.
Hash Table with Separate Chaining
Hash Table (Array)
- Index 0: -> [KeyA, ValA] -> [KeyD, ValD]
- Index 1: -> [KeyC, ValC]
- Index 2: Empty
- Index 3: -> [KeyB, ValB]
Here, KeyA and KeyD had a collision and are stored in a list at Index 0.
Open Addressing
In this method, all elements are stored directly in the hash table itself. When a collision occurs, the algorithm probes for the next empty slot.
- Insert(k): Keep probing until an empty slot is found. Once found, insert the key.
- Search(k): Keep probing until the slot's key matches k or an empty slot is reached (which means the key is not in the table).
- Delete(k): Deleting can be tricky. If we simply empty the slot, it might break a probing chain for another key. Instead, we mark the slot as "deleted". The search operation will pass over deleted slots, but the insert operation can reuse them.
The interactive simulation below uses Linear Probing, where the "next available slot" is simply the next index in the array (wrapping around if necessary). While simple, this can lead to "clustering," where long runs of occupied slots build up, degrading performance. Other methods like Quadratic Probing or Double Hashing can mitigate this.
3. Classic Problems
Two Sum
Given an array of integers, return indices of the two numbers such that they add up to a specific target. A hash map provides an O(n) solution. Iterate through the array once. For each element x
, check if target - x
exists in the map. If it does, you've found your pair. If not, add x
and its index to the map.
from typing import List
class Solution:
def twoSum(self, nums: List[int], target: int) -> List[int]:
num_map = {} # val -> index
for i, n in enumerate(nums):
diff = target - n
if diff in num_map:
return [num_map[diff], i]
num_map[n] = i
Group Anagrams
Given an array of strings, group the anagrams together. The key insight is that all anagrams of a word become identical when their characters are sorted. We can use a hash map where the key is the sorted version of a word and the value is a list of its anagrams.
from typing import List
import collections
class Solution:
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
ans = collections.defaultdict(list)
for s in strs:
# Sorted string is the key
key = tuple(sorted(s))
ans[key].append(s)
return list(ans.values())
Longest Consecutive Sequence
Given an unsorted array of integers, find the length of the longest consecutive elements sequence. An O(n) solution can be achieved using a hash set. First, add all numbers to a set for O(1) lookups. Then, iterate through the numbers. If a number n
is the start of a sequence (i.e., n-1
is not in the set), start counting upwards until you no longer find consecutive numbers in the set.
from typing import List
class Solution:
def longestConsecutive(self, nums: List[int]) -> int:
num_set = set(nums)
longest_streak = 0
for n in num_set:
# Check if it's the start of a sequence
if (n - 1) not in num_set:
current_num = n
current_streak = 1
while (current_num + 1) in num_set:
current_num += 1
current_streak += 1
longest_streak = max(longest_streak, current_streak)
return longest_streak
Interactive Hash Table
This simulation visualizes a hash table using Linear Probing to handle collisions. Enter a number and click Insert or Search to see how it works. The hash function is key % 11
.
Enter a key and choose an operation.