Hashing

O(1) Access with Hash Tables

Hashing provides a way to map data of any size to a fixed-size value, enabling average O(1) time complexity for lookups, insertions, and deletions. It's one of the most important tools for optimizing algorithms.

1. The Core Idea

Imagine you have a huge library but no card catalog. To find a book, you'd have to search shelf by shelf (an O(n) operation). Hashing is like creating a magic catalog. You give it a book title (a key), and a hash function instantly tells you which shelf (a bucket or slot) to look on.

  • Hash Function: A function that takes a key and computes an index into an array. A good hash function should be fast and distribute keys uniformly across the array. For integers, a simple hash function is key % array_size.
  • Hash Table: The underlying array that stores the data. Also known as a HashMap, Dictionary, or Associative Array.
  • Complexity: On average, insertion, deletion, and search are all O(1) operations. This is what makes hash tables so powerful.

2. Hash Collisions

What if the hash function tells two different book titles to go on the same shelf? This is a collision. Since the number of possible keys is often much larger than the number of buckets, collisions are inevitable. How we handle them is crucial.

Separate Chaining

This is a very common strategy. Instead of storing a single item in each bucket, we store a linked list of all the items that hashed to that index. When a collision occurs, we just add the new item to the list.

Hash Table with Separate Chaining

Hash Table (Array)
  • Index 0: -> [KeyA, ValA] -> [KeyD, ValD]
  • Index 1: -> [KeyC, ValC]
  • Index 2: Empty
  • Index 3: -> [KeyB, ValB]

Here, KeyA and KeyD had a collision and are stored in a list at Index 0.

Open Addressing

In this method, all elements are stored directly in the hash table itself. When a collision occurs, the algorithm probes for the next empty slot.

  • Insert(k): Keep probing until an empty slot is found. Once found, insert the key.
  • Search(k): Keep probing until the slot's key matches k or an empty slot is reached (which means the key is not in the table).
  • Delete(k): Deleting can be tricky. If we simply empty the slot, it might break a probing chain for another key. Instead, we mark the slot as "deleted". The search operation will pass over deleted slots, but the insert operation can reuse them.

The interactive simulation below uses Linear Probing, where the "next available slot" is simply the next index in the array (wrapping around if necessary). While simple, this can lead to "clustering," where long runs of occupied slots build up, degrading performance. Other methods like Quadratic Probing or Double Hashing can mitigate this.

3. Classic Problems

Two Sum

Given an array of integers, return indices of the two numbers such that they add up to a specific target. A hash map provides an O(n) solution. Iterate through the array once. For each element x, check if target - x exists in the map. If it does, you've found your pair. If not, add x and its index to the map.

from typing import List

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        num_map = {} # val -> index
        for i, n in enumerate(nums):
            diff = target - n
            if diff in num_map:
                return [num_map[diff], i]
            num_map[n] = i
Group Anagrams

Given an array of strings, group the anagrams together. The key insight is that all anagrams of a word become identical when their characters are sorted. We can use a hash map where the key is the sorted version of a word and the value is a list of its anagrams.

from typing import List
import collections

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        ans = collections.defaultdict(list)
        for s in strs:
            # Sorted string is the key
            key = tuple(sorted(s))
            ans[key].append(s)
        return list(ans.values())
Longest Consecutive Sequence

Given an unsorted array of integers, find the length of the longest consecutive elements sequence. An O(n) solution can be achieved using a hash set. First, add all numbers to a set for O(1) lookups. Then, iterate through the numbers. If a number n is the start of a sequence (i.e., n-1 is not in the set), start counting upwards until you no longer find consecutive numbers in the set.

from typing import List

class Solution:
    def longestConsecutive(self, nums: List[int]) -> int:
        num_set = set(nums)
        longest_streak = 0
        
        for n in num_set:
            # Check if it's the start of a sequence
            if (n - 1) not in num_set:
                current_num = n
                current_streak = 1
                
                while (current_num + 1) in num_set:
                    current_num += 1
                    current_streak += 1
                
                longest_streak = max(longest_streak, current_streak)
                
        return longest_streak

Interactive Hash Table

This simulation visualizes a hash table using Linear Probing to handle collisions. Enter a number and click Insert or Search to see how it works. The hash function is key % 11.

0
1
2
3
4
5
6
7
8
9
10

Enter a key and choose an operation.