Trie (Prefix Tree)
The Ultimate String Data Structure
A Trie, also known as a prefix tree or digital tree, is a tree-based data structure that proves to be very efficient for solving problems related to strings, especially for retrieval based on prefixes. Instead of storing entire keys in nodes, a node's position in the trie represents the key itself. This structure is the engine behind autocomplete and spell-checker features.
1. Anatomy of a Trie Node
Unlike a binary tree node, a Trie node doesn't store the character it represents. Instead, its position in the tree does. A typical Trie node contains:
- Children Array/Map: An array of pointers (or a hash map) to its children nodes. For lowercase English letters, this is often an array of size 26.
children['c']
would point to the node representing 'c'. - End of Word Flag: A boolean flag (e.g.,
isEndOfWord
) to mark if the path from the root to this node forms a complete word.
class TrieNode:
def __init__(self):
self.children = {} # map character to TrieNode
self.isEndOfWord = False
2. Core Operations
Let L be the length of the string being processed.
Insertion
To insert a word, we traverse the trie from the root, character by character. If a path for a character doesn't exist, we create a new node. After the last character, we set the final node's isEndOfWord
flag to true.
Inserting "TEA" and "TED"
(root) | T | E / \ A* D*
* denotes isEndOfWord = true
Time: O(L) | Space: O(L * alphabet_size) in worst case
Search
To search for a word, we traverse the trie. If we can follow the path for the entire word and the final node's isEndOfWord
flag is true, the word exists.
Time: O(L)
Prefix Search (startsWith)
To check if any word starts with a given prefix, we traverse the trie. If we can successfully traverse the trie for the entire prefix, it means there is at least one word in the trie with that prefix.
Time: O(L)
3. Classic Problems
Implement Trie (Prefix Tree)
This problem is a direct implementation of the core Trie data structure and its three main methods: insert
, search
, and startsWith
. It's the best way to solidify your understanding.
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
curr = self.root
for c in word:
if c not in curr.children:
curr.children[c] = TrieNode()
curr = curr.children[c]
curr.isEndOfWord = true
def search(self, word: str) -> bool:
curr = self.root
for c in word:
if c not in curr.children:
return False
curr = curr.children[c]
return curr.isEndOfWord
def startsWith(self, prefix: str) -> bool:
curr = self.root
for c in prefix:
if c not in curr.children:
return False
curr = curr.children[c]
return True
Word Search II
Given an m x n grid of characters and a list of words, find all words on the grid. This is a famous hard problem that combines Trie and backtracking (DFS). First, you insert all words into a Trie. Then, you perform a DFS from every cell in the grid. As you traverse the grid, you also traverse the Trie. If you find a path in the grid that spells a word in the Trie (marked by isEndOfWord
), you add it to your results. This is far more efficient than searching for each word individually.
class Solution:
def findWords(self, board: List[List[str]], words: List[str]) -> List[str]:
root = TrieNode()
for w in words:
curr = root
for c in w:
if c not in curr.children:
curr.children[c] = TrieNode()
curr = curr.children[c]
curr.isEndOfWord = True
curr.word = w # Store the word in the node
ROWS, COLS = len(board), len(board[0])
res, visit = set(), set()
def dfs(r, c, node):
if (r < 0 or c < 0 or r == ROWS or c == COLS or
(r, c) in visit or board[r][c] not in node.children):
return
visit.add((r, c))
node = node.children[board[r][c]]
if node.isEndOfWord:
res.add(node.word)
dfs(r + 1, c, node)
dfs(r - 1, c, node)
dfs(r, c + 1, node)
dfs(r, c - 1, node)
visit.remove((r, c))
for r in range(ROWS):
for c in range(COLS):
dfs(r, c, root)
return list(res)
Interactive Trie
Visualize a Trie data structure. Insert words and see how the tree is built. Search for words or prefixes.