Tries (Prefix Trees)

Tries (Prefix Trees)

A Trie, also known as a prefix tree or digital tree, is a tree-like data structure that proves to be very efficient for solving problems related to strings, especially for retrieval based on prefixes.

Structure of a Trie

A trie is a tree where each node represents a single character of a string.

  • The root node is typically empty.
  • Each node has an array of pointers (or a hash map), one for each possible character (e.g., 26 for lowercase English letters).
  • Each node might also have a boolean flag (e.g., isEndOfWord) to indicate if the path to this node represents a complete word in the set.

Unlike a binary search tree, nodes in the trie do not store their corresponding key. Instead, the node's position in the tree defines the key with which it is associated. All descendants of a node have a common prefix of the string associated with that node.

Core Operations

Let L be the length of the string being processed.

1. Insertion:

To insert a string, we traverse the trie from the root, character by character. For each character, we follow the corresponding pointer. If a pointer for a character does not exist, we create a new node. After processing the last character, we mark the final node's isEndOfWord flag as true.

  • Complexity: O(L)
2. Search:

To search for a string, we traverse the trie from the root, following the pointers corresponding to the characters of the string. If we can follow the path for the entire string and the final node has its isEndOfWord flag set to true, the string exists in the trie.

  • Complexity: O(L)
3. Prefix Search (startsWith):

To check if any word starts with a given prefix, we traverse the trie similar to a search. If we can successfully traverse the trie for the entire prefix, it means there is at least one word in the trie with that prefix.

  • Complexity: O(L)

Advantages and Applications

Advantages over Hash Tables for strings:

  • Tries can find all keys with a common prefix efficiently.
  • Tries can provide an alphabetical ordering of the entries by key.
  • There are no hash collisions to handle.

Applications:

  • Autocomplete / Predictive Text: Suggesting words as a user types.
  • Spell Checkers: Finding words that are similar to a misspelled word.
  • IP Routing: Longest prefix matching.
  • Used in bioinformatics to search for DNA sequences.