Strings
Manipulating and Searching Text
Strings are sequences of characters and one of the most common data types in programming. This module covers essential string manipulation techniques and advanced algorithms for efficient pattern matching.
1. Basic Operations & Concepts
Strings seem simple, but mastering their nuances is key.
Valid Anagram
An anagram is a word formed by rearranging the letters of another. The classic way to check if two strings are anagrams is to sort them both and see if they are equal. A more efficient O(n) approach uses a frequency map (an array or hash map) to count the occurrences of each character.
class Solution:
def isAnagram(self, s: str, t: str) -> bool:
if len(s) != len(t):
return False
countS, countT = {}, {}
for i in range(len(s)):
countS[s[i]] = 1 + countS.get(s[i], 0)
countT[t[i]] = 1 + countT.get(t[i], 0)
return countS == countT
Valid Palindrome
A palindrome is a string that reads the same forwards and backwards. The most efficient way to check this is using a two-pointer approach. Place one pointer at the beginning and one at the end, and move them inwards, comparing characters and skipping non-alphanumeric ones.
class Solution:
def isPalindrome(self, s: str) -> bool:
left, right = 0, len(s) - 1
while left < right:
while left < right and not s[left].isalnum():
left += 1
while left < right and not s[right].isalnum():
right -= 1
if s[left].lower() != s[right].lower():
return False
left += 1
right -= 1
return True
2. Advanced Pattern Matching
Finding a substring (pattern) within a larger string (text) is a frequent problem. While the naive approach of checking every possible starting position has a time complexity of O(n*m), several advanced algorithms offer linear time solutions.
Rabin-Karp Algorithm (String Hashing)
The Rabin-Karp algorithm uses hashing to find a pattern in text. Instead of comparing substrings character by character, it compares their hash values. The key is the "rolling hash," which allows the hash of the next substring to be calculated in O(1) time from the previous hash, avoiding re-computation.
Text: "abacaba" Pattern: "aba"
Hash "aba" -> H_pattern
Hash first window "aba" -> H_text
Compare H_pattern and H_text. Match!
Roll hash to next window "bac":
- Subtract 'a' from H_text
- Shift hash
- Add 'c' to H_text
Compare hashes... and so on.
Knuth-Morris-Pratt (KMP) Algorithm
KMP achieves O(n+m) complexity by cleverly preprocessing the pattern. It creates a "Longest Prefix Suffix" (LPS) array. This array stores the length of the longest proper prefix of the pattern that is also a suffix. When a mismatch occurs during searching, the LPS array tells you exactly how many characters to shift the pattern forward, avoiding redundant comparisons of characters that are already known to match.
Pattern: "aabaaca" LPS Array: [0, 1, 0, 1, 2, 0, 1]
How lps[4]=2 is calculated for "aabaa": Prefixes: "a", "aa", "aab", "aaba" Suffixes: "a", "aa", "baa", "abaa" Longest common prefix/suffix is "aa", length 2.
Classic Problem: Find the Index of the First Occurrence in a String
This is the quintessential pattern matching problem. While solvable with simple built-in functions, implementing it with an algorithm like KMP is a common interview task to test deep understanding.
class Solution:
def strStr(self, haystack: str, needle: str) -> int:
if not needle:
return 0
# Naive implementation
for i in range(len(haystack) + 1 - len(needle)):
if haystack[i : i + len(needle)] == needle:
return i
return -1
Interactive Palindrome Checker
Visualize the two-pointer technique for checking if a string is a palindrome.
Enter a string and press Start.
Interactive KMP Algorithm
Visualize the Knuth-Morris-Pratt pattern matching algorithm and its LPS table.
LPS Array for Pattern 'ababd'
Text-Pattern Alignment
Press Start to visualize KMP.