Graphs
Modeling a World of Connections
Graphs are the ultimate data structure for modeling networks and relationships. A graph consists of a set of vertices (or nodes) and a set of edges that connect pairs of vertices. This module covers everything from basic representation and traversal to the most essential graph algorithms.
1. Graph Representation
How you store a graph in memory is crucial for an algorithm's efficiency. The two most common ways are:
Adjacency List
This is the most common representation. It's an array of lists, where the list at index i
contains all the vertices adjacent to vertex i
. For weighted graphs, this list can store pairs of (neighbor, weight).
Pros: Space-efficient for sparse graphs (where |E| is much smaller than |V|²). Iterating over neighbors is fast.
Cons: Checking if an edge exists between two vertices takes O(k) time, where k is the number of neighbors.
# Adjacency list using a dictionary
adj_list = {
0: [1, 2],
1: [0, 3],
2: [0],
3: [1]
}
Adjacency Matrix
A V x V matrix where matrix[i][j]
is 1 (or the edge weight) if there's an edge from vertex i
to j
, and 0 otherwise.
Pros: Checking for a specific edge is O(1).
Cons: Requires O(V²) space, which is inefficient for sparse graphs.
# Adjacency matrix using a list of lists
V = 4
adj_matrix = [[0] * V for _ in range(V)]
# For an edge between 0 and 1
adj_matrix[0][1] = 1
adj_matrix[1][0] = 1
2. Graph Traversal
Traversal means visiting every vertex and edge exactly once. Use a visited
set or array to avoid getting stuck in cycles.
Depth-First Search (DFS)
DFS explores as far as possible down each branch before backtracking. It's like navigating a maze by always turning right at every junction until you hit a dead end, then backtracking. It typically uses a Stack (often implicitly via recursion).
from collections import defaultdict
# Graph represented as an adjacency list
graph = defaultdict(list)
def add_edge(u, v):
graph[u].append(v)
def dfs_util(v, visited):
visited.add(v)
print(v, end=' ')
for neighbour in graph[v]:
if neighbour not in visited:
dfs_util(neighbour, visited)
def dfs(start_node):
visited = set()
dfs_util(start_node, visited)
# Example usage:
# add_edge(0, 1)
# add_edge(0, 2)
# ...
# dfs(0)
Breadth-First Search (BFS)
BFS explores the graph layer by layer from a starting vertex. It's like the expanding ripples from a stone dropped in water. It uses a Queue. A key property of BFS is that it finds the shortest path in terms of the number of edges in an unweighted graph.
from collections import defaultdict, deque
# Graph represented as an adjacency list
graph = defaultdict(list)
def add_edge(u, v):
graph[u].append(v)
def bfs(start_node):
visited = {start_node}
queue = deque([start_node])
while queue:
node = queue.popleft()
print(node, end=' ')
for neighbor in graph[node]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
3. Minimum Spanning Tree (MST)
A Minimum Spanning Tree (MST) is a subset of the edges of a connected, edge-weighted, undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. Imagine you need to connect several houses with water pipes; the MST finds the cheapest way to lay the pipes to connect all houses.
Example Graph and its MST
Original Graph (Edges with weights)
Minimum Spanning Tree (Total Weight = 37)
Kruskal's Algorithm
Kruskal's algorithm is a greedy approach to finding an MST. It treats the graph as a collection of individual nodes and progressively merges them by adding the "safest" edge, which is the one with the lowest weight that doesn't form a cycle.
- Sort Edges: Create a list of all edges and sort them by weight in non-decreasing order.
- Iterate and Union: Iterate through the sorted edges. For each edge (u, v), check if adding it will form a cycle. This is done efficiently using a Disjoint Set Union (DSU) data structure. If vertices u and v are not already in the same component (
find(u) != find(v)
), add the edge to the MST and perform aunion(u, v)
operation. - Stop: Continue until the MST has V-1 edges (where V is the number of vertices).
Time Complexity: O(E log E) or O(E log V), dominated by sorting the edges.
# Assume DSU class from previous module exists
def kruskals_mst(edges, V):
# edges is a list of tuples: (u, v, weight)
edges.sort(key=lambda item: item[2]) # Sort by weight
dsu = DSU(V)
mst_weight = 0
mst_edges = []
for u, v, weight in edges:
if dsu.union(u, v):
mst_weight += weight
mst_edges.append((u, v, weight))
return mst_weight, mst_edges
Prim's Algorithm
Prim's algorithm is another greedy approach that grows the MST from a single arbitrary starting vertex. It continuously adds the cheapest edge that connects a vertex inside the growing MST to a vertex outside of it.
- Initialization: Start with an empty MST. Pick a starting vertex and add all its edges to a Priority Queue. Mark the starting vertex as visited.
- Grow the Tree: While the priority queue is not empty, extract the edge with the minimum weight. If the edge connects to an unvisited vertex, add the edge to the MST, mark the new vertex as visited, and add all of its outgoing edges to the priority queue.
Time Complexity: O(E log V) using a binary heap-based Priority Queue.
import heapq
def prims_mst(graph, start_node):
# graph is an adjacency list: {u: {v: weight, ...}}
mst_weight = 0
visited = {start_node}
# (weight, from_node, to_node)
min_heap = [(weight, start_node, to_node) for to_node, weight in graph[start_node].items()]
heapq.heapify(min_heap)
while min_heap and len(visited) < len(graph):
weight, u, v = heapq.heappop(min_heap)
if v not in visited:
visited.add(v)
mst_weight += weight
for neighbor, neighbor_weight in graph[v].items():
if neighbor not in visited:
heapq.heappush(min_heap, (neighbor_weight, v, neighbor))
return mst_weight
4. Shortest Path Algorithms
Dijkstra's Algorithm
Dijkstra's finds the shortest path from a single source vertex to all other vertices in a weighted graph with non-negative edge weights. It works by maintaining a set of visited vertices and, for every unvisited vertex, the smallest known distance from the source. It's a greedy algorithm that always explores the "closest" unvisited vertex.
- Initialization: Set the distance to the source node as 0 and all other distances to infinity. Add the source node to a priority queue, prioritized by distance.
- Process Nodes: While the priority queue is not empty, extract the node `u` with the smallest distance.
- Relax Edges: For each neighbor `v` of `u`, if the path through `u` is shorter than the currently known distance to `v` (i.e., `distance(u) + weight(u, v) < distance(v)`), update the distance to `v` and add `v` to the priority queue.
Dijkstra's Algorithm Example (Source A)
Graph: A-4-B, A-2-C, B-5-C, B-10-D, C-3-D
- Dist: {A:0, B:inf, C:inf, D:inf}. PQ: [(0,A)]
- Pop (0,A). Visit neighbors B, C. Update B: dist(B) = 4. PQ: [(2,C), (4,B)] Update C: dist(C) = 2.
- Pop (2,C). Visit neighbor D. Update D: dist(D) = 2+3=5. PQ: [(4,B), (5,D)]
- Pop (4,B). Visit D. Path A->B->D (4+10=14) > 5. No update.
- Pop (5,D). End. Final Distances: {A:0, B:4, C:2, D:5}
Time Complexity: O(E log V) with a binary heap-based Priority Queue.
import heapq
def dijkstra(graph, start):
# graph is an adjacency list: {u: {v: weight, ...}}
distances = {node: float('inf') for node in graph}
distances[start] = 0
pq = [(0, start)] # (distance, node)
while pq:
dist, u = heapq.heappop(pq)
if dist > distances[u]:
continue
for neighbor, weight in graph[u].items():
if distances[u] + weight < distances[neighbor]:
distances[neighbor] = distances[u] + weight
heapq.heappush(pq, (distances[neighbor], neighbor))
return distances
Bellman-Ford Algorithm
Bellman-Ford is more robust than Dijkstra's as it can handle graphs with negative edge weights. Its ability to detect negative-weight cycles (a cycle of edges whose sum of weights is negative) is crucial in many applications, as such cycles imply that there is no shortest path (you can loop forever to get a smaller weight).
- Initialization: Set the distance to the source node as 0 and all other distances to infinity.
- Relax Edges Repeatedly: For V-1 iterations, loop through every edge (u, v) in the graph and perform the relaxation step: if `distance(u) + weight(u, v) < distance(v)`, then update `distance(v)`. This process guarantees finding the shortest path if it has at most V-1 edges.
- Check for Negative Cycles: After V-1 iterations, perform one more iteration over all edges. If any distance can still be improved, it means a negative-weight cycle is reachable from the source.
Time Complexity: O(V * E).
def bellman_ford(edges, V, start):
# edges is a list of (u, v, w) tuples
dist = {i: float('inf') for i in range(V)}
dist[start] = 0
for _ in range(V - 1):
for u, v, w in edges:
if dist[u] != float('inf') and dist[u] + w < dist[v]:
dist[v] = dist[u] + w
# Check for negative weight cycles
for u, v, w in edges:
if dist[u] != float('inf') and dist[u] + w < dist[v]:
print("Graph contains a negative weight cycle")
return None # Or handle as needed
return dist
5. Classic Problems
Number of Islands
Given a 2D grid of '1's (land) and '0's (water), count the number of islands. This is a classic graph traversal problem. Iterate through the grid. If you find a '1', it's part of an island. Increment your island count, and then use DFS or BFS starting from that cell to "sink" the entire island (e.g., change all its '1's to '0's) so you don't count it again.
class Solution:
def numIslands(self, grid: List[List[str]]) -> int:
if not grid: return 0
rows, cols = len(grid), len(grid[0])
visited = set()
islands = 0
def bfs(r, c):
q = collections.deque()
visited.add((r, c))
q.append((r, c))
while q:
row, col = q.popleft()
directions = [[1, 0], [-1, 0], [0, 1], [0, -1]]
for dr, dc in directions:
r_new, c_new = row + dr, col + dc
if (r_new in range(rows) and
c_new in range(cols) and
grid[r_new][c_new] == '1' and
(r_new, c_new) not in visited):
q.append((r_new, c_new))
visited.add((r_new, c_new))
for r in range(rows):
for c in range(cols):
if grid[r][c] == '1' and (r, c) not in visited:
bfs(r, c)
islands += 1
return islands
Course Schedule
A classic application of detecting a cycle in a directed graph, which is often solved using topological sort logic. To finish all courses, there must not be any circular dependencies.
class Solution:
def canFinish(self, numCourses: int, prerequisites: List[List[int]]) -> bool:
adj = {i: [] for i in range(numCourses)}
for crs, pre in prerequisites:
adj[crs].append(pre)
visit_set = set()
def dfs(crs):
if crs in visit_set:
return False
if adj[crs] == []:
return True
visit_set.add(crs)
for pre in adj[crs]:
if not dfs(pre): return False
visit_set.remove(crs)
adj[crs] = []
return True
for crs in range(numCourses):
if not dfs(crs): return False
return True
Interactive Graph Traversal
Visualize Breadth-First Search (BFS) and Depth-First Search (DFS) on a sample graph.
Data Structure (Stack)
Traversal Path
Select a traversal method to begin.