Graphs

Modeling a World of Connections

Graphs are the ultimate data structure for modeling networks and relationships. A graph consists of a set of vertices (or nodes) and a set of edges that connect pairs of vertices. This module covers everything from basic representation and traversal to the most essential graph algorithms.

1. Graph Representation

How you store a graph in memory is crucial for an algorithm's efficiency. The two most common ways are:

Adjacency List

This is the most common representation. It's an array of lists, where the list at index i contains all the vertices adjacent to vertex i. For weighted graphs, this list can store pairs of (neighbor, weight).

Pros: Space-efficient for sparse graphs (where |E| is much smaller than |V|²). Iterating over neighbors is fast.

Cons: Checking if an edge exists between two vertices takes O(k) time, where k is the number of neighbors.


# Adjacency list using a dictionary
adj_list = {
    0: [1, 2],
    1: [0, 3],
    2: [0],
    3: [1]
}
Adjacency Matrix

A V x V matrix where matrix[i][j] is 1 (or the edge weight) if there's an edge from vertex i to j, and 0 otherwise.

Pros: Checking for a specific edge is O(1).

Cons: Requires O(V²) space, which is inefficient for sparse graphs.


# Adjacency matrix using a list of lists
V = 4
adj_matrix = [[0] * V for _ in range(V)]

# For an edge between 0 and 1
adj_matrix[0][1] = 1
adj_matrix[1][0] = 1

2. Graph Traversal

Traversal means visiting every vertex and edge exactly once. Use a visited set or array to avoid getting stuck in cycles.

Depth-First Search (DFS)

DFS explores as far as possible down each branch before backtracking. It's like navigating a maze by always turning right at every junction until you hit a dead end, then backtracking. It typically uses a Stack (often implicitly via recursion).


from collections import defaultdict

# Graph represented as an adjacency list
graph = defaultdict(list)

def add_edge(u, v):
    graph[u].append(v)

def dfs_util(v, visited):
    visited.add(v)
    print(v, end=' ')

    for neighbour in graph[v]:
        if neighbour not in visited:
            dfs_util(neighbour, visited)

def dfs(start_node):
    visited = set()
    dfs_util(start_node, visited)

# Example usage:
# add_edge(0, 1)
# add_edge(0, 2)
# ...
# dfs(0)
Breadth-First Search (BFS)

BFS explores the graph layer by layer from a starting vertex. It's like the expanding ripples from a stone dropped in water. It uses a Queue. A key property of BFS is that it finds the shortest path in terms of the number of edges in an unweighted graph.


from collections import defaultdict, deque

# Graph represented as an adjacency list
graph = defaultdict(list)

def add_edge(u, v):
    graph[u].append(v)

def bfs(start_node):
    visited = {start_node}
    queue = deque([start_node])
    
    while queue:
        node = queue.popleft()
        print(node, end=' ')
        
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)

3. Minimum Spanning Tree (MST)

A Minimum Spanning Tree (MST) is a subset of the edges of a connected, edge-weighted, undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. Imagine you need to connect several houses with water pipes; the MST finds the cheapest way to lay the pipes to connect all houses.

Example Graph and its MST

Original Graph (Edges with weights)

A weighted, undirected graph showing connections between nodes A through I.

Minimum Spanning Tree (Total Weight = 37)

The Minimum Spanning Tree derived from the original graph, highlighting the edges that connect all nodes with the minimum total weight.
Kruskal's Algorithm

Kruskal's algorithm is a greedy approach to finding an MST. It treats the graph as a collection of individual nodes and progressively merges them by adding the "safest" edge, which is the one with the lowest weight that doesn't form a cycle.

  1. Sort Edges: Create a list of all edges and sort them by weight in non-decreasing order.
  2. Iterate and Union: Iterate through the sorted edges. For each edge (u, v), check if adding it will form a cycle. This is done efficiently using a Disjoint Set Union (DSU) data structure. If vertices u and v are not already in the same component (find(u) != find(v)), add the edge to the MST and perform a union(u, v) operation.
  3. Stop: Continue until the MST has V-1 edges (where V is the number of vertices).

Time Complexity: O(E log E) or O(E log V), dominated by sorting the edges.


# Assume DSU class from previous module exists
def kruskals_mst(edges, V):
    # edges is a list of tuples: (u, v, weight)
    edges.sort(key=lambda item: item[2]) # Sort by weight
    dsu = DSU(V)
    mst_weight = 0
    mst_edges = []
    
    for u, v, weight in edges:
        if dsu.union(u, v):
            mst_weight += weight
            mst_edges.append((u, v, weight))
            
    return mst_weight, mst_edges
Prim's Algorithm

Prim's algorithm is another greedy approach that grows the MST from a single arbitrary starting vertex. It continuously adds the cheapest edge that connects a vertex inside the growing MST to a vertex outside of it.

  1. Initialization: Start with an empty MST. Pick a starting vertex and add all its edges to a Priority Queue. Mark the starting vertex as visited.
  2. Grow the Tree: While the priority queue is not empty, extract the edge with the minimum weight. If the edge connects to an unvisited vertex, add the edge to the MST, mark the new vertex as visited, and add all of its outgoing edges to the priority queue.

Time Complexity: O(E log V) using a binary heap-based Priority Queue.


import heapq

def prims_mst(graph, start_node):
    # graph is an adjacency list: {u: {v: weight, ...}}
    mst_weight = 0
    visited = {start_node}
    # (weight, from_node, to_node)
    min_heap = [(weight, start_node, to_node) for to_node, weight in graph[start_node].items()]
    heapq.heapify(min_heap)
    
    while min_heap and len(visited) < len(graph):
        weight, u, v = heapq.heappop(min_heap)
        if v not in visited:
            visited.add(v)
            mst_weight += weight
            for neighbor, neighbor_weight in graph[v].items():
                if neighbor not in visited:
                    heapq.heappush(min_heap, (neighbor_weight, v, neighbor))
    return mst_weight

4. Shortest Path Algorithms

Dijkstra's Algorithm

Dijkstra's finds the shortest path from a single source vertex to all other vertices in a weighted graph with non-negative edge weights. It works by maintaining a set of visited vertices and, for every unvisited vertex, the smallest known distance from the source. It's a greedy algorithm that always explores the "closest" unvisited vertex.

  1. Initialization: Set the distance to the source node as 0 and all other distances to infinity. Add the source node to a priority queue, prioritized by distance.
  2. Process Nodes: While the priority queue is not empty, extract the node `u` with the smallest distance.
  3. Relax Edges: For each neighbor `v` of `u`, if the path through `u` is shorter than the currently known distance to `v` (i.e., `distance(u) + weight(u, v) < distance(v)`), update the distance to `v` and add `v` to the priority queue.

Dijkstra's Algorithm Example (Source A)

Graph: A-4-B, A-2-C, B-5-C, B-10-D, C-3-D
  1. Dist: {A:0, B:inf, C:inf, D:inf}. PQ: [(0,A)]
  2. Pop (0,A). Visit neighbors B, C. Update B: dist(B) = 4. PQ: [(2,C), (4,B)] Update C: dist(C) = 2.
  3. Pop (2,C). Visit neighbor D. Update D: dist(D) = 2+3=5. PQ: [(4,B), (5,D)]
  4. Pop (4,B). Visit D. Path A->B->D (4+10=14) > 5. No update.
  5. Pop (5,D). End. Final Distances: {A:0, B:4, C:2, D:5}

Time Complexity: O(E log V) with a binary heap-based Priority Queue.


import heapq

def dijkstra(graph, start):
    # graph is an adjacency list: {u: {v: weight, ...}}
    distances = {node: float('inf') for node in graph}
    distances[start] = 0
    pq = [(0, start)] # (distance, node)
    
    while pq:
        dist, u = heapq.heappop(pq)
        
        if dist > distances[u]:
            continue
            
        for neighbor, weight in graph[u].items():
            if distances[u] + weight < distances[neighbor]:
                distances[neighbor] = distances[u] + weight
                heapq.heappush(pq, (distances[neighbor], neighbor))
                
    return distances
Bellman-Ford Algorithm

Bellman-Ford is more robust than Dijkstra's as it can handle graphs with negative edge weights. Its ability to detect negative-weight cycles (a cycle of edges whose sum of weights is negative) is crucial in many applications, as such cycles imply that there is no shortest path (you can loop forever to get a smaller weight).

  1. Initialization: Set the distance to the source node as 0 and all other distances to infinity.
  2. Relax Edges Repeatedly: For V-1 iterations, loop through every edge (u, v) in the graph and perform the relaxation step: if `distance(u) + weight(u, v) < distance(v)`, then update `distance(v)`. This process guarantees finding the shortest path if it has at most V-1 edges.
  3. Check for Negative Cycles: After V-1 iterations, perform one more iteration over all edges. If any distance can still be improved, it means a negative-weight cycle is reachable from the source.

Time Complexity: O(V * E).


def bellman_ford(edges, V, start):
    # edges is a list of (u, v, w) tuples
    dist = {i: float('inf') for i in range(V)}
    dist[start] = 0
    
    for _ in range(V - 1):
        for u, v, w in edges:
            if dist[u] != float('inf') and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                
    # Check for negative weight cycles
    for u, v, w in edges:
        if dist[u] != float('inf') and dist[u] + w < dist[v]:
            print("Graph contains a negative weight cycle")
            return None # Or handle as needed
            
    return dist

5. Classic Problems

Number of Islands

Given a 2D grid of '1's (land) and '0's (water), count the number of islands. This is a classic graph traversal problem. Iterate through the grid. If you find a '1', it's part of an island. Increment your island count, and then use DFS or BFS starting from that cell to "sink" the entire island (e.g., change all its '1's to '0's) so you don't count it again.


class Solution:
    def numIslands(self, grid: List[List[str]]) -> int:
        if not grid: return 0
        rows, cols = len(grid), len(grid[0])
        visited = set()
        islands = 0

        def bfs(r, c):
            q = collections.deque()
            visited.add((r, c))
            q.append((r, c))
            while q:
                row, col = q.popleft()
                directions = [[1, 0], [-1, 0], [0, 1], [0, -1]]
                for dr, dc in directions:
                    r_new, c_new = row + dr, col + dc
                    if (r_new in range(rows) and
                        c_new in range(cols) and
                        grid[r_new][c_new] == '1' and
                        (r_new, c_new) not in visited):
                        q.append((r_new, c_new))
                        visited.add((r_new, c_new))

        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == '1' and (r, c) not in visited:
                    bfs(r, c)
                    islands += 1
        return islands
Course Schedule

A classic application of detecting a cycle in a directed graph, which is often solved using topological sort logic. To finish all courses, there must not be any circular dependencies.


class Solution:
    def canFinish(self, numCourses: int, prerequisites: List[List[int]]) -> bool:
        adj = {i: [] for i in range(numCourses)}
        for crs, pre in prerequisites:
            adj[crs].append(pre)
            
        visit_set = set()
        
        def dfs(crs):
            if crs in visit_set:
                return False
            if adj[crs] == []:
                return True
            
            visit_set.add(crs)
            for pre in adj[crs]:
                if not dfs(pre): return False
            visit_set.remove(crs)
            adj[crs] = []
            return True

        for crs in range(numCourses):
            if not dfs(crs): return False
        return True

Interactive Graph Traversal

Visualize Breadth-First Search (BFS) and Depth-First Search (DFS) on a sample graph.

A
B
C
D
E
F
G

Data Structure (Stack)

Traversal Path

Select a traversal method to begin.