Greedy Algorithms

Making the Locally Optimal Choice

A greedy algorithm is an algorithmic paradigm that builds up a solution piece by piece, always choosing the next piece that offers the most obvious and immediate benefit. It makes the locally optimal choice at each stage with the hope of finding a global optimum.

1. Core Concepts

A greedy algorithm works if a problem has two key properties:

Greedy Choice Property: A globally optimal solution can be arrived at by making a locally optimal choice. This means we can make the best choice at the current moment and then solve the remaining subproblem without reconsidering previous choices.
Optimal Substructure: An optimal solution to the problem contains optimal solutions to its subproblems. This is a property shared with dynamic programming.

The main challenge with greedy algorithms is proving that the "greedy choice property" actually holds for the problem at hand. It's easy to make a choice that seems good now but prevents a better overall solution later.

Greedy vs. DP: The Knapsack Problem

A great way to understand the limits of greedy algorithms is to compare the Fractional Knapsack problem (which can be solved greedily) with the 0/1 Knapsack problem (which requires dynamic programming).

Fractional Knapsack: You can take fractions of items. The greedy strategy is to take items with the highest value-to-weight ratio first. This works because you can always fill the remaining space with a fraction of the next best item.
0/1 Knapsack: You must either take an entire item or leave it. The greedy strategy of taking the highest value-to-weight item might not be optimal, as a less "dense" item might leave more room for other valuable items.

2. Classic Problems & Strategies

Activity Selection Problem

Problem: Given n activities with their start and finish times, select the maximum number of non-overlapping activities that can be performed by a single person.

Greedy Strategy: The proven optimal strategy is to sort the activities by their finish times. Select the first activity in the sorted list. Then, iterate through the rest and select the next activity whose start time is greater than or equal to the finish time of the previously selected one.

Activity Selection Example (Sorted by Finish Time)

Activities: A(1,4), B(3,5), C(0,6), D(5,7)

1. Pick A (ends at 4)

2. Check B (starts at 3, overlaps with A). Skip.

3. Check C (starts at 0, overlaps with A). Skip.

4. Check D (starts at 5, after A finishes). Pick D.

Result: [A, D]

def max_activities(start_times, finish_times):
    # Pair activities with their start and finish times
    activities = sorted(zip(start_times, finish_times), key=lambda x: x[1])
    
    n = len(activities)
    if n == 0:
        return 0
    
    # The first activity is always selected
    count = 1
    last_finish_time = activities[0][1]
    
    for i in range(1, n):
        # If this activity has a start time greater than or equal to the
        # finish time of previously selected activity, then select it
        if activities[i][0] >= last_finish_time:
            count += 1
            last_finish_time = activities[i][1]
            
    return count

Merge Intervals

Problem: Given a collection of intervals, merge all overlapping intervals.

Greedy Strategy: First, sort the intervals based on their start times. Then, iterate through the sorted intervals. If the current interval overlaps with the last interval in your merged list, update the end of the last merged interval. Otherwise, add the current interval to the list.

from typing import List

class Solution:
    def merge(self, intervals: List[List[int]]) -> List[List[int]]:
        if not intervals:
            return []
        
        intervals.sort(key=lambda x: x[0])
        
        merged = [intervals[0]]
        for i in range(1, len(intervals)):
            last_merged = merged[-1]
            current_interval = intervals[i]
            
            if current_interval[0] <= last_merged[1]:
                last_merged[1] = max(last_merged[1], current_interval[1])
            else:
                merged.append(current_interval)
                
        return merged

Jump Game

Problem: You are given an integer array nums. You are initially positioned at the first index, and each element in the array represents your maximum jump length at that position. Return true if you can reach the last index, or false otherwise.

Greedy Strategy: The goal is to find if the last index is reachable. We can iterate through the array, keeping track of the maximum index we can possibly reach so far (let's call it goal). We start with the goal at the last index. Then, we iterate backward from the second-to-last index. At each position i, if we can jump from i to reach or surpass the current goal (i.e., i + nums[i] >= goal), it means this position i is a new, closer "good" starting point. So, we update our goal to i. If we can reach index 0 at the end, the answer is true.

from typing import List

class Solution:
    def canJump(self, nums: List[int]) -> bool:
        goal = len(nums) - 1
        for i in range(len(nums) - 2, -1, -1):
            if i + nums[i] >= goal:
                goal = i
        return goal == 0

Gas Station

Problem: There are n gas stations on a circular route, where the amount of gas at the ith station is gas[i]. You have a car with an unlimited gas tank and it costs cost[i] to travel from the ith station to the next. Find the starting gas station's index if you can travel around the circuit once, otherwise return -1.

Greedy Strategy: A key insight is that if the total amount of gas is less than the total cost, a solution is impossible. If a solution does exist, it is unique. We can find the starting point in a single pass. We maintain a running total of the tank's gas. If at any point the tank goes negative, it means we could not have started at any of the previous stations. So, we reset our starting point to the next station and reset the tank to zero. The total gas/cost check ensures that if we finish the loop, the starting point we found is valid.

from typing import List

class Solution:
    def canCompleteCircuit(self, gas: List[int], cost: List[int]) -> int:
        if sum(gas) < sum(cost):
            return -1
            
        total = 0
        start_station = 0
        for i in range(len(gas)):
            total += (gas[i] - cost[i])
            
            if total < 0:
                total = 0
                start_station = i + 1
        
        return start_station