Array

Kadane’s Algorithm

how to find the maximum sum of a contiguous subarray in an array of integers.

# Brute Force
def max_subarray_sum(arr):
    max_sum = float('-inf')
    for i in range(len(arr)):
        for j in range(i, len(arr)):
            current_sum = sum(arr[i:j+1])
            max_sum = max(max_sum, current_sum)
    return max_sum

The complexity of this algorithm is O(n^3) because we have two nested loops and the sum function takes O(n) time.

# sliding window O(n)
def maxSubArray(self, nums: List[int]) -> int:
    left = 0
    right = 0
    n = len(nums)
    cur = nums[left]
    m_sum = cur
    while True:
        if cur < 0:
            left = right + 1
            right = left
            cur = 0
            if left >= n:
                break
            cur = nums[left]
        else:
            right += 1
            if right >= n:
                break
            cur += nums[right]
        if cur > m_sum:
            m_sum = cur
    return m_sum

But it is not very elegant.

# Kadane's Algorithm O(n)
def kadanes(nums):
  max_sum = nums[0]
  cur_sum = 0
  for n in nums:
    cur_sum = max(0, cur_sum)
    cur_sum += n
    max_sum = max(max_sum, cur_sum)
  return max_sum

Kadane’s algorithm has a big overlape with the sliding window. but if we want to get he left and right pointers for the maximum subarray, we need to use the sliding window one to track the left and right pointers.

Sliding Window

You are given an array of integers nums, there is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position. Get the max element for each window. leetcode 239

def maxSlidingWindow(nums: List[int], k: int) -> List[int]:
    max_sw = []
    left, right = 0, 0
    n = len(nums) 
    while right < n:
        e_r = nums[right]
        if e_r in win:
            win[e_r] += 1
        else:
            win[e_r] = 1
        if right - left + 1 == k:
            max_sw.append(max(win))
            e_l = nums[left]
            if win[e_l] > 1:
                win[e_l] -= 1
            else:
                del win[e_l]
            left += 1
        right += 1
    return max_sw

But this one will have O(nk), if n and k are both big, we will have time limit issue.

dq = deque()   # stores indices
result = []
for i in range(len(nums)):
    # 1️⃣ Remove indices out of window
    if dq and dq[0] == i - k:
        dq.popleft()
    # 2️⃣ Maintain decreasing order
    while dq and nums[dq[-1]] < nums[i]:
        dq.pop()
    # 3️⃣ Add current index
    dq.append(i)
    # 4️⃣ Window formed
    if i >= k - 1:
        result.append(nums[dq[0]])
return result

We put element in order in the deque and when a new element comes in, we remove all smaller elements before it. O(n), we remove max n element, so keep order do not effect the complexity.

Two pointers

sliding window is a subset of two pointers and in two points, we concentrate more on the two elements on each side. There are questions like Check if an array is a palindrome, two sum, target sum.

Q1

Given a sorted of integers, return the indixes of two elements (in different positions) that sum up to the target value. Assume there is exactly one solution.

def targerSum(nums, target):
  L,R = 0, len(nums) - 1
  while L < R:
    if nums[L] + nums[R] > target:
      R -= 1
    elif nums[L] + nums[R] < target:
      L += 1
    else:
      return [L, R]

Prefix Sums

Prefix: sublist that start at the beginning. prefix sum and prefix product can be presented by the recursive pattern. Postfix is like the prefix and they are two sides of the same coin.

Prefix sum can used to optimise the calculation. For example to calculate the sum of an subarray. we can

class prefix_sum:
    def __init__(self, nums):
        self.prefix = []
        total = 0
        for n in nums:
            total += n
            self.prefix.append(total)
    
    def subsum(self, left, right):
        pre_right = self.prefix[right]
        pre_left = self.prefix[left-1] if left > 0 else 0
        return pre_right - pre_left

Q1

Given an integer array nums, return an array answer such that answer[i] is equal to the product of all the elements of nums except nums[i].

The product of any prefix or suffix of nums is guaranteed to fit in a 32-bit integer.

You must write an algorithm that runs in O(n) time and without using the division operation. Leetcode 238

def productExceptSelf(self, nums: List[int]) -> List[int]:
    n = len(nums)
    prefix, suffix = [1], [1]
    result = []
    for i in range(n-1):
        prefix.append(prefix[-1] * nums[i])
        suffix.append(suffix[-1] * nums[n-1-i])
    for i in range(n):
        result.append(prefix[i] * suffix[n-i-1])
    return result

# Or we can allocate the memory first
def productExceptSelf(self, nums: List[int]) -> List[int]:
    result = [1] * len(nums)
    right, left = 1, 1
    for i in range(len(nums)):
        result[i] *= left
        left *= nums[i]
    for i in range(len(nums)-1, -1, -1):
        result[i] *= right
        right *= nums[i]
    return result

Linked List

Fast and Slow Pointers

It is also called Floyd’s tortoise and hare algorithm. We can use it to find the middle of a linked list, to detect if there is a cycle in a linked list and to find the entry point of the cycle.

Q1

Find the middle of a linked list. If there are two middle nodes, return the second middle node.

def middle_of_list(head):
    # odd and even case are both covered by this algorithm, when there are two middle nodes, the slow pointer will jump over the first middle node and stop at the second middle node.
    slow = head
    fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    return slow

Q2

Given head, the head of a linked list, determine if the linked list has a cycle in it. The basic method is to use a set to store the visited nodes pointer hash, if we visit a node that is already in the set, then there is a cycle. But this method has O(n) space complexity. We can use fast and slow pointers to achieve O(1) space complexity.

The idea is that if there is a cycle, the fast pointer will eventually catch up with the slow pointer. If there is no cycle, the fast pointer will reach the end of the list.

Will they meet when there is a cycle? Yes, they will meet. The fast pointer moves twice as fast as the slow pointer, so it will eventually catch up with the slow pointer. And they can be proved by the mathematical induction. If they start att the same position $$ (1 + 2\times t) \mod n == (1 + t) \mod n $$ These two quation represent the position of the fast pointer and the slow pointer after t steps. When they meet, they will be at the same position, so we can solve this equation to find the value of t. t is the times of step and n is the length of the cycle. We can find the they will meet at the t is the multiple of n, which means they will meet at the same position after n steps.

def has_cycle(head):
    slow = head
    fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow == fast:
            return True
    return False

Q3

Given head, the head of a linked list, return the node where the cycle begins. If there is no cycle, return null.

flowchart LR
    1(("1:Head")) --> 2
    2 --> 3["3:entry point"]
    3 --> 4
    4 --> 5(("5:meet point"))
    5 --> 6
    6 --> 7
    7 --> 3

    1 -. p .-> 3
    3 -. n-k .-> 5
    5 -. k .-> 3

From head to entry point, we have p steps. The circle length is n. From entry point to meet point, we have k steps. So from meet point to entry point again, we have n-k steps. When they meet, the fast pointer has moved 2(p+k) steps, and the slow pointer has moved p+k steps. So when faster pointer and slow pointer meet we have: $$ 2(p+n-k) = p + n - k + n*m $$ Where m is the times of cycle. we ahve $$ p - k = n(m-1) $$

if m is 1, we have p = k, which means the entry point is the meet point.

so we can then put another pointer at the head of the list, and move it and the slow pointer one step at a time, they will meet at the entry point of the cycle.

def find_cycle_entry(head):
    slow = head
    fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow == fast:
            break
    if not fast or not fast.next:
        return None
    entry = head
    while entry != slow:
        entry = entry.next
        slow = slow.next
    return entry

Trie (Prefix Tree)

The complexity of searching word or inserting word are both O(1). This can be achieved by using a hash map. But for Prefix Tree, we can also search with prefix with O(1) time complexity.

This algorithm is very useful for autocomplete and spell checker. It can also be used to solve some problems like word search, word search II, etc.

Basic a trie is a tree of characters, each node represents a character and the path from the root to a node represents a word. We can use a hash map to store the children of each node. For English, each node will have at most 26 children.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()
    
    def insert(self, word):
        curr = self.root
        for c in word:
            if c not in curr.children:
                curr.children[c] = TrieNode()
            curr = curr.children[c]
        curr.word = True

    def search(self, word):
        curr = self.root
        for c in word:
            if c not in curr.children:
                return False
            curr = curr.children[c]
        return curr.word
    
    def startsWith(self, prefix):
        curr = self.root
        for c in prefix:
            if c not in curr.children:
                return False
            curr = curr.children[c]
        return True

Union-Find

Union-Find also called disjoint sets. It is a good algorithm to group the linked nodes in a undirected graph, especially when the graph is large and sparse. For directed graph, this algorithm is hard to use as the algorithm only considers whether there is link between nodes or not.

DFS can normally accomplish the same task, but Union-Find can be more efficient. We often use DFS to do the graph traversal and use Union-Find to find the connected components in a graph or detect cycles.

Complexity analysis: Let’s say we have n nodes and m edges, to find the connected components, the time complexity of DFS is O(n+m), as we need to visit each node and edge, while the time complexity of Union-Find is O(m*α(n)), where α(n) is the inverse Ackermann function, which is very slow growing and can be considered as a constant for all practical purposes. So Union-Find can be more efficient than DFS when the graph is large and sparse.

The idea is to maintain a parent array, where parent[i] is the parent of node i. Initially, each node is its own parent, so we have parent[i] = i for all i. When we union two nodes, we set the parent of one node to be the other node. To find the root of a node, we can follow the parent pointers until we reach a node that is its own parent.

class UnionFind:
    def __init__(self, n):
		# each node is its own parent initially
        self.parent = [i for i in range(n)]
		# weight[i] is the size of the tree rooted at i
		self.weight = [1] * n

	def find(self, x):
		if self.parent[x] != x:
			# path compression
			self.parent[x] = self.find(self.parent[x])
		return self.parent[x]

	def union(self, x, y):
		rootX = self.find(x)
		rootY = self.find(y)
        if rootX == rootY:
            return False
		if rootX != rootY:
			# attach the small tree to the bigger one
			if self.weight[rootX] > self.weight[rootY]:
				self.parent[rootY] = rootX
				self.weight[rootX] += self.weight[rootY]
			else:
				self.parent[rootX] = rootY
				self.weight[rootY] += self.weight[rootX]
        return True

	def build(self, edges):
		"""Build the union-find structure from the edges of the graph.
		The complexity of this function is O(m*α(n)), 
		where m is the number of edges and n is the number of nodes.
		α(n) present the time complexity of find function, 
		which is very slow growing and
		can be considered as a constant for all practical purposes with path compression
		and union by weight.
		So the overall complexity of this function can be considered as O(m).
		"""
		for x, y in edges:
			self.union(x, y)

	def connected(self, x, y):
		"""Check if two nodes are connected.
		"""
		return self.find(x) == self.find(y)

	def count_components(self):
		"""Count the number of connected components in the graph.
		"""
		roots = set()
		for i in range(len(self.parent)):
			roots.add(self.find(i))
		return len(roots)
	
	def cycle_detection(self, edges):
		"""Detect if there is a cycle in the graph. If there is a cycle, return True, otherwise return False.
		"""
		for x, y in edges:
			if self.connected(x, y):
				return True
			self.union(x, y)
		return False

Segment Tree

Segement tree is a binary tree data structure that allows us to efficiently query and update the values of a range of elements in an array. It is often used in problems that require frequent updates and queries on an array, such as range sum queries, range minimum queries, greatest common divisor queries, and any other associative operation queries.

The complexity of update and query operations in a segment tree is O(log n), where n is the number of elements in the array. The space complexity of a segment tree is O(4n) in the worst case, which can be considered as O(n) for practical purposes.

If we use an array to do update and query, the time complexity of both operations is O(n), which can be inefficient for large arrays. Segment tree can significantly reduce the time complexity to O(log n) for both operations. Even we ca use prefix sum to do range sum query, the time complexity of query is O(1), but the time complexity of update is O(n), which can be inefficient for large arrays.

The Idea of segment tree is to build a binary tree where each node represents a segment of the array. The leaf nodes represent individual elements of the array, and the internal nodes represent the result of applying the associative operation (e.g., sum, min, max) on the segments represented by their children.

class SegmentTree:
    """ simple segment tree implementation for sum operation"""
    def __init__(self, total, l_b, r_b):
        """
        Args
            total: total sum
            l_b: left bound
            r_b: right bound
        """
        self.sum = total
        self.left = None
        self.right = None
        self.l_b = l_b
        self.r_b = r_b
    
    @staticmethod
    def build(nums, l_b, r_b):
        """ build the segment tree with O(n) with n is 
            the number of nodes and n is the number of leafs
        """
        if l_b == r_b:
            return SegmentTree(nums[l_b], l_b, r_b)
        m = (l_b + r_b) // 2
        root = SegmentTree(0, l_b, r_b)
        root.left = SegmentTree.build(nums, l_b , m)
        root.right = SegmentTree.build(nums, m + 1, r_b)
        root.sum = root.left.sum + root.right.sum
        return root

    def update(self, index, val):
        if self.l_b == self.r_b:
            self.sum = val
            return
        m = (self.l_b + self.r_b) // 2
        if index > m:
            self.right.update(index, val)
        else:
            self.left.update(index, val)
        self.sum = self.left.sum + self.right.sum

    # The time complexity of this function is O(log n) where n is the number of leaf nodes in the segment tree, which is the same as the number of elements in the original array. This is because in each step of the recursion, we are halving the range of the query, and we will have at most log n levels of recursion until we reach the leaf nodes.
    def range_query(self, l_b, r_b):
        if l_b == self.l_b and r_b == self.r_b:
            return self.sum
        m = (self.l_b + self.r_b) // 2
        if l_b > m:
            return self.right.range_query(l_b, r_b)
        elif r_b <= m:
            return self.left.range_query(l_b, r_b)
        else:
            return (self.left.range_query(l_b, m) +
                    self.right.range_query(m + 1, r_b))

Iterative DFS

Iterative DFS is a way to perform depth-first search on a graph or a tree without using recursion. It is like simulating the call stack of the recusive DFS with an explicit stack data structure.

Traversal

class TreeNode:
    def __init__(self, val, left, right):
        self.val = val
        self.left = left
        self.right = right

def inorder(root):
    # time and space o(n), space complexity is due to the inbalance case.
    stack = []
    cur = root
    while cur or stack:
        if cur:
            stack.append(cur)
            cur = cur.left
        else:
            cur = stack.pop()
            print(cur.val)
            cur = cur.right

def preorder(root):
    stack = []
    cur = root
    while cur or stack:
        if cur:
            print(cur.val)
            if cur.right:
                stack.append(cur.right)
            cur = cur.left
        else:
            cur = stack.pop()

def postorder(root):
    # The key idea is that the we do not print the first visit 
    # only print the parent when ths fist and second children 
    # have been passed
    stack = [root]
    visit = [False]
    while stack:
        cur, visited = stack.pop(), visit.pop()
        if cur:
            if visited:
                print(cur.val)
            else:
                stack.append(cur)
                visit.append(True)
                stack.append(cur.right)
                visit.append(False)
                stack.append(cur.left)
                visit.append(False)

Two Heaps

Two Heaps is used to do the task like finding the median of a stream of numbers. Two heaps is a data structure that consists of two heaps, one is a max heap and the other is a min heap. The max heap is used to store the smaller half of the numbers, and the min heap is used to store the larger half of the numbers. The idea is to maintain the balance between the two heaps, so that the difference in size between the two heaps is at most 1. This way, we can easily find the median of the numbers in O(1) time.

import heapq

class TwoHeapMedianFinder:
    def __init__(self):
        self.small = []  # max heap
        self.large = []  # min heap

    def insert(self, num: int) -> None:
        """ Heap insertion """
        # push first to the small max heap and push to large min heap if needed
        heapq.heappush(self.small, -num)
        if self.small and self.large and (-self.small[0] > self.large[0]):
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)

        # Uneven size handle
        if len(self.small) > len(self.large) + 1:
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        if len(self.large) > len(self.small) + 1:
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -val)

    def get_median(self):
        """ Heap get minimum """
        l_s, l_l = len(self.small), len(self.large)
        if l_s > l_l:
            return -self.small[0]
        elif l_s < l_l:
            return self.large[0]
        else:
            return (-self.small[0] + self.large[0]) // 2

Backtracking

Backtracking is a general algorithm for finding all (or some) solutions to some computational problems, notably constraint satisfaction problems. It incrementally builds candidates to the solutions, and abandons a candidate (“backtracks”) as soon as it determines that the candidate cannot possibly be completed to a valid solution.

Subsets

Q1 Subsets

Given an integer array nums of unique elements, return all possible subsets (the power set). The solution set must not contain duplicate subsets. The subsets can be returned in any order. Leetcode 78

def subsets_without_dup(nums):
    cur_set, subsets = [], []
    helper(0, nums, cur_set, subsets)
    return subsets

def helper(i, nums, cur_set, subsets):
    if i >= len(nums):
        subsets.append(cur_set.copy())
        return
    # decision to include nums[i]
    cur_set.append(nums[i])
    helper(i + 1, nums, cur_set, subsets)

    # this step is the kernel of backtracking
    cur_set.pop()

    # decision to not include nums[i]
    helper(i + 1, nums, cur_set, subsets)

Q2 Distinct Subsets

Given a list of nums that are not necessarily distinct, return all distinct subsets (the power set). The solution set must not contain duplicate subsets. The subsets can be returned in any order. Leetcode 90

The complexity of this algorithm is O(2^n) in the worst case, where n is the number of elements in the input array. This is because in the worst case, all elements are distinct and we have to generate all possible subsets, which is 2^n. However, if there are duplicates in the input array, the number of distinct subsets will be less than 2^n, and the actual time complexity will depend on the number of distinct subsets generated. The sorting step takes O(n log n) time, but it is dominated by the O(2^n) time complexity of generating subsets in the worst case.

def subsets_without_dup(nums):
    nums.sort()  # sort the array to handle duplicates
    cur_set, subsets = [], []
    helper(0, nums, cur_set, subsets)
    return subsets

def helper(i, nums, cur_set, subsets):
    if i >= len(nums):
        subsets.append(cur_set.copy())
        return
    # decision to include nums[i]
    cur_set.append(nums[i])
    helper(i + 1, nums, cur_set, subsets)
    # this step is the kernel of backtracking
    cur_set.pop()

    # decision to not include nums[i]
    while i + 1 < len(nums) and nums[i] == nums[i + 1]:
        i += 1
    helper(i + 1, nums, cur_set, subsets)

Combinations

Given two nums n and k, return all possible combinations of k numbers out of 1 … n. Leetcode 77, the combinnation is like subsets but the length of the subset is fixed to k and the order of the numbers in the subset does not matter. we can use backtracking to solve this problem as well.

The complexity of this algorithm is O(k * C(n, k)), where C(n, k) is the number of combinations of n items taken k at a time. This is because we are generating all possible combinations of k numbers from 1 to n, and each combination takes O(k) time to generate.

def combination(n, k):
    cur_set, subsets = [], []
    helper(0, n, k, cur_set, subsets)
    return subsets

def helper(i, n, k, cur_set, subsets):
    if len(cur_set) == k:
        subsets.append(cur_set.copy())
        return
    if i >= n:
        return
    cur_set.append(i+1) # the number
    helper(i + 1, n, k, cur_set, subsets)
    cur_set.pop()
    helper(i + 1, n, k, cur_set, subsets)

# Combinations II: Use the combination logic, we start with n choice and then we have n-1 choice and then n-2 choice, until we have k choice. So we can use the same backtracking logic as combination, but we need to add a condition to skip the numbers that are already in the current set. This way we can avoid generating duplicate combinations.
def combination2(n, k):
    cur_set, subsets = [], []
    helper2(0, n, k, cur_set, subsets)
    return subsets

def helper2(i, n, k, cur_set, subsets):
    if len(cur_set) == k:
        subsets.append(cur_set.copy())
        return
    if i >= n:
        return

    for j in range(i + 1, n + 1):
        cur_set.append(j)
        helper2(j, n, k, cur_set, subsets)
        cur_set.pop()

Permutations

Give a list of distinct nums, return all possible permutations of the list. Leetcode 46 The complexity of this algorithm is O(n * n!), where n is the number of elements in the input array. This is because there are n! permutations of n elements, and each permutation takes O(n) time to generate (since we need to copy the current permutation to the result list).

class Solution:
    def permute(self, nums: List[int]) -> List[List[int]]:
        perms = []
        n = len(nums)
        self.helper(nums, n, [], perms)
        return perms

    def helper(self, nums, n, cur_perm, perms):
        if len(cur_perm) == n:
            perms.append(cur_perm)
            return 
        for j, num in enumerate(nums):
            new_perm = cur_perm + [num]
            nums_rest = nums[:j] + nums[j + 1:]
            self.helper(nums_rest, n, new_perm, perms)

Graph

Dijkstra’s Algorithm

Dijkstra’s algorithm is a graph algorithm that finds the shortest path from a source vertex to all other vertices in a weighted graph. It is a greedy algorithm that repeatedly selects the vertex with the smallest distance from the source and updates the distances of its neighbors.

BFS can be used to find the shortest path in an unweighted graph, but it cannot be used to find the shortest path in a weighted graph. Dijkstra’s algorithm can be used to find the shortest path in a weighted graph, but it cannot be used to find the shortest path in a graph with negative weight edges. For graphs with negative weight edges, we can use Bellman-Ford algorithm.

Dijkstra’s algorithm can be implemented using a priority queue (min heap) to efficiently get the vertex with the smallest distance. The algorithm starts by initializing the distance to the source vertex as 0 and the distance to all other vertices as infinity. Then it repeatedly selects the vertex with the smallest distance, updates the distances of its neighbors, and adds the neighbors to the priority queue if their distances are updated. The idea is that we prioritize the vertices with the smallest distance from the source vertex, so that we can efficiently find the shortest path to all other vertices in the graph.

The complexity of Dijkstra’s algorithm is O((V + E) log V), where V is the number of vertices and E is the number of edges in the graph. This is because we need to visit each vertex and edge at most once, and we use a priority queue to keep track of the vertices with the smallest distance, which takes O(log V) time for each insertion and extraction.

Given a connected graph represented by a list of edges, where each edge is represented as a tuple (u, v, w) indicating an edge from vertex u to vertex v with weight w, and a source vertex s, return the shortest distance from the source vertex s to all other vertices in the graph. If a vertex is unreachable from the source vertex, return -1 for that vertex. Leetcode 743

import heapq

def shortest_path(edges, n, src):
    adj = {}
    for i in range(1, n + 1):
        adj[i] = []
    for s, d, w in edges:
        # source, destinatio, weights
        adj[s].append((d, w))
    shortest = {}
    # a head of (weigh, destination)
    min_heap = [(0, src)]
    while min_heap:
        w1, n1 = heapq.heappop(min_heap)
        if n1 in shortest:
            continue
        shortest[n1] = w1
        for n2, w2 in adj[n1]:
            if n2 not in shortest:
                # distance from src update
                heapq.heappush(min_heap, (w1 + w2, n2))
    return shortest

Prim’s Algorithm

Prim’s algorithm is a greedy algorithm that finds the minimum spanning tree of a weighted undirected connected graph. It starts with an arbitrary vertex and grows the spanning tree by repeatedly adding the edge with the smallest weight that connects a vertex in the tree to a vertex outside the tree.

So for the spanning tree, the tree is acyclic and it connects all the vertices in the graph with the minimum total edge weight. Prim’s algorithm is efficient for dense graphs, while Kruskal’s algorithm is more efficient for sparse graphs. The complexity of Prim’s algorithm is O(E log V), where E is the number of edges and V is the number of vertices in the graph. This is because we need to visit each edge at most once, and we use a priority queue to keep track of the edges with the smallest weight, which takes O(log V) time for each insertion and extraction.

It does matter where we start the spanning tree, but the total weight of the minimum spanning tree will be the same regardless of the starting vertex.

import heapq

def min_spanning_tree(edges, n):
    # construct the adjacent dictionary
    adj = {}
    for i in range(1, n + 1):
        adj[i] = []
    for src, dst, weight in edges:
        adj[src].append([dst, weight])
        adj[dst].append([src, weight])

    # Initialise the heap by chosing a single node
    # pushing all its neighbours
    min_heap = []
    for neighbor, weight in adj[1]:
        heapq.heappush(min_heap, [weight, 1, neighbor])
    mst = [] # min spinning tree result
    visited = set()
    visited.add(1)

    while min_heap:
        weight, src, node = heapq.heappop(min_heap)
        if node in visited:
            continue
        mst.append([src, node])
        visited.add(node)
        for neighbor, weight in adj[node]:
            if neighbor not in visited:
                heapq.heappush(min_heap, [weight, node, neighbor])
    return mst

Kruskal’s Algorithm

Kruskal’s algorithm is a greedy algorithm that finds the minimum spanning tree of a weighted undirected graph.

It starts by sorting all the edges in non-decreasing order of their weights. Then it repeatedly adds the next edge with the smallest weight to the spanning tree, as long as it does not form a cycle with the edges already in the spanning tree.

The algorithm uses a union-find data structure to efficiently check for cycles.

The complexity of Kruskal’s algorithm is O(E log E), where E is the number of edges in the graph. This is because we need to sort the edges, which takes O(E log E) time, and we need to visit each edge at most once, which takes O(E) time. The union-find operations take O(log V) time, where V is the number of vertices in the graph, but this is dominated by the O(E log E) time complexity of sorting the edges. This algorithm is more efficient for sparse graphs, while Prim’s algorithm is more efficient for dense graphs.

def min_spanning_tree_kruskal(edges, n):
    # sort the edges by weight
    edges.sort(key=lambda x: x[2])
    uf = UnionFind(n)
    mst = []
    for src, dst, weight in edges:
        if uf.union(src, dst):
            mst.append([src, dst])
        if len(mst) == n - 1:
            break
    return mst

Deepth First Search (DFS)

DFS is a graph traversal algorithm that explores as far as possible along each branch before backtracking. It can be implemented using recursion or an explicit stack. The time complexity of DFS is O(V + E), where V is the number of vertices and E is the number of edges in the graph. This is because we need to visit each vertex and edge at most once.

Here is an example of finding the maximum Area of Island in a 2D grid using DFS. The idea is to iterate through each cell in the grid, and when we find a land cell (value 1), we perform a DFS to calculate the area of the island and update the maximum area found so far.

This implementation uses an iterative DFS approach with a stack to avoid recursion depth issues. We also maintain a set of visited cells to avoid counting the same cell multiple times.

class Solution:
    def get_area(self, grid, pos, visited, size):
        stack = [pos]
        area = 0
        while stack:
            r, c = stack.pop()
            if (
                r < 0 or c < 0 or
                r >= size[0] or c >= size[1] or
                (r, c) in visited or
                grid[r][c] == 0
            ):
                continue
            visited.add((r, c))
            area += 1
            stack.extend([
                (r - 1, c),
                (r + 1, c),
                (r, c - 1),
                (r, c + 1)
            ])
        return area

    def maxAreaOfIsland(self, grid):
        m, n = len(grid), len(grid[0])
        visited = set()
        max_area = 0

        for i in range(m):
            for j in range(n):
                if (i, j) not in visited and grid[i][j] == 1:
                    max_area = max(
                        max_area,
                        self.get_area(grid, (i, j), visited, (m, n))
                    )
        return max_area

Topological Sort

The topological sort of a directed acyclic graph (DAG) is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. Topological sorting is used in many applications, such as scheduling tasks, resolving symbol dependencies in linkers, and determining the order of compilation tasks.

flowchart LR
    A --> B
    A --> C
    B --> D
    C --> E
    D --> F
    E --> F

This is a DAG with 6 vertices and 6 edges. One possible topological sort of this graph is [A, B, C, D, E, F]. Another possible topological sort is [A, C, B, E, D, F]. There can be multiple valid topological sorts for a given DAG. If there are two graphs, the oder of elements between graphs in the topological sort is not important, we only need to make sure the order of elements in the same graph is correct.

def topological_sort(edges, n):
    adj = {}
    for i in range(1, n + 1):
        adj[i] = []
    for src, dst in edges:
        adj[src].append(dst)
    
    top_sort = []
    visited = set()
    for i in range(1, n + 1):
        dfs(i, adj, visited, top_sort)
    top_sort.reverse()
    return top_sort

def dfs(src, adj, visited, top_sort):
    if src in visited:
        return
    visited.add(src)
    for dst in adj[src]:
        dfs(dst, adj, visited, top_sort)
    top_sort.append(src)

Dynamic Programming

Dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems and storing the results of those subproblems to avoid redundant work.

Q1 Knapsack Problem

Given a list of items, where each item has a weight and a value, and a maximum weight capacity, return the maximum value that can be obtained by selecting a subset of the items such that the total weight does not exceed the maximum weight capacity.

Brute force solution

time complexity is O(2^n) where n is the number of items, as we have two choices for each item: to include it or not to include it. Space complexity is O(n) due to the recursion stack.

def dfs(profit, weight, capacity):
    return dfs_helper(0, profit, weight, capacity)

def dfs_helper(i, profit, weight, capacity):
    if i == len(profit):
        return 0
    # skip the item i
    max_profit  = dfs_helper(i + 1, profit, weight, capacity)
    # include the item i
    new_capacity = capacity - weight[i]
    if new_capacity >= 0:
        max_profit = max(max_profit, profit[i] + dfs_helper(i + 1, profit, weight, new_capacity))
    return max_profit

Better solution with memoization

The time complexity of this algorithm is O(n * capacity) for time and memory, where n is the number of items and capacity is the maximum weight capacity. This is because we are storing the results of subproblems in a memoization table, and each subproblem is defined by the index of the item and the remaining capacity. The space complexity is also O(n * capacity) due to the memoization table.

The idea is that when calculating the maximum profit through different paths will use the subproblems with the same parameters and we do not need to calculate the same subproblem multiple times.

def memory_dfs(profit, weight, capacity):
    # given the number of items we treated and the remain capacity
    # we can decide the max_profit, so this result can be reuse
    n, m = len(profit), capacity
    cache = [[-1] * (m+1) for _ in range(n)]
    return memory_helper(0, profit, weight, capacity, cache)

def memory_helper(i, profit, weight, capacity, cache):
    if i == len(profit):
        return 0
    elif cache[i][capacity] != -1:
        return cache[i][capacity]
    # skip the item i
    cache[i][capacity]  = memory_helper(i + 1, profit, weight, capacity, cache)
    # include the item i
    new_capacity = capacity - weight[i]
    if new_capacity >= 0:
        cache[i][capacity] = max(
            cache[i][capacity],
            profit[i] + memory_helper(i + 1, profit, weight, new_capacity, cache))
    return cache[i][capacity]

Dynamic Programming Bottom-Up Solution

We can implement the dynamic programming solution iteratively. The time complexity of this algorithm is O(n * capacity) for time and space, where n is the number of items and capacity is the maximum weight capacity.

We build a table of n rows and capacity + 1 columns, where the entry at row i and column j represents the maximum profit that can be obtained by selecting a subset of the first i items with a maximum weight capacity of j. We fill this table iteratively based on whether we include or exclude each item, and we return the value in the bottom-right cell of the table as the final answer.

So we are going to fill the table row by row, and for each row, we fill the columns from left to right. The value in each cell is calculated based on the values in the previous row, which represent the maximum profit for the subproblems with one less item. This way, we can build up the solution to the original problem iteratively.

The transition function is as follows:

If we do not include the item i, then the maximum profit is the same as the maximum profit for the first i-1 items with the same capacity, which is represented by the value in the previous row and the same column.
If we include the item i, then the maximum profit is the profit of item i plus the maximum profit for the first i-1 items with a reduced capacity (capacity - weight of item i), which is represented by the value in the previous row and the column corresponding to the reduced capacity.

def dp(profit, weight, capacity):
    n = len(profit)
    dp_table = [[0 for _ in range(capacity + 1)] for _ in range(n + 1)]

    for i in range(1, n + 1):
        for c in range(capacity + 1):
            dp_table[i][c] = dp_table[i - 1][c]
            if weight[i - 1] <= c:
                dp_table[i][c] = max(
                    dp_table[i][c],
                    profit[i - 1] + dp_table[i - 1][c - weight[i - 1]]
                )
    return dp_table[n][capacity]

Q2 Unbounded Knapsack Problem

This one is similar to the knapsack problem, but we can use unlimited number of each item. The time complexity of this algorithm is O(n * capacity) for time and space, where n is the number of items and capacity is the maximum weight capacity. But for the exsauted search solution, the time complexity can be O(2^capacity) in the worst case, as we have two choices for each item: to include it or not to include it, and we can include the same item multiple times. The space complexity is O(n * capacity) due to the memoization table.

The brute force solution

def dfs(profit, weight, capacity):
    return dfs_helper(0, profit, weight, capacity)

def dfs_helper(i, profit, weight, capacity):
    if i == len(profit):
        return 0
    # skip the item i
    max_profit  = dfs_helper(i + 1, profit, weight, capacity)
    # include the item i
    new_capacity = capacity - weight[i]
    if new_capacity >= 0:
        # here we set i instead of i + 1 because we can include the same item multiple times
        p = profit[i] + dfs_helper(i, profit, weight, new_capacity)
        max_profit = max(max_profit, p)
    return max_profit

The memorization solution

we can use the same memoization table as the knapsack problem, but we need to change the transition function to allow for including the same item multiple times.

def memory_dfs(profit, weight, capacity):
    # given the number of items we treated and the remain capacity
    # we can decide the max_profit, so this result can be reuse
    n, m = len(profit), capacity
    cache = [[-1] * (m+1) for _ in range(n)]
    return memory_helper(0, profit, weight, capacity, cache)

def memory_helper(i, profit, weight, capacity, cache):
    if i == len(profit):
        return 0
    elif cache[i][capacity] != -1:
        return cache[i][capacity]
    # skip the item i
    cache[i][capacity]  = memory_helper(i + 1, profit, weight, capacity, cache)
    # include the item i
    new_capacity = capacity - weight[i]
    if new_capacity >= 0:
        cache[i][capacity] = max(
            cache[i][capacity],
            # Here is the key difference from the knapsack problem, we can include the same item multiple times, so we do not move to the next item after including the current item.
            profit[i] + memory_helper(i, profit, weight, new_capacity, cache))
    return cache[i][capacity]

The dynamic programming solution

The key difference from the 0/1 knapsack DP is in the inclusion case: we look at dp_table[i] (the current row) instead of dp_table[i-1] (the previous row), because after including item i we can include it again.

def dp(profit, weight, capacity):
    n = len(profit)
    dp_table = [[0 for _ in range(capacity + 1)] for _ in range(n + 1)]

    for i in range(1, n + 1):
        for c in range(capacity + 1):
            dp_table[i][c] = dp_table[i - 1][c]
            if weight[i - 1] <= c:
                dp_table[i][c] = max(
                    dp_table[i][c],
                    # use dp_table[i] instead of dp_table[i-1] to allow reusing item i
                    profit[i - 1] + dp_table[i][c - weight[i - 1]]
                )
    return dp_table[n][capacity]

Q3 Longest Common Subsequence

Given two string s1 and s2, find the length of the longest common subsequence between the two strings. A subsequence is a sequence that can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements. Leetcode 1143

Brute force solution

The idea is that we can use two pointers to traverse the two strings, and we have two choices at each step: if the characters at the two pointers are the same, we can include that character in the longest common subsequence and move both pointers; if the characters are different, we can either move the pointer of the first string or move the pointer of the second string, and we take the maximum of the two choices. The time complexity of this algorithm is O(2^(m+n)) in the worst case, where m and n are the lengths of the two strings, as we have two choices for each character in both strings. The space complexity is O(m+n) due to the recursion stack.

def fds(text1, text2):
    return helper(text1, text2, 0, 0)

def helper(text1, text2, i1, i2):
    if i1 == len(text1) or i2 == len(text2):
        return 0
    if text1[i1] == text2[i2]:
        return 1 + helper(text1, text2, i1+1, i2+1)
    else:
        return max(
            self.helper(text1, text2, i1+1, i2),
            self.helper(text1, text2, i1, i2+1)
        )

The memorization solution

The idea is that when calculating the longest common subsequence through different paths will use the subproblems with the same parameters and we do not need to calculate the same subproblem multiple times. The time complexity of this algorithm is O(mn) for time and space, where m and n are the lengths of the two strings, as we are storing the results of subproblems in a memoization table, and each subproblem is defined by the indices of the two pointers. The space complexity is also O(mn) due to the memoization table.

def memory_dfs(text1, text2):
    m, n = len(text1), len(text2)
    cache = [[-1] * n for _ in range(m)]
    return helper(text1, text2, 0, 0, cache)

def helper(text1, text2, i1, i2, cache):
    if i1 == len(text1) or i2 == len(text2):
        return 0
    elif cache[i1][i2] != -1:
        return cache[i1][i2]
    if text1[i1] == text2[i2]:
        cache[i1][i2] = 1 + helper(text1, text2, i1+1, i2+1, cache)
    else:
        cache[i1][i2] = max(
            helper(text1, text2, i1+1, i2, cache),
            helper(text1, text2, i1, i2+1, cache)
        )
    return cache[i1][i2]

Bottom-up dynamic programming solution

def dp(s1, s2):
    n, m = len(s1), len(s2)
    df = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                df[i][j] = df[i - 1][j - 1] + 1
            else:
                df[i][j] = max(df[i - 1][j], df[i][j - 1])
    return df[n][m]

Palindromes

Find the longest palindromic substring in a given string. Leetcode 5 The time complexity of this algorithm is O(n^2) and the space complexity is O(1) if we do not consider the space used to store the result, as we are expanding around each center of the palindrome and checking for palindromic substrings. The number of centers is 2n-1 (each character and the gap between characters), and for each center, we can expand at most n times in the worst case.

def longestPalindrome(s: str) -> str:
    def expand(l, r):
        while l >= 0 and r < len(s) and s[l] == s[r]:
            l -= 1
            r += 1
        return l + 1, r - 1
    start, end = 0, 0
    for i in range(len(s)):
        # odd
        l1, r1 = expand(i, i)
        # even
        l2, r2 = expand(i, i + 1)
        if r1 - l1 > end - start:
            start, end = l1, r1
        if r2 - l2 > end - start:
            start, end = l2, r2
    return s[start:end + 1]

Array#

Kadane’s Algorithm#

Sliding Window#

Two pointers#

Q1#

Prefix Sums#

Q1#

Linked List#

Fast and Slow Pointers#

Q1#

Q2#

Q3#

Trie (Prefix Tree)#

Union-Find#

Segment Tree#

Iterative DFS#

Traversal#

Two Heaps#

Backtracking#

Subsets#

Q1 Subsets#

Q2 Distinct Subsets#

Combinations#

Permutations#

Graph#

Dijkstra’s Algorithm#

Prim’s Algorithm#

Kruskal’s Algorithm#

Deepth First Search (DFS)#

Topological Sort#

Dynamic Programming#

Q1 Knapsack Problem#

Brute force solution#

Better solution with memoization#

Dynamic Programming Bottom-Up Solution#

Q2 Unbounded Knapsack Problem#

The brute force solution#

The memorization solution#

The dynamic programming solution#

Q3 Longest Common Subsequence#

Brute force solution#

The memorization solution#

Bottom-up dynamic programming solution#

Palindromes#

Array

Kadane’s Algorithm

Sliding Window

Two pointers

Q1

Prefix Sums

Q1

Linked List

Fast and Slow Pointers

Q1

Q2

Q3

Trie (Prefix Tree)

Union-Find

Segment Tree

Iterative DFS

Traversal

Two Heaps

Backtracking

Subsets

Q1 Subsets

Q2 Distinct Subsets

Combinations

Permutations

Graph

Dijkstra’s Algorithm

Prim’s Algorithm

Kruskal’s Algorithm

Deepth First Search (DFS)

Topological Sort

Dynamic Programming

Q1 Knapsack Problem

Brute force solution

Better solution with memoization

Dynamic Programming Bottom-Up Solution

Q2 Unbounded Knapsack Problem

The brute force solution

The memorization solution

The dynamic programming solution

Q3 Longest Common Subsequence

Brute force solution

The memorization solution

Bottom-up dynamic programming solution

Palindromes