Array
Kadane’s Algorithm
how to find the maximum sum of a contiguous subarray in an array of integers.
# Brute Force
def max_subarray_sum(arr):
max_sum = float('-inf')
for i in range(len(arr)):
for j in range(i, len(arr)):
current_sum = sum(arr[i:j+1])
max_sum = max(max_sum, current_sum)
return max_sum
The complexity of this algorithm is O(n^3) because we have two nested loops and the sum function takes O(n) time.
# sliding window O(n)
def maxSubArray(self, nums: List[int]) -> int:
left = 0
right = 0
n = len(nums)
cur = nums[left]
m_sum = cur
while True:
if cur < 0:
left = right + 1
right = left
cur = 0
if left >= n:
break
cur = nums[left]
else:
right += 1
if right >= n:
break
cur += nums[right]
if cur > m_sum:
m_sum = cur
return m_sum
But it is not very elegant.
# Kadane's Algorithm O(n)
def kadanes(nums):
max_sum = nums[0]
cur_sum = 0
for n in nums:
cur_sum = max(0, cur_sum)
cur_sum += n
max_sum = max(max_sum, cur_sum)
return max_sum
Kadane’s algorithm has a big overlape with the sliding window. but if we want to get he left and right pointers for the maximum subarray, we need to use the sliding window one to track the left and right pointers.
Sliding Window
You are given an array of integers nums, there is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position. Get the max element for each window. leetcode 239
def maxSlidingWindow(nums: List[int], k: int) -> List[int]:
max_sw = []
left, right = 0, 0
n = len(nums)
while right < n:
e_r = nums[right]
if e_r in win:
win[e_r] += 1
else:
win[e_r] = 1
if right - left + 1 == k:
max_sw.append(max(win))
e_l = nums[left]
if win[e_l] > 1:
win[e_l] -= 1
else:
del win[e_l]
left += 1
right += 1
return max_sw
But this one will have O(nk), if n and k are both big, we will have time limit issue.
dq = deque() # stores indices
result = []
for i in range(len(nums)):
# 1️⃣ Remove indices out of window
if dq and dq[0] == i - k:
dq.popleft()
# 2️⃣ Maintain decreasing order
while dq and nums[dq[-1]] < nums[i]:
dq.pop()
# 3️⃣ Add current index
dq.append(i)
# 4️⃣ Window formed
if i >= k - 1:
result.append(nums[dq[0]])
return result
We put element in order in the deque and when a new element comes in, we remove all smaller elements before it. O(n), we remove max n element, so keep order do not effect the complexity.
Two pointers
sliding window is a subset of two pointers and in two points, we concentrate more on the two elements on each side. There are questions like Check if an array is a palindrome, two sum, target sum.
Q1
Given a sorted of integers, return the indixes of two elements (in different positions) that sum up to the target value. Assume there is exactly one solution.
def targerSum(nums, target):
L,R = 0, len(nums) - 1
while L < R:
if nums[L] + nums[R] > target:
R -= 1
elif nums[L] + nums[R] < target:
L += 1
else:
return [L, R]
Prefix Sums
Prefix: sublist that start at the beginning. prefix sum and prefix product can be presented by the recursive pattern. Postfix is like the prefix and they are two sides of the same coin.
Prefix sum can used to optimise the calculation. For example to calculate the sum of an subarray. we can
class prefix_sum:
def __init__(self, nums):
self.prefix = []
total = 0
for n in nums:
total += n
self.prefix.append(total)
def subsum(self, left, right):
pre_right = self.prefix[right]
pre_left = self.prefix[left-1] if left > 0 else 0
return pre_right - pre_left
Q1
Given an integer array nums, return an array answer such that answer[i] is equal to the product of all the elements of nums except nums[i].
The product of any prefix or suffix of nums is guaranteed to fit in a 32-bit integer.
You must write an algorithm that runs in O(n) time and without using the division operation. Leetcode 238
def productExceptSelf(self, nums: List[int]) -> List[int]:
n = len(nums)
prefix, suffix = [1], [1]
result = []
for i in range(n-1):
prefix.append(prefix[-1] * nums[i])
suffix.append(suffix[-1] * nums[n-1-i])
for i in range(n):
result.append(prefix[i] * suffix[n-i-1])
return result
# Or we can allocate the memory first
def productExceptSelf(self, nums: List[int]) -> List[int]:
result = [1] * len(nums)
right, left = 1, 1
for i in range(len(nums)):
result[i] *= left
left *= nums[i]
for i in range(len(nums)-1, -1, -1):
result[i] *= right
right *= nums[i]
return result
Linked List
Fast and Slow Pointers
It is also called Floyd’s tortoise and hare algorithm. We can use it to find the middle of a linked list, to detect if there is a cycle in a linked list and to find the entry point of the cycle.
Q1
Find the middle of a linked list. If there are two middle nodes, return the second middle node.
def middle_of_list(head):
# odd and even case are both covered by this algorithm, when there are two middle nodes, the slow pointer will jump over the first middle node and stop at the second middle node.
slow = head
fast = head
while fast and fast.next:
slow = slow.next
fast = fast.next.next
return slow
Q2
Given head, the head of a linked list, determine if the linked list has a cycle in it. The basic method is to use a set to store the visited nodes pointer hash, if we visit a node that is already in the set, then there is a cycle. But this method has O(n) space complexity. We can use fast and slow pointers to achieve O(1) space complexity.
The idea is that if there is a cycle, the fast pointer will eventually catch up with the slow pointer. If there is no cycle, the fast pointer will reach the end of the list.
Will they meet when there is a cycle? Yes, they will meet. The fast pointer moves twice as fast as the slow pointer, so it will eventually catch up with the slow pointer. And they can be proved by the mathematical induction. If they start att the same position $$ (1 + 2\times t) \mod n == (1 + t) \mod n $$ These two quation represent the position of the fast pointer and the slow pointer after t steps. When they meet, they will be at the same position, so we can solve this equation to find the value of t. t is the times of step and n is the length of the cycle. We can find the they will meet at the t is the multiple of n, which means they will meet at the same position after n steps.
def has_cycle(head):
slow = head
fast = head
while fast and fast.next:
slow = slow.next
fast = fast.next.next
if slow == fast:
return True
return False
Q3
Given head, the head of a linked list, return the node where the cycle begins. If there is no cycle, return null.
flowchart LR
1(("1:Head")) --> 2
2 --> 3["3:entry point"]
3 --> 4
4 --> 5(("5:meet point"))
5 --> 6
6 --> 7
7 --> 3
1 -. p .-> 3
3 -. n-k .-> 5
5 -. k .-> 3
From head to entry point, we have p steps. The circle length is n. From entry point to meet point, we have k steps. So from meet point to entry point again, we have n-k steps. When they meet, the fast pointer has moved 2(p+k) steps, and the slow pointer has moved p+k steps. So when faster pointer and slow pointer meet we have: $$ 2(p+n-k) = p + n - k + n*m $$ Where m is the times of cycle. we ahve $$ p - k = n(m-1) $$
if m is 1, we have p = k, which means the entry point is the meet point.
so we can then put another pointer at the head of the list, and move it and the slow pointer one step at a time, they will meet at the entry point of the cycle.
def find_cycle_entry(head):
slow = head
fast = head
while fast and fast.next:
slow = slow.next
fast = fast.next.next
if slow == fast:
break
if not fast or not fast.next:
return None
entry = head
while entry != slow:
entry = entry.next
slow = slow.next
return entry
Trie (Prefix Tree)
The complexity of searching word or inserting word are both O(1). This can be achieved by using a hash map. But for Prefix Tree, we can also search with prefix with O(1) time complexity.
This algorithm is very useful for autocomplete and spell checker. It can also be used to solve some problems like word search, word search II, etc.
Basic a trie is a tree of characters, each node represents a character and the path from the root to a node represents a word. We can use a hash map to store the children of each node. For English, each node will have at most 26 children.
class TrieNode:
def __init__(self):
self.children = {}
self.word = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word):
curr = self.root
for c in word:
if c not in curr.children:
curr.children[c] = TrieNode()
curr = curr.children[c]
curr.word = True
def search(self, word):
curr = self.root
for c in word:
if c not in curr.children:
return False
curr = curr.children[c]
return curr.word
def startsWith(self, prefix):
curr = self.root
for c in prefix:
if c not in curr.children:
return False
curr = curr.children[c]
return True
Union-Find
Union-Find also called disjoint sets. It is a good algorithm to group the linked nodes in a undirected graph, especially when the graph is large and sparse. For directed graph, this algorithm is hard to use as the algorithm only considers whether there is link between nodes or not.
DFS can normally accomplish the same task, but Union-Find can be more efficient. We often use DFS to do the graph traversal and use Union-Find to find the connected components in a graph or detect cycles.
Complexity analysis: Let’s say we have n nodes and m edges, to find the connected components, the time complexity of DFS is O(n+m), as we need to visit each node and edge, while the time complexity of Union-Find is O(m*α(n)), where α(n) is the inverse Ackermann function, which is very slow growing and can be considered as a constant for all practical purposes. So Union-Find can be more efficient than DFS when the graph is large and sparse.
The idea is to maintain a parent array, where parent[i] is the parent of node i. Initially, each node is its own parent, so we have parent[i] = i for all i. When we union two nodes, we set the parent of one node to be the other node. To find the root of a node, we can follow the parent pointers until we reach a node that is its own parent.
class UnionFind:
def __init__(self, n):
# each node is its own parent initially
self.parent = [i for i in range(n)]
# weight[i] is the size of the tree rooted at i
self.weight = [1] * n
def find(self, x):
if self.parent[x] != x:
# path compression
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, x, y):
rootX = self.find(x)
rootY = self.find(y)
if rootX == rootY:
return False
if rootX != rootY:
# attach the small tree to the bigger one
if self.weight[rootX] > self.weight[rootY]:
self.parent[rootY] = rootX
self.weight[rootX] += self.weight[rootY]
else:
self.parent[rootX] = rootY
self.weight[rootY] += self.weight[rootX]
return True
def build(self, edges):
"""Build the union-find structure from the edges of the graph.
The complexity of this function is O(m*α(n)),
where m is the number of edges and n is the number of nodes.
α(n) present the time complexity of find function,
which is very slow growing and
can be considered as a constant for all practical purposes with path compression
and union by weight.
So the overall complexity of this function can be considered as O(m).
"""
for x, y in edges:
self.union(x, y)
def connected(self, x, y):
"""Check if two nodes are connected.
"""
return self.find(x) == self.find(y)
def count_components(self):
"""Count the number of connected components in the graph.
"""
roots = set()
for i in range(len(self.parent)):
roots.add(self.find(i))
return len(roots)
def cycle_detection(self, edges):
"""Detect if there is a cycle in the graph. If there is a cycle, return True, otherwise return False.
"""
for x, y in edges:
if self.connected(x, y):
return True
self.union(x, y)
return False
Segment Tree
Segement tree is a binary tree data structure that allows us to efficiently query and update the values of a range of elements in an array. It is often used in problems that require frequent updates and queries on an array, such as range sum queries, range minimum queries, greatest common divisor queries, and any other associative operation queries.
The complexity of update and query operations in a segment tree is O(log n), where n is the number of elements in the array. The space complexity of a segment tree is O(4n) in the worst case, which can be considered as O(n) for practical purposes.
If we use an array to do update and query, the time complexity of both operations is O(n), which can be inefficient for large arrays. Segment tree can significantly reduce the time complexity to O(log n) for both operations. Even we ca use prefix sum to do range sum query, the time complexity of query is O(1), but the time complexity of update is O(n), which can be inefficient for large arrays.
The Idea of segment tree is to build a binary tree where each node represents a segment of the array. The leaf nodes represent individual elements of the array, and the internal nodes represent the result of applying the associative operation (e.g., sum, min, max) on the segments represented by their children.
class SegmentTree:
""" simple segment tree implementation for sum operation"""
def __init__(self, total, l_b, r_b):
"""
Args
total: total sum
l_b: left bound
r_b: right bound
"""
self.sum = total
self.left = None
self.right = None
self.l_b = l_b
self.r_b = r_b
@staticmethod
def build(nums, l_b, r_b):
""" build the segment tree with O(n) with n is
the number of nodes and n is the number of leafs
"""
if l_b == r_b:
return SegmentTree(nums[l_b], l_b, r_b)
m = (l_b + r_b) // 2
root = SegmentTree(0, l_b, r_b)
root.left = SegmentTree.build(nums, l_b , m)
root.right = SegmentTree.build(nums, m + 1, r_b)
root.sum = root.left.sum + root.right.sum
return root
def update(self, index, val):
if self.l_b == self.r_b:
self.sum = val
return
m = (self.l_b + self.r_b) // 2
if index > m:
self.right.update(index, val)
else:
self.left.update(index, val)
self.sum = self.left.sum + self.right.sum
# The time complexity of this function is O(log n) where n is the number of leaf nodes in the segment tree, which is the same as the number of elements in the original array. This is because in each step of the recursion, we are halving the range of the query, and we will have at most log n levels of recursion until we reach the leaf nodes.
def range_query(self, l_b, r_b):
if l_b == self.l_b and r_b == self.r_b:
return self.sum
m = (self.l_b + self.r_b) // 2
if l_b > m:
return self.right.range_query(l_b, r_b)
elif r_b <= m:
return self.left.range_query(l_b, r_b)
else:
return (self.left.range_query(l_b, m) +
self.right.range_query(m + 1, r_b))
Iterative DFS
Iterative DFS is a way to perform depth-first search on a graph or a tree without using recursion. It is like simulating the call stack of the recusive DFS with an explicit stack data structure.
Traversal
class TreeNode:
def __init__(self, val, left, right):
self.val = val
self.left = left
self.right = right
def inorder(root):
# time and space o(n), space complexity is due to the inbalance case.
stack = []
cur = root
while cur or stack:
if cur:
stack.append(cur)
cur = cur.left
else:
cur = stack.pop()
print(cur.val)
cur = cur.right
def preorder(root):
stack = []
cur = root
while cur or stack:
if cur:
print(cur.val)
if cur.right:
stack.append(cur.right)
cur = cur.left
else:
cur = stack.pop()
def postorder(root):
# The key idea is that the we do not print the first visit
# only print the parent when ths fist and second children
# have been passed
stack = [root]
visit = [False]
while stack:
cur, visited = stack.pop(), visit.pop()
if cur:
if visited:
print(cur.val)
else:
stack.append(cur)
visit.append(True)
stack.append(cur.right)
visit.append(False)
stack.append(cur.left)
visit.append(False)
Two Heaps
Two Heaps is used to do the task like finding the median of a stream of numbers. Two heaps is a data structure that consists of two heaps, one is a max heap and the other is a min heap. The max heap is used to store the smaller half of the numbers, and the min heap is used to store the larger half of the numbers. The idea is to maintain the balance between the two heaps, so that the difference in size between the two heaps is at most 1. This way, we can easily find the median of the numbers in O(1) time.
import heapq
class TwoHeapMedianFinder:
def __init__(self):
self.small = [] # max heap
self.large = [] # min heap
def insert(self, num: int) -> None:
""" Heap insertion """
# push first to the small max heap and push to large min heap if needed
heapq.heappush(self.small, -num)
if self.small and self.large and (-self.small[0] > self.large[0]):
val = -heapq.heappop(self.small)
heapq.heappush(self.large, val)
# Uneven size handle
if len(self.small) > len(self.large) + 1:
val = -heapq.heappop(self.small)
heapq.heappush(self.large, val)
if len(self.large) > len(self.small) + 1:
val = heapq.heappop(self.large)
heapq.heappush(self.small, -val)
def get_median(self):
""" Heap get minimum """
l_s, l_l = len(self.small), len(self.large)
if l_s > l_l:
return -self.small[0]
elif l_s < l_l:
return self.large[0]
else:
return (-self.small[0] + self.large[0]) // 2
Backtracking
Backtracking is a general algorithm for finding all (or some) solutions to some computational problems, notably constraint satisfaction problems. It incrementally builds candidates to the solutions, and abandons a candidate (“backtracks”) as soon as it determines that the candidate cannot possibly be completed to a valid solution.
Subsets
Q1 Subsets
Given an integer array nums of unique elements, return all possible subsets (the power set). The solution set must not contain duplicate subsets. The subsets can be returned in any order. Leetcode 78
def subsets_without_dup(nums):
cur_set, subsets = [], []
helper(0, nums, cur_set, subsets)
return subsets
def helper(i, nums, cur_set, subsets):
if i >= len(nums):
subsets.append(cur_set.copy())
return
# decision to include nums[i]
cur_set.append(nums[i])
helper(i + 1, nums, cur_set, subsets)
# this step is the kernel of backtracking
cur_set.pop()
# decision to not include nums[i]
helper(i + 1, nums, cur_set, subsets)
Q2 Distinct Subsets
Given a list of nums that are not necessarily distinct, return all distinct subsets (the power set). The solution set must not contain duplicate subsets. The subsets can be returned in any order. Leetcode 90
The complexity of this algorithm is O(2^n) in the worst case, where n is the number of elements in the input array. This is because in the worst case, all elements are distinct and we have to generate all possible subsets, which is 2^n. However, if there are duplicates in the input array, the number of distinct subsets will be less than 2^n, and the actual time complexity will depend on the number of distinct subsets generated. The sorting step takes O(n log n) time, but it is dominated by the O(2^n) time complexity of generating subsets in the worst case.
def subsets_without_dup(nums):
nums.sort() # sort the array to handle duplicates
cur_set, subsets = [], []
helper(0, nums, cur_set, subsets)
return subsets
def helper(i, nums, cur_set, subsets):
if i >= len(nums):
subsets.append(cur_set.copy())
return
# decision to include nums[i]
cur_set.append(nums[i])
helper(i + 1, nums, cur_set, subsets)
# this step is the kernel of backtracking
cur_set.pop()
# decision to not include nums[i]
while i + 1 < len(nums) and nums[i] == nums[i + 1]:
i += 1
helper(i + 1, nums, cur_set, subsets)
Combinations
Given two nums n and k, return all possible combinations of k numbers out of 1 … n. Leetcode 77, the combinnation is like subsets but the length of the subset is fixed to k and the order of the numbers in the subset does not matter. we can use backtracking to solve this problem as well.
The complexity of this algorithm is O(k * C(n, k)), where C(n, k) is the number of combinations of n items taken k at a time. This is because we are generating all possible combinations of k numbers from 1 to n, and each combination takes O(k) time to generate.
def combination(n, k):
cur_set, subsets = [], []
helper(0, n, k, cur_set, subsets)
return subsets
def helper(i, n, k, cur_set, subsets):
if len(cur_set) == k:
subsets.append(cur_set.copy())
return
if i >= n:
return
cur_set.append(i+1) # the number
helper(i + 1, n, k, cur_set, subsets)
cur_set.pop()
helper(i + 1, n, k, cur_set, subsets)
# Combinations II: Use the combination logic, we start with n choice and then we have n-1 choice and then n-2 choice, until we have k choice. So we can use the same backtracking logic as combination, but we need to add a condition to skip the numbers that are already in the current set. This way we can avoid generating duplicate combinations.
def combination2(n, k):
cur_set, subsets = [], []
helper2(0, n, k, cur_set, subsets)
return subsets
def helper2(i, n, k, cur_set, subsets):
if len(cur_set) == k:
subsets.append(cur_set.copy())
return
if i >= n:
return
for j in range(i + 1, n + 1):
cur_set.append(j)
helper2(j, n, k, cur_set, subsets)
cur_set.pop()
Permutations
Give a list of distinct nums, return all possible permutations of the list. Leetcode 46 The complexity of this algorithm is O(n * n!), where n is the number of elements in the input array. This is because there are n! permutations of n elements, and each permutation takes O(n) time to generate (since we need to copy the current permutation to the result list).
class Solution:
def permute(self, nums: List[int]) -> List[List[int]]:
perms = []
n = len(nums)
self.helper(nums, n, [], perms)
return perms
def helper(self, nums, n, cur_perm, perms):
if len(cur_perm) == n:
perms.append(cur_perm)
return
for j, num in enumerate(nums):
new_perm = cur_perm + [num]
nums_rest = nums[:j] + nums[j + 1:]
self.helper(nums_rest, n, new_perm, perms)
Graph
Dijkstra’s Algorithm
Dijkstra’s algorithm is a graph algorithm that finds the shortest path from a source vertex to all other vertices in a weighted graph. It is a greedy algorithm that repeatedly selects the vertex with the smallest distance from the source and updates the distances of its neighbors.
BFS can be used to find the shortest path in an unweighted graph, but it cannot be used to find the shortest path in a weighted graph. Dijkstra’s algorithm can be used to find the shortest path in a weighted graph, but it cannot be used to find the shortest path in a graph with negative weight edges. For graphs with negative weight edges, we can use Bellman-Ford algorithm.
Dijkstra’s algorithm can be implemented using a priority queue (min heap) to efficiently get the vertex with the smallest distance. The algorithm starts by initializing the distance to the source vertex as 0 and the distance to all other vertices as infinity. Then it repeatedly selects the vertex with the smallest distance, updates the distances of its neighbors, and adds the neighbors to the priority queue if their distances are updated. The idea is that we prioritize the vertices with the smallest distance from the source vertex, so that we can efficiently find the shortest path to all other vertices in the graph.
The complexity of Dijkstra’s algorithm is O((V + E) log V), where V is the number of vertices and E is the number of edges in the graph. This is because we need to visit each vertex and edge at most once, and we use a priority queue to keep track of the vertices with the smallest distance, which takes O(log V) time for each insertion and extraction.
Given a connected graph represented by a list of edges, where each edge is represented as a tuple (u, v, w) indicating an edge from vertex u to vertex v with weight w, and a source vertex s, return the shortest distance from the source vertex s to all other vertices in the graph. If a vertex is unreachable from the source vertex, return -1 for that vertex. Leetcode 743
import heapq
def shortest_path(edges, n, src):
adj = {}
for i in range(1, n + 1):
adj[i] = []
for s, d, w in edges:
# source, destinatio, weights
adj[s].append((d, w))
shortest = {}
# a head of (weigh, destination)
min_heap = [(0, src)]
while min_heap:
w1, n1 = heapq.heappop(min_heap)
if n1 in shortest:
continue
shortest[n1] = w1
for n2, w2 in adj[n1]:
if n2 not in shortest:
# distance from src update
heapq.heappush(min_heap, (w1 + w2, n2))
return shortest
Prim’s Algorithm
Prim’s algorithm is a greedy algorithm that finds the minimum spanning tree of a weighted undirected connected graph. It starts with an arbitrary vertex and grows the spanning tree by repeatedly adding the edge with the smallest weight that connects a vertex in the tree to a vertex outside the tree.
So for the spanning tree, the tree is acyclic and it connects all the vertices in the graph with the minimum total edge weight. Prim’s algorithm is efficient for dense graphs, while Kruskal’s algorithm is more efficient for sparse graphs. The complexity of Prim’s algorithm is O(E log V), where E is the number of edges and V is the number of vertices in the graph. This is because we need to visit each edge at most once, and we use a priority queue to keep track of the edges with the smallest weight, which takes O(log V) time for each insertion and extraction.
It does matter where we start the spanning tree, but the total weight of the minimum spanning tree will be the same regardless of the starting vertex.
import heapq
def min_spanning_tree(edges, n):
# construct the adjacent dictionary
adj = {}
for i in range(1, n + 1):
adj[i] = []
for src, dst, weight in edges:
adj[src].append([dst, weight])
adj[dst].append([src, weight])
# Initialise the heap by chosing a single node
# pushing all its neighbours
min_heap = []
for neighbor, weight in adj[1]:
heapq.heappush(min_heap, [weight, 1, neighbor])
mst = [] # min spinning tree result
visited = set()
visited.add(1)
while min_heap:
weight, src, node = heapq.heappop(min_heap)
if node in visited:
continue
mst.append([src, node])
visited.add(node)
for neighbor, weight in adj[node]:
if neighbor not in visited:
heapq.heappush(min_heap, [weight, node, neighbor])
return mst
Kruskal’s Algorithm
Kruskal’s algorithm is a greedy algorithm that finds the minimum spanning tree of a weighted undirected graph.
It starts by sorting all the edges in non-decreasing order of their weights. Then it repeatedly adds the next edge with the smallest weight to the spanning tree, as long as it does not form a cycle with the edges already in the spanning tree.
The algorithm uses a union-find data structure to efficiently check for cycles.
The complexity of Kruskal’s algorithm is O(E log E), where E is the number of edges in the graph. This is because we need to sort the edges, which takes O(E log E) time, and we need to visit each edge at most once, which takes O(E) time. The union-find operations take O(log V) time, where V is the number of vertices in the graph, but this is dominated by the O(E log E) time complexity of sorting the edges. This algorithm is more efficient for sparse graphs, while Prim’s algorithm is more efficient for dense graphs.
def min_spanning_tree_kruskal(edges, n):
# sort the edges by weight
edges.sort(key=lambda x: x[2])
uf = UnionFind(n)
mst = []
for src, dst, weight in edges:
if uf.union(src, dst):
mst.append([src, dst])
if len(mst) == n - 1:
break
return mst
Topological Sort
The topological sort of a directed acyclic graph (DAG) is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. Topological sorting is used in many applications, such as scheduling tasks, resolving symbol dependencies in linkers, and determining the order of compilation tasks.
flowchart LR
A --> B
A --> C
B --> D
C --> E
D --> F
E --> F
This is a DAG with 6 vertices and 6 edges. One possible topological sort of this graph is [A, B, C, D, E, F]. Another possible topological sort is [A, C, B, E, D, F]. There can be multiple valid topological sorts for a given DAG. If there are two graphs, the oder of elements between graphs in the topological sort is not important, we only need to make sure the order of elements in the same graph is correct.
def topological_sort(edges, n):
adj = {}
for i in range(1, n + 1):
adj[i] = []
for src, dst in edges:
adj[src].append(dst)
top_sort = []
visited = set()
for i in range(1, n + 1):
dfs(i, adj, visited, top_sort)
top_sort.reverse()
return top_sort
def dfs(src, adj, visited, top_sort):
if src in visited:
return
visited.add(src)
for dst in adj[src]:
dfs(dst, adj, visited, top_sort)
top_sort.append(src)
Dynamic Programming
Dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems and storing the results of those subproblems to avoid redundant work.
Q1 Knapsack Problem
Given a list of items, where each item has a weight and a value, and a maximum weight capacity, return the maximum value that can be obtained by selecting a subset of the items such that the total weight does not exceed the maximum weight capacity.
Brute force solution
time complexity is O(2^n) where n is the number of items, as we have two choices for each item: to include it or not to include it. Space complexity is O(n) due to the recursion stack.
def dfs(profit, weight, capacity):
return dfs_helper(0, profit, weight, capacity)
def dfs_helper(i, profit, weight, capacity):
if i == len(profit):
return 0
# skip the item i
max_profit = dfs_helper(i + 1, profit, weight, capacity)
# include the item i
new_capacity = capacity - weight[i]
if new_capacity >= 0:
max_profit = max(max_profit, profit[i] + dfs_helper(i + 1, profit, weight, new_capacity))
return max_profit
Better solution with memoization
The time complexity of this algorithm is O(n * capacity) for time and memory, where n is the number of items and capacity is the maximum weight capacity. This is because we are storing the results of subproblems in a memoization table, and each subproblem is defined by the index of the item and the remaining capacity. The space complexity is also O(n * capacity) due to the memoization table.
The idea is that when calculating the maximum profit through different paths will use the subproblems with the same parameters and we do not need to calculate the same subproblem multiple times.
def memory_dfs(profit, weight, capacity):
# given the number of items we treated and the remain capacity
# we can decide the max_profit, so this result can be reuse
n, m = len(profit), capacity
cache = [[-1] * (m+1) for _ in range(n)]
return memory_helper(0, profit, weight, capacity, cache)
def memory_helper(i, profit, weight, capacity, cache):
if i == len(profit):
return 0
elif cache[i][capacity] != -1:
return cache[i][capacity]
# skip the item i
cache[i][capacity] = memory_helper(i + 1, profit, weight, capacity, cache)
# include the item i
new_capacity = capacity - weight[i]
if new_capacity >= 0:
cache[i][capacity] = max(
cache[i][capacity],
profit[i] + memory_helper(i + 1, profit, weight, new_capacity, cache))
return cache[i][capacity]
Dynamic Programming Bottom-Up Solution
We can implement the dynamic programming solution iteratively. The time complexity of this algorithm is O(n * capacity) for time and space, where n is the number of items and capacity is the maximum weight capacity.
We build a table of n rows and capacity + 1 columns, where the entry at row i and column j represents the maximum profit that can be obtained by selecting a subset of the first i items with a maximum weight capacity of j. We fill this table iteratively based on whether we include or exclude each item, and we return the value in the bottom-right cell of the table as the final answer.
So we are going to fill the table row by row, and for each row, we fill the columns from left to right. The value in each cell is calculated based on the values in the previous row, which represent the maximum profit for the subproblems with one less item. This way, we can build up the solution to the original problem iteratively.
The transition function is as follows:
- If we do not include the item i, then the maximum profit is the same as the maximum profit for the first i-1 items with the same capacity, which is represented by the value in the previous row and the same column.
- If we include the item i, then the maximum profit is the profit of item i plus the maximum profit for the first i-1 items with a reduced capacity (capacity - weight of item i), which is represented by the value in the previous row and the column corresponding to the reduced capacity.
def dp(profit, weight, capacity):
n = len(profit)
dp_table = [[0 for _ in range(capacity + 1)] for _ in range(n + 1)]
for i in range(1, n + 1):
for c in range(capacity + 1):
dp_table[i][c] = dp_table[i - 1][c]
if weight[i - 1] <= c:
dp_table[i][c] = max(
dp_table[i][c],
profit[i - 1] + dp_table[i - 1][c - weight[i - 1]]
)
return dp_table[n][capacity]
Q2 Unbounded Knapsack Problem
This one is similar to the knapsack problem, but we can use unlimited number of each item. The time complexity of this algorithm is O(n * capacity) for time and space, where n is the number of items and capacity is the maximum weight capacity. But for the exsauted search solution, the time complexity can be O(2^capacity) in the worst case, as we have two choices for each item: to include it or not to include it, and we can include the same item multiple times. The space complexity is O(n * capacity) due to the memoization table.
The brute force solution
def dfs(profit, weight, capacity):
return dfs_helper(0, profit, weight, capacity)
def dfs_helper(i, profit, weight, capacity):
if i == len(profit):
return 0
# skip the item i
max_profit = dfs_helper(i + 1, profit, weight, capacity)
# include the item i
new_capacity = capacity - weight[i]
if new_capacity >= 0:
# here we set i instead of i + 1 because we can include the same item multiple times
p = profit[i] + dfs_helper(i, profit, weight, new_capacity)
max_profit = max(max_profit, p)
return max_profit
The memorization solution
we can use the same memoization table as the knapsack problem, but we need to change the transition function to allow for including the same item multiple times.
def memory_dfs(profit, weight, capacity):
# given the number of items we treated and the remain capacity
# we can decide the max_profit, so this result can be reuse
n, m = len(profit), capacity
cache = [[-1] * (m+1) for _ in range(n)]
return memory_helper(0, profit, weight, capacity, cache)
def memory_helper(i, profit, weight, capacity, cache):
if i == len(profit):
return 0
elif cache[i][capacity] != -1:
return cache[i][capacity]
# skip the item i
cache[i][capacity] = memory_helper(i + 1, profit, weight, capacity, cache)
# include the item i
new_capacity = capacity - weight[i]
if new_capacity >= 0:
cache[i][capacity] = max(
cache[i][capacity],
# Here is the key difference from the knapsack problem, we can include the same item multiple times, so we do not move to the next item after including the current item.
profit[i] + memory_helper(i, profit, weight, new_capacity, cache))
return cache[i][capacity]
The dynamic programming solution
The key difference from the 0/1 knapsack DP is in the inclusion case: we look at dp_table[i] (the current row) instead of dp_table[i-1] (the previous row), because after including item i we can include it again.
def dp(profit, weight, capacity):
n = len(profit)
dp_table = [[0 for _ in range(capacity + 1)] for _ in range(n + 1)]
for i in range(1, n + 1):
for c in range(capacity + 1):
dp_table[i][c] = dp_table[i - 1][c]
if weight[i - 1] <= c:
dp_table[i][c] = max(
dp_table[i][c],
# use dp_table[i] instead of dp_table[i-1] to allow reusing item i
profit[i - 1] + dp_table[i][c - weight[i - 1]]
)
return dp_table[n][capacity]
Q3 Longest Common Subsequence
Given two string s1 and s2, find the length of the longest common subsequence between the two strings. A subsequence is a sequence that can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements. Leetcode 1143
Brute force solution
The idea is that we can use two pointers to traverse the two strings, and we have two choices at each step: if the characters at the two pointers are the same, we can include that character in the longest common subsequence and move both pointers; if the characters are different, we can either move the pointer of the first string or move the pointer of the second string, and we take the maximum of the two choices. The time complexity of this algorithm is O(2^(m+n)) in the worst case, where m and n are the lengths of the two strings, as we have two choices for each character in both strings. The space complexity is O(m+n) due to the recursion stack.
def fds(text1, text2):
return helper(text1, text2, 0, 0)
def helper(text1, text2, i1, i2):
if i1 == len(text1) or i2 == len(text2):
return 0
if text1[i1] == text2[i2]:
return 1 + helper(text1, text2, i1+1, i2+1)
else:
return max(
self.helper(text1, text2, i1+1, i2),
self.helper(text1, text2, i1, i2+1)
)
The memorization solution
The idea is that when calculating the longest common subsequence through different paths will use the subproblems with the same parameters and we do not need to calculate the same subproblem multiple times. The time complexity of this algorithm is O(mn) for time and space, where m and n are the lengths of the two strings, as we are storing the results of subproblems in a memoization table, and each subproblem is defined by the indices of the two pointers. The space complexity is also O(mn) due to the memoization table.
def memory_dfs(text1, text2):
m, n = len(text1), len(text2)
cache = [[-1] * n for _ in range(m)]
return helper(text1, text2, 0, 0, cache)
def helper(text1, text2, i1, i2, cache):
if i1 == len(text1) or i2 == len(text2):
return 0
elif cache[i1][i2] != -1:
return cache[i1][i2]
if text1[i1] == text2[i2]:
cache[i1][i2] = 1 + helper(text1, text2, i1+1, i2+1, cache)
else:
cache[i1][i2] = max(
helper(text1, text2, i1+1, i2, cache),
helper(text1, text2, i1, i2+1, cache)
)
return cache[i1][i2]
Bottom-up dynamic programming solution
def dp(s1, s2):
n, m = len(s1), len(s2)
df = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
for j in range(1, m + 1):
if s1[i - 1] == s2[j - 1]:
df[i][j] = df[i - 1][j - 1] + 1
else:
df[i][j] = max(df[i - 1][j], df[i][j - 1])
return df[n][m]
Palindromes
Find the longest palindromic substring in a given string. Leetcode 5 The time complexity of this algorithm is O(n^2) and the space complexity is O(1) if we do not consider the space used to store the result, as we are expanding around each center of the palindrome and checking for palindromic substrings. The number of centers is 2n-1 (each character and the gap between characters), and for each center, we can expand at most n times in the worst case.
def longestPalindrome(s: str) -> str:
def expand(l, r):
while l >= 0 and r < len(s) and s[l] == s[r]:
l -= 1
r += 1
return l + 1, r - 1
start, end = 0, 0
for i in range(len(s)):
# odd
l1, r1 = expand(i, i)
# even
l2, r2 = expand(i, i + 1)
if r1 - l1 > end - start:
start, end = l1, r1
if r2 - l2 > end - start:
start, end = l2, r2
return s[start:end + 1]