🎯 Union Find
1️⃣ Core Framework
When discussing Union Find, I frame it as:
- What problem pattern Union Find solves
- Connected components
- Parent array
- Find operation
- Union operation
- Path compression
- Union by rank / size
- Cycle detection
- Graph connectivity problems
- Senior-level explanation
2️⃣ What Problem Are We Solving?
Union Find, also called Disjoint Set Union, is used to manage groups of connected elements.
It is commonly used for:
- Connected components
- Graph connectivity
- Cycle detection in undirected graphs
- Number of provinces
- Redundant connection
- Accounts merge
- Number of islands II
- Kruskal’s algorithm
- Dynamic connectivity
- Friend circles
- Similar string groups
- Satisfiability of equality equations
The core idea is:
Each element belongs to a set.
Union connects two sets.
Find tells which set an element belongs to.
👉 Interview Answer
Union Find is a data structure for tracking disjoint sets.
It supports two main operations: find, which returns the representative of a set, and union, which merges two sets.
It is especially useful for connectivity problems, connected components, and cycle detection in undirected graphs.
3️⃣ Why Union Find Works
Connectivity Problem
Suppose we have nodes:
0, 1, 2, 3, 4
And edges:
0 - 1
1 - 2
3 - 4
Then the connected components are:
{0, 1, 2}
{3, 4}
Union Find helps us maintain this grouping efficiently.
Representative
Each group has a representative root.
Example:
parent[0] = 0
parent[1] = 0
parent[2] = 0
This means:
0, 1, 2 belong to the same set
👉 Interview Answer
Union Find works by assigning each connected component a representative root.
If two nodes have the same root, they belong to the same component.
When an edge connects two different components, union merges those components into one.
4️⃣ Parent Array
Basic Idea
Union Find usually uses a parent array.
Initially:
parent[i] = i
This means every node is its own component.
Example
parent = [0, 1, 2, 3, 4]
After union 0 and 1:
parent = [0, 0, 2, 3, 4]
Now:
0 and 1 are in the same set
👉 Interview Answer
The parent array stores the parent of each node in the Union Find forest.
Initially, every node is its own parent, meaning every node starts as a separate component.
As we union nodes, components are merged by changing parent pointers.
5️⃣ Find Operation
Purpose
find(x) returns the root representative of x.
If:
find(a) == find(b)
Then:
a and b are connected
Basic Code
def find(x):
while parent[x] != x:
x = parent[x]
return x
Problem
Without optimization, the tree can become tall.
Worst case:
0 <- 1 <- 2 <- 3 <- 4
Then find can become O(n).
👉 Interview Answer
The find operation returns the root representative of a node.
Two nodes are connected if their roots are the same.
A basic find follows parent pointers until reaching a node whose parent is itself.
6️⃣ Path Compression
Core Idea
Path compression flattens the tree during find.
When finding the root of x, we make every node on the path point directly to the root.
Code
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
Why It Helps
Before compression:
4 → 3 → 2 → 1 → 0
After find(4):
4 → 0
3 → 0
2 → 0
1 → 0
Future find operations become much faster.
👉 Interview Answer
Path compression optimizes find by making nodes point directly to the root.
This flattens the tree over time.
After path compression, future find operations on the same component become almost constant time.
7️⃣ Union Operation
Purpose
union(a, b) merges the sets containing a and b.
Steps:
root_a = find(a)
root_b = find(b)
if root_a != root_b:
parent[root_b] = root_a
Basic Code
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return False
parent[root_b] = root_a
return True
Return Value
Returning False is useful when:
a and b are already connected
This often means:
adding this edge creates a cycle
👉 Interview Answer
The union operation merges two sets.
First, I find the root of each node.
If they already have the same root, they are already connected.
Otherwise, I attach one root to the other root and merge the components.
8️⃣ Union by Rank
Problem
If we always attach one root arbitrarily, the tree may become tall.
Union by rank keeps the tree shallow.
Rank Meaning
rank[root] = approximate height of the tree
Attach the smaller-rank tree under the larger-rank tree.
Code
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a, b):
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return False
if self.rank[root_a] < self.rank[root_b]:
self.parent[root_a] = root_b
elif self.rank[root_a] > self.rank[root_b]:
self.parent[root_b] = root_a
else:
self.parent[root_b] = root_a
self.rank[root_a] += 1
return True
👉 Interview Answer
Union by rank keeps the tree shallow.
I attach the root with smaller rank under the root with larger rank.
If both ranks are equal, I choose one root and increase its rank.
Combined with path compression, Union Find operations are nearly constant time.
9️⃣ Union by Size
Alternative to Rank
Instead of rank, we can track component size.
size[root] = number of nodes in the component
Attach the smaller component under the larger component.
Code
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.size = [1] * n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a, b):
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return False
if self.size[root_a] < self.size[root_b]:
root_a, root_b = root_b, root_a
self.parent[root_b] = root_a
self.size[root_a] += self.size[root_b]
return True
👉 Interview Answer
Union by size attaches the smaller component under the larger component.
This keeps the Union Find tree shallow.
It is often easier to reason about than rank, especially when the problem also needs component sizes.
🔟 Complexity Analysis
Without Optimization
Find can be:
O(n)
in the worst case.
With Path Compression and Union by Rank / Size
Each operation is almost constant time:
O(α(n))
Where:
α(n) = inverse Ackermann function
In practice:
almost O(1)
For m Operations
Time: O(m α(n))
Space: O(n)
👉 Interview Answer
With path compression and union by rank or size, Union Find operations are almost constant time.
Technically the amortized time is O(α(n)), where α is the inverse Ackermann function.
In practice, it behaves like O(1).
1️⃣1️⃣ Connected Components
Problem
Given n nodes and edges, return the number of connected components.
Idea
Initially:
components = n
Every successful union reduces component count by 1.
Code
def count_components(n, edges):
parent = list(range(n))
rank = [0] * n
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return False
if rank[root_a] < rank[root_b]:
parent[root_a] = root_b
elif rank[root_a] > rank[root_b]:
parent[root_b] = root_a
else:
parent[root_b] = root_a
rank[root_a] += 1
return True
components = n
for a, b in edges:
if union(a, b):
components -= 1
return components
👉 Interview Answer
To count connected components, I start with n components because every node is separate initially.
For each edge, if union successfully merges two different components, I decrease the component count by one.
At the end, the count is the number of connected components.
1️⃣2️⃣ Number of Provinces
Problem
Given an adjacency matrix, return the number of connected groups.
Code
def find_circle_num(is_connected):
n = len(is_connected)
parent = list(range(n))
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a != root_b:
parent[root_b] = root_a
for i in range(n):
for j in range(i + 1, n):
if is_connected[i][j] == 1:
union(i, j)
roots = set()
for i in range(n):
roots.add(find(i))
return len(roots)
👉 Interview Answer
Number of Provinces is a connected-components problem.
Each city is a node, and a connection means two cities should be unioned.
After processing the matrix, the number of unique roots is the number of provinces.
1️⃣3️⃣ Cycle Detection in Undirected Graph
Problem
Given undirected edges, determine whether adding an edge creates a cycle.
Key Insight
If edge (a, b) connects two nodes already in the same component:
cycle exists
Code
def has_cycle(n, edges):
parent = list(range(n))
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return False
parent[root_b] = root_a
return True
for a, b in edges:
if not union(a, b):
return True
return False
👉 Interview Answer
In an undirected graph, Union Find can detect cycles.
For each edge, if the two endpoints already have the same root, then they are already connected.
Adding this edge would create a cycle.
Otherwise, I union the two components.
1️⃣4️⃣ Redundant Connection
Problem
Given a graph that started as a tree, then one extra edge was added.
Return the edge that creates a cycle.
Code
def find_redundant_connection(edges):
n = len(edges)
parent = list(range(n + 1))
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return False
parent[root_b] = root_a
return True
for a, b in edges:
if not union(a, b):
return [a, b]
return []
👉 Interview Answer
Redundant Connection is a cycle detection problem in an undirected graph.
I process edges one by one.
If an edge connects two nodes that are already connected, that edge creates a cycle and is the redundant edge.
1️⃣5️⃣ Accounts Merge
Problem
Each account has a name and emails. If two accounts share an email, they belong to the same person.
Union Find Idea
Treat each email as a node.
For every account, union all emails in that account.
Then group emails by root.
Code
from collections import defaultdict
def accounts_merge(accounts):
parent = {}
email_to_name = {}
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a != root_b:
parent[root_b] = root_a
for account in accounts:
name = account[0]
for email in account[1:]:
if email not in parent:
parent[email] = email
email_to_name[email] = name
first_email = account[1]
for email in account[2:]:
union(first_email, email)
groups = defaultdict(list)
for email in parent:
root = find(email)
groups[root].append(email)
result = []
for root, emails in groups.items():
name = email_to_name[root]
result.append([name] + sorted(emails))
return result
👉 Interview Answer
For Accounts Merge, I model each email as a node.
Emails in the same account are unioned together.
After processing all accounts, emails with the same root belong to the same person.
Then I group emails by root and sort them.
1️⃣6️⃣ Satisfiability of Equality Equations
Problem
Given equations like:
a == b
b != c
Return whether all equations can be satisfied.
Union Find Idea
First process all equality equations:
a == b
Union those variables.
Then process inequality equations:
a != b
If two variables in an inequality have the same root, there is a contradiction.
Code
def equations_possible(equations):
parent = list(range(26))
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a != root_b:
parent[root_b] = root_a
for eq in equations:
if eq[1:3] == "==":
a = ord(eq[0]) - ord("a")
b = ord(eq[3]) - ord("a")
union(a, b)
for eq in equations:
if eq[1:3] == "!=":
a = ord(eq[0]) - ord("a")
b = ord(eq[3]) - ord("a")
if find(a) == find(b):
return False
return True
👉 Interview Answer
For equality equations, I union variables that must be equal.
After all equalities are processed, I check inequalities.
If two variables that must be unequal have the same root, the equations are contradictory.
1️⃣7️⃣ Number of Islands II
Problem
Given an initially water grid, land cells are added one by one.
After each addition, return the number of islands.
Why Union Find?
This is dynamic connectivity.
DFS would be expensive after every addition.
Union Find can merge new land with neighboring islands efficiently.
Code
def num_islands2(m, n, positions):
parent = {}
rank = {}
count = 0
result = []
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
nonlocal count
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return
if rank[root_a] < rank[root_b]:
parent[root_a] = root_b
elif rank[root_a] > rank[root_b]:
parent[root_b] = root_a
else:
parent[root_b] = root_a
rank[root_a] += 1
count -= 1
directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
for r, c in positions:
cell = (r, c)
if cell in parent:
result.append(count)
continue
parent[cell] = cell
rank[cell] = 0
count += 1
for dr, dc in directions:
nr = r + dr
nc = c + dc
neighbor = (nr, nc)
if 0 <= nr < m and 0 <= nc < n and neighbor in parent:
union(cell, neighbor)
result.append(count)
return result
👉 Interview Answer
Number of Islands II is a dynamic connectivity problem.
Each new land cell starts as a new island.
Then I check its four neighbors.
If a neighbor is already land, I union the two cells.
Every successful union reduces the island count by one.
1️⃣8️⃣ Kruskal’s Algorithm
Problem
Find a minimum spanning tree of a weighted undirected graph.
Why Union Find?
Kruskal sorts edges by weight.
Then it adds the smallest edge that does not create a cycle.
Union Find checks whether two endpoints are already connected.
Code Skeleton
def kruskal(n, edges):
parent = list(range(n))
rank = [0] * n
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return False
if rank[root_a] < rank[root_b]:
parent[root_a] = root_b
elif rank[root_a] > rank[root_b]:
parent[root_b] = root_a
else:
parent[root_b] = root_a
rank[root_a] += 1
return True
edges.sort(key=lambda x: x[2])
total_weight = 0
edge_count = 0
for a, b, weight in edges:
if union(a, b):
total_weight += weight
edge_count += 1
if edge_count == n - 1:
break
return total_weight
👉 Interview Answer
Kruskal’s algorithm uses Union Find to build a minimum spanning tree.
We sort edges by weight.
Then we add the smallest edge that connects two different components.
If the endpoints already have the same root, adding that edge would create a cycle, so we skip it.
1️⃣9️⃣ Union Find vs DFS / BFS
DFS / BFS
Good for:
static graph traversal
finding components once
exploring all nodes in a component
grid connected regions
Union Find
Good for:
many connectivity queries
dynamic edge additions
merging components
cycle detection while processing edges
Comparison
| Pattern | Best For |
|---|---|
| DFS / BFS | Traverse graph or grid once |
| Union Find | Dynamic connectivity and repeated connectivity checks |
👉 Interview Answer
DFS and BFS are good when we need to traverse a graph or find components once.
Union Find is better when we repeatedly merge components or answer connectivity queries.
It is especially useful when edges are processed incrementally.
2️⃣0️⃣ Union Find Limitations
Good At
Union Find is good at:
adding connections
checking whether two nodes are connected
counting components
detecting cycles in undirected graphs
Not Good At
Union Find is not good at:
removing edges
finding actual paths
handling directed graph reachability
shortest path
traversing all nodes in a component
👉 Interview Answer
Union Find is excellent for connectivity under edge additions.
But it does not naturally support edge deletion, shortest paths, directed reachability, or retrieving the actual path between nodes.
For those problems, DFS, BFS, or shortest-path algorithms are usually better.
2️⃣1️⃣ Common Edge Cases
Empty Edges
edges = []
Each node is its own component.
Duplicate Edges
[0, 1]
[0, 1]
The second union may return false.
Self Loop
[0, 0]
Usually means already connected.
May indicate cycle depending on problem.
1-Indexed Nodes
Many graph problems use nodes:
1 to n
Need parent size:
parent = list(range(n + 1))
Dynamic Nodes
For emails or grid cells, nodes may not be integers.
Use dictionary parent map.
👉 Interview Answer
Common Union Find edge cases include empty edges, duplicate edges, self-loops, 1-indexed nodes, and dynamically discovered nodes such as emails or grid cells.
I choose parent array or parent dictionary depending on how nodes are represented.
2️⃣2️⃣ Common Mistakes
Mistake 1: Not Calling Find Before Union
Wrong:
parent[b] = a
Correct:
parent[find(b)] = find(a)
Mistake 2: Forgetting Path Compression
This can make find slow.
Mistake 3: Wrong Component Count
Only decrease component count when union actually merges two different roots.
Mistake 4: 1-Indexed Node Bug
Using size n array for nodes 1 to n causes index errors.
Mistake 5: Using Union Find for Directed Reachability
Union Find does not preserve direction.
Mistake 6: Expecting Actual Path
Union Find tells whether connected, but not which path connects them.
👉 Interview Answer
Common Union Find mistakes include unioning raw nodes instead of roots, forgetting path compression, decreasing component count even when union fails, mishandling 1-indexed nodes, using Union Find for directed reachability, and expecting it to return actual paths.
Union Find answers connectivity, not path reconstruction.
2️⃣3️⃣ When Union Find Applies
Good Fit
Use Union Find when the problem asks for:
connected components
are two nodes connected
merge groups
detect cycle in undirected graph
dynamic connectivity with added edges
number of groups after unions
minimum spanning tree with Kruskal
Problem Signals
Look for words like:
connected
component
group
merge
same set
friend circle
province
redundant connection
island added
union
equivalent
👉 Interview Answer
I consider Union Find when the problem involves merging groups or checking whether two items belong to the same group.
Common signals include connected components, provinces, redundant connection, accounts merge, equivalence relations, and dynamic island counting.
2️⃣4️⃣ When Union Find Does Not Apply
Not Good Fit
Union Find may not be ideal when:
need shortest path
need actual path
need directed reachability
need remove edges dynamically
need traverse all nodes in a component
need topological ordering
Better Alternatives
| Problem Type | Better Pattern |
|---|---|
| Shortest path unweighted | BFS |
| Weighted shortest path | Dijkstra |
| Directed dependency order | Topological Sort |
| Find actual path | DFS / BFS with parent |
| Connected regions in static grid | DFS / BFS |
| Edge deletion dynamic graph | Advanced dynamic connectivity |
👉 Interview Answer
Union Find is not a general graph traversal tool.
It is not suitable for shortest path, directed reachability, topological ordering, or finding actual paths.
It is best for undirected connectivity and component merging.
2️⃣5️⃣ Standard Templates
Basic Union Find Template
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a, b):
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return False
self.parent[root_b] = root_a
return True
Union Find With Rank Template
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
self.count = n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a, b):
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return False
if self.rank[root_a] < self.rank[root_b]:
self.parent[root_a] = root_b
elif self.rank[root_a] > self.rank[root_b]:
self.parent[root_b] = root_a
else:
self.parent[root_b] = root_a
self.rank[root_a] += 1
self.count -= 1
return True
Union Find With Size Template
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.size = [1] * n
self.count = n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a, b):
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return False
if self.size[root_a] < self.size[root_b]:
root_a, root_b = root_b, root_a
self.parent[root_b] = root_a
self.size[root_a] += self.size[root_b]
self.count -= 1
return True
Dictionary-Based Union Find Template
class UnionFind:
def __init__(self):
self.parent = {}
self.size = {}
def add(self, x):
if x not in self.parent:
self.parent[x] = x
self.size[x] = 1
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a, b):
self.add(a)
self.add(b)
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return False
if self.size[root_a] < self.size[root_b]:
root_a, root_b = root_b, root_a
self.parent[root_b] = root_a
self.size[root_a] += self.size[root_b]
return True
🧠 Staff-Level Answer Final
👉 Interview Answer Full Version
Union Find, also called Disjoint Set Union, is a data structure for managing disjoint groups and answering connectivity questions.
It supports two core operations. Find returns the representative root of an element’s set. Union merges two sets together.
The basic idea is that every component has a representative root. If two nodes have the same root, they belong to the same connected component.
The parent array stores the forest structure. Initially, parent[i] equals i, meaning every node starts as its own component.
The find operation follows parent pointers until it reaches the root. With path compression, every node on the find path is updated to point directly to the root. This flattens the tree and makes future find operations faster.
The union operation first finds both roots. If the roots are already the same, the nodes are already connected, and union returns false. Otherwise, it attaches one root to the other and merges the two components.
To keep trees shallow, we use union by rank or union by size. Union by rank attaches the shorter tree under the taller tree. Union by size attaches the smaller component under the larger component.
With path compression and union by rank or size, Union Find operations are almost constant time, specifically O(α(n)) amortized, where α is the inverse Ackermann function.
Union Find is useful for connected components, cycle detection in undirected graphs, redundant connection, number of provinces, accounts merge, equality equations, dynamic island counting, and Kruskal’s minimum spanning tree algorithm.
For connected component counting, I start with n components. Every successful union decreases the count by one.
For cycle detection in an undirected graph, if an edge connects two nodes that already have the same root, adding that edge creates a cycle.
For dynamic problems like Number of Islands II, each new land cell starts as a new component. Then we union it with neighboring land cells, reducing the island count for every successful merge.
Union Find is not a general graph traversal algorithm. It does not give shortest paths, actual paths, directed reachability, or topological order. It is best for undirected connectivity and merging groups.
At a senior level, I would explain: what each node represents, what it means for two nodes to be in the same set, how find and union work, how component count changes, why path compression and union by rank make it efficient, and why Union Find is appropriate compared with DFS or BFS.
⭐ Final Insight
Union Find 的核心不是“写 parent array”。
它的核心是动态维护 connected components。
Senior-level 的重点是:
node 代表什么, root 代表什么, find 如何找到 component representative, union 什么时候真正 merge, component count 什么时候减少, path compression 为什么有效, 以及 Union Find 适合 connectivity 但不适合 shortest path 或 directed reachability。
能讲清楚这些, Union Find 就是一种处理 connected components、cycle detection、dynamic connectivity 和 group merging 的核心数据结构。
中文部分
🎯 Union Find
1️⃣ 核心框架
讨论 Union Find 时,我通常从:
- Union Find 解决什么问题
- Connected components
- Parent array
- Find operation
- Union operation
- Path compression
- Union by rank / size
- Cycle detection
- Graph connectivity problems
- 高级工程师解释方式
2️⃣ 要解决什么问题?
Union Find,也叫 Disjoint Set Union, 用于维护 connected groups。
它常用于:
- Connected components
- Graph connectivity
- Cycle detection in undirected graphs
- Number of provinces
- Redundant connection
- Accounts merge
- Number of islands II
- Kruskal’s algorithm
- Dynamic connectivity
- Friend circles
- Similar string groups
- Satisfiability of equality equations
核心思想是:
每个 element 属于一个 set。
Union 连接两个 set。
Find 查询一个 element 属于哪个 set。
👉 面试回答
Union Find 是一种维护 disjoint sets 的数据结构。
它支持两个核心操作: find 用来找到某个元素所在 set 的 representative, union 用来合并两个 sets。
它特别适合 connectivity、connected components, 以及 undirected graph 中的 cycle detection。
3️⃣ 为什么 Union Find 有效?
Connectivity Problem
假设有 nodes:
0, 1, 2, 3, 4
Edges:
0 - 1
1 - 2
3 - 4
Connected components 是:
{0, 1, 2}
{3, 4}
Union Find 可以高效维护这些 groups。
Representative
每个 group 有一个 representative root。
Example:
parent[0] = 0
parent[1] = 0
parent[2] = 0
表示:
0, 1, 2 belong to the same set
👉 面试回答
Union Find 的核心是: 每个 connected component 都有一个 representative root。
如果两个 nodes 的 root 相同, 它们就在同一个 component 中。
当一条 edge 连接两个不同 components 时, union 会把它们合并。
4️⃣ Parent Array
Basic Idea
Union Find 通常使用 parent array。
初始状态:
parent[i] = i
表示每个 node 自己就是一个 component。
Example
parent = [0, 1, 2, 3, 4]
Union 0 和 1 后:
parent = [0, 0, 2, 3, 4]
现在:
0 and 1 are in the same set
👉 面试回答
Parent array 存储每个 node 的 parent。
初始时,每个 node 的 parent 是自己, 表示每个 node 是单独 component。
随着 union 操作, 不同 components 会通过改变 parent pointer 被合并。
5️⃣ Find Operation
Purpose
find(x) 返回 x 所在 set 的 root representative。
如果:
find(a) == find(b)
说明:
a and b are connected
Basic Code
def find(x):
while parent[x] != x:
x = parent[x]
return x
Problem
如果不优化, tree 可能变得很高。
Worst case:
0 <- 1 <- 2 <- 3 <- 4
此时 find 可能退化为 O(n)。
👉 面试回答
Find operation 用来返回一个 node 的 root representative。
如果两个 nodes 的 root 相同, 它们就是 connected。
基础 find 会沿着 parent pointer 一直走到 root。
6️⃣ Path Compression
Core Idea
Path compression 会在 find 的过程中压平 tree。
查找 x 的 root 时, 把路径上的 nodes 都直接指向 root。
Code
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
Why It Helps
压缩前:
4 → 3 → 2 → 1 → 0
执行 find(4) 后:
4 → 0
3 → 0
2 → 0
1 → 0
之后的 find 会非常快。
👉 面试回答
Path compression 通过让路径上的 nodes 直接指向 root, 来优化 find。
这样 tree 会随着操作逐渐变平。
之后对同一个 component 的 find 操作会接近 O(1)。
7️⃣ Union Operation
Purpose
union(a, b) 合并 a 和 b 所在的 sets。
Steps:
root_a = find(a)
root_b = find(b)
if root_a != root_b:
parent[root_b] = root_a
Basic Code
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return False
parent[root_b] = root_a
return True
Return Value
返回 False 在这些题中很有用:
a and b are already connected
这通常意味着:
adding this edge creates a cycle
👉 面试回答
Union operation 用来合并两个 sets。
我会先找到两个 nodes 的 roots。
如果 roots 相同, 说明它们已经 connected。
否则,把一个 root 挂到另一个 root 下, 合并两个 components。
8️⃣ Union by Rank
Problem
如果随意 attach root, tree 可能变高。
Union by rank 用来保持 tree shallow。
Rank Meaning
rank[root] = tree 的 approximate height
把 rank 小的 tree 挂到 rank 大的 tree 下面。
Code
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a, b):
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return False
if self.rank[root_a] < self.rank[root_b]:
self.parent[root_a] = root_b
elif self.rank[root_a] > self.rank[root_b]:
self.parent[root_b] = root_a
else:
self.parent[root_b] = root_a
self.rank[root_a] += 1
return True
👉 面试回答
Union by rank 用来保持 Union Find 的 tree shallow。
我把 rank 更小的 root 挂到 rank 更大的 root 下。
如果两个 rank 相同, 选择其中一个作为 root, 并把它的 rank 增加 1。
配合 path compression, Union Find 操作接近 O(1)。
9️⃣ Union by Size
Alternative to Rank
也可以不用 rank, 而是记录 component size。
size[root] = component 中 node 数量
把小 component 挂到大 component 下面。
Code
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.size = [1] * n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a, b):
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return False
if self.size[root_a] < self.size[root_b]:
root_a, root_b = root_b, root_a
self.parent[root_b] = root_a
self.size[root_a] += self.size[root_b]
return True
👉 面试回答
Union by size 会把较小 component 挂到较大 component 下面。
这样也可以保持 tree shallow。
如果题目还需要 component size, union by size 通常更方便。
🔟 Complexity Analysis
Without Optimization
Find worst case 可能是:
O(n)
With Path Compression and Union by Rank / Size
每个操作几乎是 constant time:
O(α(n))
其中:
α(n) = inverse Ackermann function
实践中:
almost O(1)
For m Operations
Time: O(m α(n))
Space: O(n)
👉 面试回答
使用 path compression 和 union by rank 或 size 后, Union Find 的操作接近 constant time。
严格来说, amortized time 是 O(α(n)), α 是 inverse Ackermann function。
实际工程和面试中, 基本可以理解为接近 O(1)。
1️⃣1️⃣ Connected Components
Problem
给定 n 个 nodes 和 edges, 返回 connected components 数量。
Idea
初始:
components = n
每次 successful union:
components -= 1
Code
def count_components(n, edges):
parent = list(range(n))
rank = [0] * n
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return False
if rank[root_a] < rank[root_b]:
parent[root_a] = root_b
elif rank[root_a] > rank[root_b]:
parent[root_b] = root_a
else:
parent[root_b] = root_a
rank[root_a] += 1
return True
components = n
for a, b in edges:
if union(a, b):
components -= 1
return components
👉 面试回答
统计 connected components 时, 我一开始把每个 node 当成一个 component, 所以 count = n。
每处理一条 edge, 如果 union 成功合并两个不同 components, count 就减一。
最终 count 就是 connected components 数量。
1️⃣2️⃣ Number of Provinces
Problem
给定 adjacency matrix, 返回 connected groups 数量。
Code
def find_circle_num(is_connected):
n = len(is_connected)
parent = list(range(n))
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a != root_b:
parent[root_b] = root_a
for i in range(n):
for j in range(i + 1, n):
if is_connected[i][j] == 1:
union(i, j)
roots = set()
for i in range(n):
roots.add(find(i))
return len(roots)
👉 面试回答
Number of Provinces 是 connected components 问题。
每个 city 是一个 node。
如果两个 city connected, 就 union 它们。
最后 unique roots 的数量就是 provinces 数量。
1️⃣3️⃣ Undirected Graph Cycle Detection
Problem
给定 undirected edges, 判断是否出现 cycle。
Key Insight
如果 edge (a, b) 连接的两个 nodes 已经在同一个 component 中:
cycle exists
Code
def has_cycle(n, edges):
parent = list(range(n))
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return False
parent[root_b] = root_a
return True
for a, b in edges:
if not union(a, b):
return True
return False
👉 面试回答
在 undirected graph 中, Union Find 可以用来检测 cycle。
对每条 edge, 如果两个 endpoints 已经有相同 root, 说明它们已经 connected。
此时再加入这条 edge 就会形成 cycle。
否则就 union 两个 components。
1️⃣4️⃣ Redundant Connection
Problem
一个 graph 原本是 tree, 后来多加了一条 edge。
返回造成 cycle 的 edge。
Code
def find_redundant_connection(edges):
n = len(edges)
parent = list(range(n + 1))
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return False
parent[root_b] = root_a
return True
for a, b in edges:
if not union(a, b):
return [a, b]
return []
👉 面试回答
Redundant Connection 是 undirected graph 的 cycle detection。
我按顺序处理 edges。
如果某条 edge 的两个 endpoints 已经 connected, 那么这条 edge 会造成 cycle, 它就是 redundant edge。
1️⃣5️⃣ Accounts Merge
Problem
每个 account 有 name 和 emails。 如果两个 accounts 共享 email, 它们属于同一个 person。
Union Find Idea
把每个 email 当成 node。
对每个 account, 把其中所有 emails union 到一起。
最后按照 root 聚合 emails。
Code
from collections import defaultdict
def accounts_merge(accounts):
parent = {}
email_to_name = {}
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a != root_b:
parent[root_b] = root_a
for account in accounts:
name = account[0]
for email in account[1:]:
if email not in parent:
parent[email] = email
email_to_name[email] = name
first_email = account[1]
for email in account[2:]:
union(first_email, email)
groups = defaultdict(list)
for email in parent:
root = find(email)
groups[root].append(email)
result = []
for root, emails in groups.items():
name = email_to_name[root]
result.append([name] + sorted(emails))
return result
👉 面试回答
对 Accounts Merge, 我把每个 email 当成一个 node。
同一个 account 中的 emails 应该属于同一个 person, 所以把它们 union。
处理完所有 accounts 后, 拥有相同 root 的 emails 属于同一个 group。
最后按 root 分组并排序 emails。
1️⃣6️⃣ Satisfiability of Equality Equations
Problem
给定 equations:
a == b
b != c
判断所有 equations 是否可以同时满足。
Union Find Idea
先处理所有 equality:
a == b
把这些 variables union。
然后处理 inequality:
a != b
如果两个不应该相等的 variables 有相同 root, 就 contradiction。
Code
def equations_possible(equations):
parent = list(range(26))
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a != root_b:
parent[root_b] = root_a
for eq in equations:
if eq[1:3] == "==":
a = ord(eq[0]) - ord("a")
b = ord(eq[3]) - ord("a")
union(a, b)
for eq in equations:
if eq[1:3] == "!=":
a = ord(eq[0]) - ord("a")
b = ord(eq[3]) - ord("a")
if find(a) == find(b):
return False
return True
👉 面试回答
对 equality equations, 我先把必须相等的 variables union 到同一个 set。
然后检查 inequality equations。
如果两个必须不相等的 variables 有相同 root, 就说明出现 contradiction。
否则所有 equations 可以满足。
1️⃣7️⃣ Number of Islands II
Problem
给定一个初始全是 water 的 grid, land cells 会一个一个被加入。
每次加入后, 返回当前 islands 数量。
Why Union Find?
这是 dynamic connectivity。
如果每次都 DFS 重新统计, 会很贵。
Union Find 可以在新增 land 时, 只和周围 land 合并。
Code
def num_islands2(m, n, positions):
parent = {}
rank = {}
count = 0
result = []
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
nonlocal count
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return
if rank[root_a] < rank[root_b]:
parent[root_a] = root_b
elif rank[root_a] > rank[root_b]:
parent[root_b] = root_a
else:
parent[root_b] = root_a
rank[root_a] += 1
count -= 1
directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
for r, c in positions:
cell = (r, c)
if cell in parent:
result.append(count)
continue
parent[cell] = cell
rank[cell] = 0
count += 1
for dr, dc in directions:
nr = r + dr
nc = c + dc
neighbor = (nr, nc)
if 0 <= nr < m and 0 <= nc < n and neighbor in parent:
union(cell, neighbor)
result.append(count)
return result
👉 面试回答
Number of Islands II 是 dynamic connectivity 问题。
每个新 land cell 一开始是一个新的 island。
然后检查它上下左右四个 neighbor。
如果 neighbor 已经是 land, 就把两个 cells union。
每次 successful union 都会让 island count 减一。
1️⃣8️⃣ Kruskal’s Algorithm
Problem
在 weighted undirected graph 中, 找到 minimum spanning tree。
Why Union Find?
Kruskal 会按 weight 从小到大处理 edges。
每次加入不会形成 cycle 的最小 edge。
Union Find 用来判断两个 endpoints 是否已经 connected。
Code Skeleton
def kruskal(n, edges):
parent = list(range(n))
rank = [0] * n
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return False
if rank[root_a] < rank[root_b]:
parent[root_a] = root_b
elif rank[root_a] > rank[root_b]:
parent[root_b] = root_a
else:
parent[root_b] = root_a
rank[root_a] += 1
return True
edges.sort(key=lambda x: x[2])
total_weight = 0
edge_count = 0
for a, b, weight in edges:
if union(a, b):
total_weight += weight
edge_count += 1
if edge_count == n - 1:
break
return total_weight
👉 面试回答
Kruskal’s algorithm 使用 Union Find 构建 minimum spanning tree。
我们先按 edge weight 排序。
然后依次选择最小 edge。
如果这条 edge 连接两个不同 components, 就加入 MST。
如果 endpoints 已经 connected, 加入这条 edge 会形成 cycle, 所以跳过。
1️⃣9️⃣ Union Find vs DFS / BFS
DFS / BFS
适合:
static graph traversal
finding components once
exploring all nodes in a component
grid connected regions
Union Find
适合:
many connectivity queries
dynamic edge additions
merging components
cycle detection while processing edges
Comparison
| Pattern | Best For |
|---|---|
| DFS / BFS | Traverse graph or grid once |
| Union Find | Dynamic connectivity and repeated connectivity checks |
👉 面试回答
DFS 和 BFS 适合一次性 traverse graph, 或一次性找 connected components。
Union Find 更适合反复 merge components, 或者反复查询两个 nodes 是否 connected。
特别是 edges incrementally added 的场景, Union Find 通常更自然。
2️⃣0️⃣ Union Find Limitations
Good At
Union Find 擅长:
adding connections
checking whether two nodes are connected
counting components
detecting cycles in undirected graphs
Not Good At
Union Find 不适合:
removing edges
finding actual paths
handling directed graph reachability
shortest path
traversing all nodes in a component
👉 面试回答
Union Find 非常适合 edge additions 下的 connectivity。
但它不擅长 edge deletion、shortest path、directed reachability, 或者返回两个 nodes 之间的 actual path。
这些问题通常要用 DFS、BFS 或 shortest-path algorithms。
2️⃣1️⃣ 常见 Edge Cases
Empty Edges
edges = []
每个 node 自己是一个 component。
Duplicate Edges
[0, 1]
[0, 1]
第二次 union 可能返回 false。
Self Loop
[0, 0]
通常表示 already connected。
是否算 cycle 取决于题目。
1-Indexed Nodes
很多 graph problems 的 nodes 是:
1 to n
需要 parent size:
parent = list(range(n + 1))
Dynamic Nodes
对于 emails 或 grid cells, nodes 可能不是整数。
使用 dictionary parent map。
👉 面试回答
Union Find 常见 edge cases 包括 empty edges、 duplicate edges、 self-loop、 1-indexed nodes, 以及动态出现的 nodes,比如 emails 或 grid cells。
如果 nodes 是整数且范围固定,用 parent array。
如果 nodes 动态出现,用 dictionary parent map。
2️⃣2️⃣ 常见错误
Mistake 1: Union 时没有先 Find Root
错误:
parent[b] = a
正确:
parent[find(b)] = find(a)
Mistake 2: 忘记 Path Compression
会导致 find 变慢。
Mistake 3: Component Count 更新错误
只有 union 真正合并两个不同 roots 时, component count 才应该减少。
Mistake 4: 1-Indexed Node Bug
如果 nodes 是 1 到 n, parent array 要开 n + 1。
Mistake 5: 用 Union Find 处理 Directed Reachability
Union Find 不保留方向。
Mistake 6: 期待 Union Find 返回 Actual Path
Union Find 只告诉你 connected 与否, 不告诉你怎么 connected。
👉 面试回答
Union Find 常见错误包括: 没有 union roots 而是 union raw nodes、 忘记 path compression、 union 失败时仍然减少 component count、 1-indexed nodes 数组大小错误、 用 Union Find 处理 directed reachability, 以及期待它返回 actual path。
Union Find 回答的是 connectivity, 不是 path reconstruction。
2️⃣3️⃣ 什么时候 Union Find 适用?
Good Fit
当题目问:
connected components
are two nodes connected
merge groups
detect cycle in undirected graph
dynamic connectivity with added edges
number of groups after unions
minimum spanning tree with Kruskal
Problem Signals
看到这些关键词想到 Union Find:
connected
component
group
merge
same set
friend circle
province
redundant connection
island added
union
equivalent
👉 面试回答
当问题涉及 merge groups, 或判断两个 items 是否属于同一个 group 时, 我会考虑 Union Find。
常见信号包括 connected components、provinces、 redundant connection、accounts merge、equivalence relations, 以及动态 island counting。
2️⃣4️⃣ 什么时候 Union Find 不适用?
Not Good Fit
Union Find 不适合:
need shortest path
need actual path
need directed reachability
need remove edges dynamically
need traverse all nodes in a component
need topological ordering
Better Alternatives
| Problem Type | Better Pattern |
|---|---|
| Shortest path unweighted | BFS |
| Weighted shortest path | Dijkstra |
| Directed dependency order | Topological Sort |
| Find actual path | DFS / BFS with parent |
| Connected regions in static grid | DFS / BFS |
| Edge deletion dynamic graph | Advanced dynamic connectivity |
👉 面试回答
Union Find 不是通用 graph traversal 工具。
它不适合 shortest path、directed reachability、 topological ordering, 或者寻找 actual path。
它最适合 undirected connectivity 和 component merging。
2️⃣5️⃣ 标准模板
Basic Union Find Template
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a, b):
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return False
self.parent[root_b] = root_a
return True
Union Find With Rank Template
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
self.count = n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a, b):
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return False
if self.rank[root_a] < self.rank[root_b]:
self.parent[root_a] = root_b
elif self.rank[root_a] > self.rank[root_b]:
self.parent[root_b] = root_a
else:
self.parent[root_b] = root_a
self.rank[root_a] += 1
self.count -= 1
return True
Union Find With Size Template
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.size = [1] * n
self.count = n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a, b):
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return False
if self.size[root_a] < self.size[root_b]:
root_a, root_b = root_b, root_a
self.parent[root_b] = root_a
self.size[root_a] += self.size[root_b]
self.count -= 1
return True
Dictionary-Based Union Find Template
class UnionFind:
def __init__(self):
self.parent = {}
self.size = {}
def add(self, x):
if x not in self.parent:
self.parent[x] = x
self.size[x] = 1
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a, b):
self.add(a)
self.add(b)
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return False
if self.size[root_a] < self.size[root_b]:
root_a, root_b = root_b, root_a
self.parent[root_b] = root_a
self.size[root_a] += self.size[root_b]
return True
🧠 Staff-Level Answer Final
👉 面试回答完整版本
Union Find,也叫 Disjoint Set Union, 是一种用于管理 disjoint groups 和回答 connectivity questions 的数据结构。
它支持两个核心操作。 Find 返回一个元素所在 set 的 representative root。 Union 合并两个 sets。
基本思想是: 每个 connected component 都有一个代表 root。 如果两个 nodes 的 root 相同, 它们就在同一个 connected component 中。
Parent array 存储 forest structure。 初始时 parent[i] = i, 表示每个 node 自己是一个 component。
Find 操作沿着 parent pointers 找到 root。 加上 path compression 后, find 会把路径上的所有 nodes 直接指向 root。 这样 tree 会变平, 之后的 find 会更快。
Union 操作会先找到两个 roots。 如果 roots 已经相同, 说明两个 nodes 已经 connected, union 返回 false。 否则, 把一个 root 挂到另一个 root 下, 合并两个 components。
为了保持 tree shallow, 通常使用 union by rank 或 union by size。 Union by rank 把较矮的 tree 挂到较高的 tree 下。 Union by size 把较小的 component 挂到较大的 component 下。
使用 path compression 加 union by rank 或 size 后, Union Find 操作几乎是 constant time, 更严格地说是 amortized O(α(n))。
Union Find 常用于 connected components、undirected graph cycle detection、 redundant connection、number of provinces、accounts merge、 equality equations、dynamic island counting, 以及 Kruskal minimum spanning tree。
对 connected components counting, 我从 n 个 components 开始。 每次 successful union, component count 减一。
对 undirected graph cycle detection, 如果一条 edge 的两个 endpoints 已经有相同 root, 加入这条 edge 就会形成 cycle。
对 Number of Islands II 这种 dynamic problem, 每个新增 land cell 一开始是新的 component。 然后和 neighboring lands union。 每次 successful merge 都减少 island count。
Union Find 不是通用 graph traversal algorithm。 它不能提供 shortest path、actual path、directed reachability 或 topological order。 它最适合 undirected connectivity 和 group merging。
高级工程师解释 Union Find 时, 我会讲清楚: 每个 node 代表什么, 两个 nodes 在 same set 中意味着什么, find 和 union 如何工作, component count 如何变化, path compression 和 union by rank 为什么高效, 以及为什么这里 Union Find 比 DFS 或 BFS 更合适。
⭐ Final Insight
Union Find 的核心不是“parent array 模板”。
它的核心是动态维护 connected components。
Senior-level 的重点是:
node 代表什么, root 代表什么, find 如何找到 component representative, union 什么时候真正 merge, component count 什么时候减少, path compression 为什么有效, 以及 Union Find 适合 connectivity, 但不适合 shortest path 或 directed reachability。
能讲清楚这些, Union Find 就是一种处理 connected components、cycle detection、dynamic connectivity 和 group merging 的核心数据结构。
Implement