LeetCode Patterns ·

🎯 Union Find

1️⃣ Core Framework

When discussing Union Find, I frame it as:

What problem pattern Union Find solves
Connected components
Parent array
Find operation
Union operation
Path compression
Union by rank / size
Cycle detection
Graph connectivity problems
Senior-level explanation

2️⃣ What Problem Are We Solving?

Union Find, also called Disjoint Set Union, is used to manage groups of connected elements.

It is commonly used for:

Connected components
Graph connectivity
Cycle detection in undirected graphs
Number of provinces
Redundant connection
Accounts merge
Number of islands II
Kruskal’s algorithm
Dynamic connectivity
Friend circles
Similar string groups
Satisfiability of equality equations

The core idea is:

Each element belongs to a set.
Union connects two sets.
Find tells which set an element belongs to.

👉 Interview Answer

Union Find is a data structure for tracking disjoint sets.

It supports two main operations: find, which returns the representative of a set, and union, which merges two sets.

It is especially useful for connectivity problems, connected components, and cycle detection in undirected graphs.

3️⃣ Why Union Find Works

Connectivity Problem

Suppose we have nodes:

0, 1, 2, 3, 4

And edges:

- 1
- 2
- 4

Then the connected components are:

{0, 1, 2}
{3, 4}

Union Find helps us maintain this grouping efficiently.

Representative

Each group has a representative root.

Example:

parent[0] = 0
parent[1] = 0
parent[2] = 0

This means:

0, 1, 2 belong to the same set

👉 Interview Answer

Union Find works by assigning each connected component a representative root.

If two nodes have the same root, they belong to the same component.

When an edge connects two different components, union merges those components into one.

4️⃣ Parent Array

Basic Idea

Union Find usually uses a parent array.

Initially:

parent[i] = i

This means every node is its own component.

Example

parent = [0, 1, 2, 3, 4]

After union 0 and 1:

parent = [0, 0, 2, 3, 4]

Now:

0 and 1 are in the same set

👉 Interview Answer

The parent array stores the parent of each node in the Union Find forest.

Initially, every node is its own parent, meaning every node starts as a separate component.

As we union nodes, components are merged by changing parent pointers.

5️⃣ Find Operation

Purpose

find(x) returns the root representative of x.

If:

find(a) == find(b)

Then:

a and b are connected

Basic Code

def find(x):
    while parent[x] != x:
        x = parent[x]

    return x

Problem

Without optimization, the tree can become tall.

Worst case:

0 <- 1 <- 2 <- 3 <- 4

Then find can become O(n).

👉 Interview Answer

The find operation returns the root representative of a node.

Two nodes are connected if their roots are the same.

A basic find follows parent pointers until reaching a node whose parent is itself.

6️⃣ Path Compression

Core Idea

Path compression flattens the tree during find.

When finding the root of x, we make every node on the path point directly to the root.

Code

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])

    return parent[x]

Why It Helps

Before compression:

4 → 3 → 2 → 1 → 0

After find(4):

→ 0
→ 0
→ 0
→ 0

Future find operations become much faster.

👉 Interview Answer

Path compression optimizes find by making nodes point directly to the root.

This flattens the tree over time.

After path compression, future find operations on the same component become almost constant time.

7️⃣ Union Operation

Purpose

union(a, b) merges the sets containing a and b.

Steps:

root_a = find(a)
root_b = find(b)

if root_a != root_b:
    parent[root_b] = root_a

Basic Code

def union(a, b):
    root_a = find(a)
    root_b = find(b)

    if root_a == root_b:
        return False

    parent[root_b] = root_a
    return True

Return Value

Returning False is useful when:

a and b are already connected

This often means:

adding this edge creates a cycle

👉 Interview Answer

The union operation merges two sets.

First, I find the root of each node.

If they already have the same root, they are already connected.

Otherwise, I attach one root to the other root and merge the components.

8️⃣ Union by Rank

Problem

If we always attach one root arbitrarily, the tree may become tall.

Union by rank keeps the tree shallow.

Rank Meaning

rank[root] = approximate height of the tree

Attach the smaller-rank tree under the larger-rank tree.

Code

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])

        return self.parent[x]

    def union(self, a, b):
        root_a = self.find(a)
        root_b = self.find(b)

        if root_a == root_b:
            return False

        if self.rank[root_a] < self.rank[root_b]:
            self.parent[root_a] = root_b
        elif self.rank[root_a] > self.rank[root_b]:
            self.parent[root_b] = root_a
        else:
            self.parent[root_b] = root_a
            self.rank[root_a] += 1

        return True

👉 Interview Answer

Union by rank keeps the tree shallow.

I attach the root with smaller rank under the root with larger rank.

If both ranks are equal, I choose one root and increase its rank.

Combined with path compression, Union Find operations are nearly constant time.

9️⃣ Union by Size

Alternative to Rank

Instead of rank, we can track component size.

size[root] = number of nodes in the component

Attach the smaller component under the larger component.

Code

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])

        return self.parent[x]

    def union(self, a, b):
        root_a = self.find(a)
        root_b = self.find(b)

        if root_a == root_b:
            return False

        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a

        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]

        return True

👉 Interview Answer

Union by size attaches the smaller component under the larger component.

This keeps the Union Find tree shallow.

It is often easier to reason about than rank, especially when the problem also needs component sizes.

🔟 Complexity Analysis

Without Optimization

Find can be:

O(n)

in the worst case.

With Path Compression and Union by Rank / Size

Each operation is almost constant time:

O(α(n))

Where:

α(n) = inverse Ackermann function

In practice:

almost O(1)

For m Operations

Time: O(m α(n))
Space: O(n)

👉 Interview Answer

With path compression and union by rank or size, Union Find operations are almost constant time.

Technically the amortized time is O(α(n)), where α is the inverse Ackermann function.

In practice, it behaves like O(1).

1️⃣1️⃣ Connected Components

Problem

Given n nodes and edges, return the number of connected components.

Idea

Initially:

components = n

Every successful union reduces component count by 1.

Code

def count_components(n, edges):
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a == root_b:
            return False

        if rank[root_a] < rank[root_b]:
            parent[root_a] = root_b
        elif rank[root_a] > rank[root_b]:
            parent[root_b] = root_a
        else:
            parent[root_b] = root_a
            rank[root_a] += 1

        return True

    components = n

    for a, b in edges:
        if union(a, b):
            components -= 1

    return components

👉 Interview Answer

To count connected components, I start with n components because every node is separate initially.

For each edge, if union successfully merges two different components, I decrease the component count by one.

At the end, the count is the number of connected components.

1️⃣2️⃣ Number of Provinces

Problem

Given an adjacency matrix, return the number of connected groups.

Code

def find_circle_num(is_connected):
    n = len(is_connected)
    parent = list(range(n))

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a != root_b:
            parent[root_b] = root_a

    for i in range(n):
        for j in range(i + 1, n):
            if is_connected[i][j] == 1:
                union(i, j)

    roots = set()

    for i in range(n):
        roots.add(find(i))

    return len(roots)

👉 Interview Answer

Number of Provinces is a connected-components problem.

Each city is a node, and a connection means two cities should be unioned.

After processing the matrix, the number of unique roots is the number of provinces.

1️⃣3️⃣ Cycle Detection in Undirected Graph

Problem

Given undirected edges, determine whether adding an edge creates a cycle.

Key Insight

If edge (a, b) connects two nodes already in the same component:

cycle exists

Code

def has_cycle(n, edges):
    parent = list(range(n))

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a == root_b:
            return False

        parent[root_b] = root_a
        return True

    for a, b in edges:
        if not union(a, b):
            return True

    return False

👉 Interview Answer

In an undirected graph, Union Find can detect cycles.

For each edge, if the two endpoints already have the same root, then they are already connected.

Adding this edge would create a cycle.

Otherwise, I union the two components.

1️⃣4️⃣ Redundant Connection

Problem

Given a graph that started as a tree, then one extra edge was added.

Return the edge that creates a cycle.

Code

def find_redundant_connection(edges):
    n = len(edges)
    parent = list(range(n + 1))

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a == root_b:
            return False

        parent[root_b] = root_a
        return True

    for a, b in edges:
        if not union(a, b):
            return [a, b]

    return []

👉 Interview Answer

Redundant Connection is a cycle detection problem in an undirected graph.

I process edges one by one.

If an edge connects two nodes that are already connected, that edge creates a cycle and is the redundant edge.

1️⃣5️⃣ Accounts Merge

Problem

Each account has a name and emails. If two accounts share an email, they belong to the same person.

Union Find Idea

Treat each email as a node.

For every account, union all emails in that account.

Then group emails by root.

Code

from collections import defaultdict

def accounts_merge(accounts):
    parent = {}
    email_to_name = {}

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a != root_b:
            parent[root_b] = root_a

    for account in accounts:
        name = account[0]

        for email in account[1:]:
            if email not in parent:
                parent[email] = email

            email_to_name[email] = name

        first_email = account[1]

        for email in account[2:]:
            union(first_email, email)

    groups = defaultdict(list)

    for email in parent:
        root = find(email)
        groups[root].append(email)

    result = []

    for root, emails in groups.items():
        name = email_to_name[root]
        result.append([name] + sorted(emails))

    return result

👉 Interview Answer

For Accounts Merge, I model each email as a node.

Emails in the same account are unioned together.

After processing all accounts, emails with the same root belong to the same person.

Then I group emails by root and sort them.

1️⃣6️⃣ Satisfiability of Equality Equations

Problem

Given equations like:

a == b
b != c

Return whether all equations can be satisfied.

Union Find Idea

First process all equality equations:

a == b

Union those variables.

Then process inequality equations:

a != b

If two variables in an inequality have the same root, there is a contradiction.

Code

def equations_possible(equations):
    parent = list(range(26))

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a != root_b:
            parent[root_b] = root_a

    for eq in equations:
        if eq[1:3] == "==":
            a = ord(eq[0]) - ord("a")
            b = ord(eq[3]) - ord("a")
            union(a, b)

    for eq in equations:
        if eq[1:3] == "!=":
            a = ord(eq[0]) - ord("a")
            b = ord(eq[3]) - ord("a")

            if find(a) == find(b):
                return False

    return True

👉 Interview Answer

For equality equations, I union variables that must be equal.

After all equalities are processed, I check inequalities.

If two variables that must be unequal have the same root, the equations are contradictory.

1️⃣7️⃣ Number of Islands II

Problem

Given an initially water grid, land cells are added one by one.

After each addition, return the number of islands.

Why Union Find?

This is dynamic connectivity.

DFS would be expensive after every addition.

Union Find can merge new land with neighboring islands efficiently.

Code

def num_islands2(m, n, positions):
    parent = {}
    rank = {}
    count = 0
    result = []

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        nonlocal count

        root_a = find(a)
        root_b = find(b)

        if root_a == root_b:
            return

        if rank[root_a] < rank[root_b]:
            parent[root_a] = root_b
        elif rank[root_a] > rank[root_b]:
            parent[root_b] = root_a
        else:
            parent[root_b] = root_a
            rank[root_a] += 1

        count -= 1

    directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    for r, c in positions:
        cell = (r, c)

        if cell in parent:
            result.append(count)
            continue

        parent[cell] = cell
        rank[cell] = 0
        count += 1

        for dr, dc in directions:
            nr = r + dr
            nc = c + dc
            neighbor = (nr, nc)

            if 0 <= nr < m and 0 <= nc < n and neighbor in parent:
                union(cell, neighbor)

        result.append(count)

    return result

👉 Interview Answer

Number of Islands II is a dynamic connectivity problem.

Each new land cell starts as a new island.

Then I check its four neighbors.

If a neighbor is already land, I union the two cells.

Every successful union reduces the island count by one.

1️⃣8️⃣ Kruskal’s Algorithm

Problem

Find a minimum spanning tree of a weighted undirected graph.

Why Union Find?

Kruskal sorts edges by weight.

Then it adds the smallest edge that does not create a cycle.

Union Find checks whether two endpoints are already connected.

Code Skeleton

def kruskal(n, edges):
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a == root_b:
            return False

        if rank[root_a] < rank[root_b]:
            parent[root_a] = root_b
        elif rank[root_a] > rank[root_b]:
            parent[root_b] = root_a
        else:
            parent[root_b] = root_a
            rank[root_a] += 1

        return True

    edges.sort(key=lambda x: x[2])

    total_weight = 0
    edge_count = 0

    for a, b, weight in edges:
        if union(a, b):
            total_weight += weight
            edge_count += 1

            if edge_count == n - 1:
                break

    return total_weight

👉 Interview Answer

Kruskal’s algorithm uses Union Find to build a minimum spanning tree.

We sort edges by weight.

Then we add the smallest edge that connects two different components.

If the endpoints already have the same root, adding that edge would create a cycle, so we skip it.

1️⃣9️⃣ Union Find vs DFS / BFS

DFS / BFS

Good for:

static graph traversal
finding components once
exploring all nodes in a component
grid connected regions

Union Find

Good for:

many connectivity queries
dynamic edge additions
merging components
cycle detection while processing edges

Comparison

Pattern	Best For
DFS / BFS	Traverse graph or grid once
Union Find	Dynamic connectivity and repeated connectivity checks

👉 Interview Answer

DFS and BFS are good when we need to traverse a graph or find components once.

Union Find is better when we repeatedly merge components or answer connectivity queries.

It is especially useful when edges are processed incrementally.

2️⃣0️⃣ Union Find Limitations

Good At

Union Find is good at:

adding connections
checking whether two nodes are connected
counting components
detecting cycles in undirected graphs

Not Good At

Union Find is not good at:

removing edges
finding actual paths
handling directed graph reachability
shortest path
traversing all nodes in a component

👉 Interview Answer

Union Find is excellent for connectivity under edge additions.

But it does not naturally support edge deletion, shortest paths, directed reachability, or retrieving the actual path between nodes.

For those problems, DFS, BFS, or shortest-path algorithms are usually better.

2️⃣1️⃣ Common Edge Cases

Empty Edges

edges = []

Each node is its own component.

Duplicate Edges

[0, 1]
[0, 1]

The second union may return false.

Self Loop

[0, 0]

Usually means already connected.

May indicate cycle depending on problem.

1-Indexed Nodes

Many graph problems use nodes:

1 to n

Need parent size:

parent = list(range(n + 1))

Dynamic Nodes

For emails or grid cells, nodes may not be integers.

Use dictionary parent map.

👉 Interview Answer

Common Union Find edge cases include empty edges, duplicate edges, self-loops, 1-indexed nodes, and dynamically discovered nodes such as emails or grid cells.

I choose parent array or parent dictionary depending on how nodes are represented.

2️⃣2️⃣ Common Mistakes

Mistake 1: Not Calling Find Before Union

Wrong:

parent[b] = a

Correct:

parent[find(b)] = find(a)

Mistake 2: Forgetting Path Compression

This can make find slow.

Mistake 3: Wrong Component Count

Only decrease component count when union actually merges two different roots.

Mistake 4: 1-Indexed Node Bug

Using size n array for nodes 1 to n causes index errors.

Mistake 5: Using Union Find for Directed Reachability

Union Find does not preserve direction.

Mistake 6: Expecting Actual Path

Union Find tells whether connected, but not which path connects them.

👉 Interview Answer

Common Union Find mistakes include unioning raw nodes instead of roots, forgetting path compression, decreasing component count even when union fails, mishandling 1-indexed nodes, using Union Find for directed reachability, and expecting it to return actual paths.

Union Find answers connectivity, not path reconstruction.

2️⃣3️⃣ When Union Find Applies

Good Fit

Use Union Find when the problem asks for:

connected components
are two nodes connected
merge groups
detect cycle in undirected graph
dynamic connectivity with added edges
number of groups after unions
minimum spanning tree with Kruskal

Problem Signals

Look for words like:

connected
component
group
merge
same set
friend circle
province
redundant connection
island added
union
equivalent

👉 Interview Answer

I consider Union Find when the problem involves merging groups or checking whether two items belong to the same group.

Common signals include connected components, provinces, redundant connection, accounts merge, equivalence relations, and dynamic island counting.

2️⃣4️⃣ When Union Find Does Not Apply

Not Good Fit

Union Find may not be ideal when:

need shortest path
need actual path
need directed reachability
need remove edges dynamically
need traverse all nodes in a component
need topological ordering

Better Alternatives

Problem Type	Better Pattern
Shortest path unweighted	BFS
Weighted shortest path	Dijkstra
Directed dependency order	Topological Sort
Find actual path	DFS / BFS with parent
Connected regions in static grid	DFS / BFS
Edge deletion dynamic graph	Advanced dynamic connectivity

👉 Interview Answer

Union Find is not a general graph traversal tool.

It is not suitable for shortest path, directed reachability, topological ordering, or finding actual paths.

It is best for undirected connectivity and component merging.

2️⃣5️⃣ Standard Templates

Basic Union Find Template

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])

        return self.parent[x]

    def union(self, a, b):
        root_a = self.find(a)
        root_b = self.find(b)

        if root_a == root_b:
            return False

        self.parent[root_b] = root_a
        return True

Union Find With Rank Template

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])

        return self.parent[x]

    def union(self, a, b):
        root_a = self.find(a)
        root_b = self.find(b)

        if root_a == root_b:
            return False

        if self.rank[root_a] < self.rank[root_b]:
            self.parent[root_a] = root_b
        elif self.rank[root_a] > self.rank[root_b]:
            self.parent[root_b] = root_a
        else:
            self.parent[root_b] = root_a
            self.rank[root_a] += 1

        self.count -= 1
        return True

Union Find With Size Template

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.count = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])

        return self.parent[x]

    def union(self, a, b):
        root_a = self.find(a)
        root_b = self.find(b)

        if root_a == root_b:
            return False

        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a

        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]
        self.count -= 1

        return True

Dictionary-Based Union Find Template

class UnionFind:
    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])

        return self.parent[x]

    def union(self, a, b):
        self.add(a)
        self.add(b)

        root_a = self.find(a)
        root_b = self.find(b)

        if root_a == root_b:
            return False

        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a

        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]

        return True

🧠 Staff-Level Answer Final

👉 Interview Answer Full Version

Union Find, also called Disjoint Set Union, is a data structure for managing disjoint groups and answering connectivity questions.

It supports two core operations. Find returns the representative root of an element’s set. Union merges two sets together.

The basic idea is that every component has a representative root. If two nodes have the same root, they belong to the same connected component.

The parent array stores the forest structure. Initially, parent[i] equals i, meaning every node starts as its own component.

The find operation follows parent pointers until it reaches the root. With path compression, every node on the find path is updated to point directly to the root. This flattens the tree and makes future find operations faster.

The union operation first finds both roots. If the roots are already the same, the nodes are already connected, and union returns false. Otherwise, it attaches one root to the other and merges the two components.

To keep trees shallow, we use union by rank or union by size. Union by rank attaches the shorter tree under the taller tree. Union by size attaches the smaller component under the larger component.

With path compression and union by rank or size, Union Find operations are almost constant time, specifically O(α(n)) amortized, where α is the inverse Ackermann function.

Union Find is useful for connected components, cycle detection in undirected graphs, redundant connection, number of provinces, accounts merge, equality equations, dynamic island counting, and Kruskal’s minimum spanning tree algorithm.

For connected component counting, I start with n components. Every successful union decreases the count by one.

For cycle detection in an undirected graph, if an edge connects two nodes that already have the same root, adding that edge creates a cycle.

For dynamic problems like Number of Islands II, each new land cell starts as a new component. Then we union it with neighboring land cells, reducing the island count for every successful merge.

Union Find is not a general graph traversal algorithm. It does not give shortest paths, actual paths, directed reachability, or topological order. It is best for undirected connectivity and merging groups.

At a senior level, I would explain: what each node represents, what it means for two nodes to be in the same set, how find and union work, how component count changes, why path compression and union by rank make it efficient, and why Union Find is appropriate compared with DFS or BFS.

⭐ Final Insight

Union Find 的核心不是“写 parent array”。

它的核心是动态维护 connected components。

Senior-level 的重点是：

node 代表什么， root 代表什么， find 如何找到 component representative， union 什么时候真正 merge， component count 什么时候减少， path compression 为什么有效，以及 Union Find 适合 connectivity 但不适合 shortest path 或 directed reachability。

能讲清楚这些， Union Find 就是一种处理 connected components、cycle detection、dynamic connectivity 和 group merging 的核心数据结构。

中文部分

🎯 Union Find

1️⃣ 核心框架

讨论 Union Find 时，我通常从：

Union Find 解决什么问题
Connected components
Parent array
Find operation
Union operation
Path compression
Union by rank / size
Cycle detection
Graph connectivity problems
高级工程师解释方式

2️⃣ 要解决什么问题？

Union Find，也叫 Disjoint Set Union，用于维护 connected groups。

它常用于：

Connected components
Graph connectivity
Cycle detection in undirected graphs
Number of provinces
Redundant connection
Accounts merge
Number of islands II
Kruskal’s algorithm
Dynamic connectivity
Friend circles
Similar string groups
Satisfiability of equality equations

核心思想是：

每个 element 属于一个 set。
Union 连接两个 set。
Find 查询一个 element 属于哪个 set。

👉 面试回答

Union Find 是一种维护 disjoint sets 的数据结构。

它支持两个核心操作： find 用来找到某个元素所在 set 的 representative， union 用来合并两个 sets。

它特别适合 connectivity、connected components，以及 undirected graph 中的 cycle detection。

3️⃣ 为什么 Union Find 有效？

Connectivity Problem

假设有 nodes：

0, 1, 2, 3, 4

Edges:

- 1
- 2
- 4

Connected components 是：

{0, 1, 2}
{3, 4}

Union Find 可以高效维护这些 groups。

Representative

每个 group 有一个 representative root。

Example:

parent[0] = 0
parent[1] = 0
parent[2] = 0

表示：

0, 1, 2 belong to the same set

👉 面试回答

Union Find 的核心是：每个 connected component 都有一个 representative root。

如果两个 nodes 的 root 相同，它们就在同一个 component 中。

当一条 edge 连接两个不同 components 时， union 会把它们合并。

4️⃣ Parent Array

Basic Idea

Union Find 通常使用 parent array。

初始状态：

parent[i] = i

表示每个 node 自己就是一个 component。

Example

parent = [0, 1, 2, 3, 4]

Union 0 和 1 后：

parent = [0, 0, 2, 3, 4]

现在：

0 and 1 are in the same set

👉 面试回答

Parent array 存储每个 node 的 parent。

初始时，每个 node 的 parent 是自己，表示每个 node 是单独 component。

随着 union 操作，不同 components 会通过改变 parent pointer 被合并。

5️⃣ Find Operation

Purpose

find(x) 返回 x 所在 set 的 root representative。

如果：

find(a) == find(b)

说明：

a and b are connected

Basic Code

def find(x):
    while parent[x] != x:
        x = parent[x]

    return x

Problem

如果不优化， tree 可能变得很高。

Worst case:

0 <- 1 <- 2 <- 3 <- 4

此时 find 可能退化为 O(n)。

👉 面试回答

Find operation 用来返回一个 node 的 root representative。

如果两个 nodes 的 root 相同，它们就是 connected。

基础 find 会沿着 parent pointer 一直走到 root。

6️⃣ Path Compression

Core Idea

Path compression 会在 find 的过程中压平 tree。

查找 x 的 root 时，把路径上的 nodes 都直接指向 root。

Code

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])

    return parent[x]

Why It Helps

压缩前：

4 → 3 → 2 → 1 → 0

执行 find(4) 后：

→ 0
→ 0
→ 0
→ 0

之后的 find 会非常快。

👉 面试回答

Path compression 通过让路径上的 nodes 直接指向 root，来优化 find。

这样 tree 会随着操作逐渐变平。

之后对同一个 component 的 find 操作会接近 O(1)。

7️⃣ Union Operation

Purpose

union(a, b) 合并 a 和 b 所在的 sets。

Steps:

root_a = find(a)
root_b = find(b)

if root_a != root_b:
    parent[root_b] = root_a

Basic Code

def union(a, b):
    root_a = find(a)
    root_b = find(b)

    if root_a == root_b:
        return False

    parent[root_b] = root_a
    return True

Return Value

返回 False 在这些题中很有用：

a and b are already connected

这通常意味着：

adding this edge creates a cycle

👉 面试回答

Union operation 用来合并两个 sets。

我会先找到两个 nodes 的 roots。

如果 roots 相同，说明它们已经 connected。

否则，把一个 root 挂到另一个 root 下，合并两个 components。

8️⃣ Union by Rank

Problem

如果随意 attach root， tree 可能变高。

Union by rank 用来保持 tree shallow。

Rank Meaning

rank[root] = tree 的 approximate height

把 rank 小的 tree 挂到 rank 大的 tree 下面。

Code

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])

        return self.parent[x]

    def union(self, a, b):
        root_a = self.find(a)
        root_b = self.find(b)

        if root_a == root_b:
            return False

        if self.rank[root_a] < self.rank[root_b]:
            self.parent[root_a] = root_b
        elif self.rank[root_a] > self.rank[root_b]:
            self.parent[root_b] = root_a
        else:
            self.parent[root_b] = root_a
            self.rank[root_a] += 1

        return True

👉 面试回答

Union by rank 用来保持 Union Find 的 tree shallow。

我把 rank 更小的 root 挂到 rank 更大的 root 下。

如果两个 rank 相同，选择其中一个作为 root，并把它的 rank 增加 1。

配合 path compression， Union Find 操作接近 O(1)。

9️⃣ Union by Size

Alternative to Rank

也可以不用 rank，而是记录 component size。

size[root] = component 中 node 数量

把小 component 挂到大 component 下面。

Code

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])

        return self.parent[x]

    def union(self, a, b):
        root_a = self.find(a)
        root_b = self.find(b)

        if root_a == root_b:
            return False

        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a

        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]

        return True

👉 面试回答

Union by size 会把较小 component 挂到较大 component 下面。

这样也可以保持 tree shallow。

如果题目还需要 component size， union by size 通常更方便。

🔟 Complexity Analysis

Without Optimization

Find worst case 可能是：

O(n)

With Path Compression and Union by Rank / Size

每个操作几乎是 constant time：

O(α(n))

其中：

α(n) = inverse Ackermann function

实践中：

almost O(1)

For m Operations

Time: O(m α(n))
Space: O(n)

👉 面试回答

使用 path compression 和 union by rank 或 size 后， Union Find 的操作接近 constant time。

严格来说， amortized time 是 O(α(n))， α 是 inverse Ackermann function。

实际工程和面试中，基本可以理解为接近 O(1)。

1️⃣1️⃣ Connected Components

Problem

给定 n 个 nodes 和 edges，返回 connected components 数量。

Idea

初始：

components = n

每次 successful union：

components -= 1

Code

def count_components(n, edges):
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a == root_b:
            return False

        if rank[root_a] < rank[root_b]:
            parent[root_a] = root_b
        elif rank[root_a] > rank[root_b]:
            parent[root_b] = root_a
        else:
            parent[root_b] = root_a
            rank[root_a] += 1

        return True

    components = n

    for a, b in edges:
        if union(a, b):
            components -= 1

    return components

👉 面试回答

统计 connected components 时，我一开始把每个 node 当成一个 component，所以 count = n。

每处理一条 edge，如果 union 成功合并两个不同 components， count 就减一。

最终 count 就是 connected components 数量。

1️⃣2️⃣ Number of Provinces

Problem

给定 adjacency matrix，返回 connected groups 数量。

Code

def find_circle_num(is_connected):
    n = len(is_connected)
    parent = list(range(n))

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a != root_b:
            parent[root_b] = root_a

    for i in range(n):
        for j in range(i + 1, n):
            if is_connected[i][j] == 1:
                union(i, j)

    roots = set()

    for i in range(n):
        roots.add(find(i))

    return len(roots)

👉 面试回答

Number of Provinces 是 connected components 问题。

每个 city 是一个 node。

如果两个 city connected，就 union 它们。

最后 unique roots 的数量就是 provinces 数量。

1️⃣3️⃣ Undirected Graph Cycle Detection

Problem

给定 undirected edges，判断是否出现 cycle。

Key Insight

如果 edge (a, b) 连接的两个 nodes 已经在同一个 component 中：

cycle exists

Code

def has_cycle(n, edges):
    parent = list(range(n))

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a == root_b:
            return False

        parent[root_b] = root_a
        return True

    for a, b in edges:
        if not union(a, b):
            return True

    return False

👉 面试回答

在 undirected graph 中， Union Find 可以用来检测 cycle。

对每条 edge，如果两个 endpoints 已经有相同 root，说明它们已经 connected。

此时再加入这条 edge 就会形成 cycle。

否则就 union 两个 components。

1️⃣4️⃣ Redundant Connection

Problem

一个 graph 原本是 tree，后来多加了一条 edge。

返回造成 cycle 的 edge。

Code

def find_redundant_connection(edges):
    n = len(edges)
    parent = list(range(n + 1))

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a == root_b:
            return False

        parent[root_b] = root_a
        return True

    for a, b in edges:
        if not union(a, b):
            return [a, b]

    return []

👉 面试回答

Redundant Connection 是 undirected graph 的 cycle detection。

我按顺序处理 edges。

如果某条 edge 的两个 endpoints 已经 connected，那么这条 edge 会造成 cycle，它就是 redundant edge。

1️⃣5️⃣ Accounts Merge

Problem

每个 account 有 name 和 emails。如果两个 accounts 共享 email，它们属于同一个 person。

Union Find Idea

把每个 email 当成 node。

对每个 account，把其中所有 emails union 到一起。

最后按照 root 聚合 emails。

Code

from collections import defaultdict

def accounts_merge(accounts):
    parent = {}
    email_to_name = {}

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a != root_b:
            parent[root_b] = root_a

    for account in accounts:
        name = account[0]

        for email in account[1:]:
            if email not in parent:
                parent[email] = email

            email_to_name[email] = name

        first_email = account[1]

        for email in account[2:]:
            union(first_email, email)

    groups = defaultdict(list)

    for email in parent:
        root = find(email)
        groups[root].append(email)

    result = []

    for root, emails in groups.items():
        name = email_to_name[root]
        result.append([name] + sorted(emails))

    return result

👉 面试回答

对 Accounts Merge，我把每个 email 当成一个 node。

同一个 account 中的 emails 应该属于同一个 person，所以把它们 union。

处理完所有 accounts 后，拥有相同 root 的 emails 属于同一个 group。

最后按 root 分组并排序 emails。

1️⃣6️⃣ Satisfiability of Equality Equations

Problem

给定 equations：

a == b
b != c

判断所有 equations 是否可以同时满足。

Union Find Idea

先处理所有 equality：

a == b

把这些 variables union。

然后处理 inequality：

a != b

如果两个不应该相等的 variables 有相同 root，就 contradiction。

Code

def equations_possible(equations):
    parent = list(range(26))

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a != root_b:
            parent[root_b] = root_a

    for eq in equations:
        if eq[1:3] == "==":
            a = ord(eq[0]) - ord("a")
            b = ord(eq[3]) - ord("a")
            union(a, b)

    for eq in equations:
        if eq[1:3] == "!=":
            a = ord(eq[0]) - ord("a")
            b = ord(eq[3]) - ord("a")

            if find(a) == find(b):
                return False

    return True

👉 面试回答

对 equality equations，我先把必须相等的 variables union 到同一个 set。

然后检查 inequality equations。

如果两个必须不相等的 variables 有相同 root，就说明出现 contradiction。

否则所有 equations 可以满足。

1️⃣7️⃣ Number of Islands II

Problem

给定一个初始全是 water 的 grid， land cells 会一个一个被加入。

每次加入后，返回当前 islands 数量。

Why Union Find?

这是 dynamic connectivity。

如果每次都 DFS 重新统计，会很贵。

Union Find 可以在新增 land 时，只和周围 land 合并。

Code

def num_islands2(m, n, positions):
    parent = {}
    rank = {}
    count = 0
    result = []

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        nonlocal count

        root_a = find(a)
        root_b = find(b)

        if root_a == root_b:
            return

        if rank[root_a] < rank[root_b]:
            parent[root_a] = root_b
        elif rank[root_a] > rank[root_b]:
            parent[root_b] = root_a
        else:
            parent[root_b] = root_a
            rank[root_a] += 1

        count -= 1

    directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    for r, c in positions:
        cell = (r, c)

        if cell in parent:
            result.append(count)
            continue

        parent[cell] = cell
        rank[cell] = 0
        count += 1

        for dr, dc in directions:
            nr = r + dr
            nc = c + dc
            neighbor = (nr, nc)

            if 0 <= nr < m and 0 <= nc < n and neighbor in parent:
                union(cell, neighbor)

        result.append(count)

    return result

👉 面试回答

Number of Islands II 是 dynamic connectivity 问题。

每个新 land cell 一开始是一个新的 island。

然后检查它上下左右四个 neighbor。

如果 neighbor 已经是 land，就把两个 cells union。

每次 successful union 都会让 island count 减一。

1️⃣8️⃣ Kruskal’s Algorithm

Problem

在 weighted undirected graph 中，找到 minimum spanning tree。

Why Union Find?

Kruskal 会按 weight 从小到大处理 edges。

每次加入不会形成 cycle 的最小 edge。

Union Find 用来判断两个 endpoints 是否已经 connected。

Code Skeleton

def kruskal(n, edges):
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])

        return parent[x]

    def union(a, b):
        root_a = find(a)
        root_b = find(b)

        if root_a == root_b:
            return False

        if rank[root_a] < rank[root_b]:
            parent[root_a] = root_b
        elif rank[root_a] > rank[root_b]:
            parent[root_b] = root_a
        else:
            parent[root_b] = root_a
            rank[root_a] += 1

        return True

    edges.sort(key=lambda x: x[2])

    total_weight = 0
    edge_count = 0

    for a, b, weight in edges:
        if union(a, b):
            total_weight += weight
            edge_count += 1

            if edge_count == n - 1:
                break

    return total_weight

👉 面试回答

Kruskal’s algorithm 使用 Union Find 构建 minimum spanning tree。

我们先按 edge weight 排序。

然后依次选择最小 edge。

如果这条 edge 连接两个不同 components，就加入 MST。

如果 endpoints 已经 connected，加入这条 edge 会形成 cycle，所以跳过。

1️⃣9️⃣ Union Find vs DFS / BFS

DFS / BFS

适合：

static graph traversal
finding components once
exploring all nodes in a component
grid connected regions

Union Find

适合：

many connectivity queries
dynamic edge additions
merging components
cycle detection while processing edges

Comparison

Pattern	Best For
DFS / BFS	Traverse graph or grid once
Union Find	Dynamic connectivity and repeated connectivity checks

👉 面试回答

DFS 和 BFS 适合一次性 traverse graph，或一次性找 connected components。

Union Find 更适合反复 merge components，或者反复查询两个 nodes 是否 connected。

特别是 edges incrementally added 的场景， Union Find 通常更自然。

2️⃣0️⃣ Union Find Limitations

Good At

Union Find 擅长：

adding connections
checking whether two nodes are connected
counting components
detecting cycles in undirected graphs

Not Good At

Union Find 不适合：

removing edges
finding actual paths
handling directed graph reachability
shortest path
traversing all nodes in a component

👉 面试回答

Union Find 非常适合 edge additions 下的 connectivity。

但它不擅长 edge deletion、shortest path、directed reachability，或者返回两个 nodes 之间的 actual path。

这些问题通常要用 DFS、BFS 或 shortest-path algorithms。

2️⃣1️⃣ 常见 Edge Cases

Empty Edges

edges = []

每个 node 自己是一个 component。

Duplicate Edges

[0, 1]
[0, 1]

第二次 union 可能返回 false。

Self Loop

[0, 0]

通常表示 already connected。

是否算 cycle 取决于题目。

1-Indexed Nodes

很多 graph problems 的 nodes 是：

1 to n

需要 parent size：

parent = list(range(n + 1))

Dynamic Nodes

对于 emails 或 grid cells， nodes 可能不是整数。

使用 dictionary parent map。

👉 面试回答

Union Find 常见 edge cases 包括 empty edges、 duplicate edges、 self-loop、 1-indexed nodes，以及动态出现的 nodes，比如 emails 或 grid cells。

如果 nodes 是整数且范围固定，用 parent array。

如果 nodes 动态出现，用 dictionary parent map。

2️⃣2️⃣ 常见错误

Mistake 1: Union 时没有先 Find Root

错误：

parent[b] = a

正确：

parent[find(b)] = find(a)

Mistake 2: 忘记 Path Compression

会导致 find 变慢。

Mistake 3: Component Count 更新错误

只有 union 真正合并两个不同 roots 时， component count 才应该减少。

Mistake 4: 1-Indexed Node Bug

如果 nodes 是 1 到 n， parent array 要开 n + 1。

Mistake 5: 用 Union Find 处理 Directed Reachability

Union Find 不保留方向。

Mistake 6: 期待 Union Find 返回 Actual Path

Union Find 只告诉你 connected 与否，不告诉你怎么 connected。

👉 面试回答

Union Find 常见错误包括：没有 union roots 而是 union raw nodes、忘记 path compression、 union 失败时仍然减少 component count、 1-indexed nodes 数组大小错误、用 Union Find 处理 directed reachability，以及期待它返回 actual path。

Union Find 回答的是 connectivity，不是 path reconstruction。

2️⃣3️⃣ 什么时候 Union Find 适用？

Good Fit

当题目问：

connected components
are two nodes connected
merge groups
detect cycle in undirected graph
dynamic connectivity with added edges
number of groups after unions
minimum spanning tree with Kruskal

Problem Signals

看到这些关键词想到 Union Find：

connected
component
group
merge
same set
friend circle
province
redundant connection
island added
union
equivalent

👉 面试回答

当问题涉及 merge groups，或判断两个 items 是否属于同一个 group 时，我会考虑 Union Find。

常见信号包括 connected components、provinces、 redundant connection、accounts merge、equivalence relations，以及动态 island counting。

2️⃣4️⃣ 什么时候 Union Find 不适用？

Not Good Fit

Union Find 不适合：

need shortest path
need actual path
need directed reachability
need remove edges dynamically
need traverse all nodes in a component
need topological ordering

Better Alternatives

Problem Type	Better Pattern
Shortest path unweighted	BFS
Weighted shortest path	Dijkstra
Directed dependency order	Topological Sort
Find actual path	DFS / BFS with parent
Connected regions in static grid	DFS / BFS
Edge deletion dynamic graph	Advanced dynamic connectivity

👉 面试回答

Union Find 不是通用 graph traversal 工具。

它不适合 shortest path、directed reachability、 topological ordering，或者寻找 actual path。

它最适合 undirected connectivity 和 component merging。

2️⃣5️⃣ 标准模板

Basic Union Find Template

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])

        return self.parent[x]

    def union(self, a, b):
        root_a = self.find(a)
        root_b = self.find(b)

        if root_a == root_b:
            return False

        self.parent[root_b] = root_a
        return True

Union Find With Rank Template

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])

        return self.parent[x]

    def union(self, a, b):
        root_a = self.find(a)
        root_b = self.find(b)

        if root_a == root_b:
            return False

        if self.rank[root_a] < self.rank[root_b]:
            self.parent[root_a] = root_b
        elif self.rank[root_a] > self.rank[root_b]:
            self.parent[root_b] = root_a
        else:
            self.parent[root_b] = root_a
            self.rank[root_a] += 1

        self.count -= 1
        return True

Union Find With Size Template

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.count = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])

        return self.parent[x]

    def union(self, a, b):
        root_a = self.find(a)
        root_b = self.find(b)

        if root_a == root_b:
            return False

        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a

        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]
        self.count -= 1

        return True

Dictionary-Based Union Find Template

class UnionFind:
    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])

        return self.parent[x]

    def union(self, a, b):
        self.add(a)
        self.add(b)

        root_a = self.find(a)
        root_b = self.find(b)

        if root_a == root_b:
            return False

        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a

        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]

        return True

🧠 Staff-Level Answer Final

👉 面试回答完整版本

Union Find，也叫 Disjoint Set Union，是一种用于管理 disjoint groups 和回答 connectivity questions 的数据结构。

它支持两个核心操作。 Find 返回一个元素所在 set 的 representative root。 Union 合并两个 sets。

基本思想是：每个 connected component 都有一个代表 root。如果两个 nodes 的 root 相同，它们就在同一个 connected component 中。

Parent array 存储 forest structure。初始时 parent[i] = i，表示每个 node 自己是一个 component。

Find 操作沿着 parent pointers 找到 root。加上 path compression 后， find 会把路径上的所有 nodes 直接指向 root。这样 tree 会变平，之后的 find 会更快。

Union 操作会先找到两个 roots。如果 roots 已经相同，说明两个 nodes 已经 connected， union 返回 false。否则，把一个 root 挂到另一个 root 下，合并两个 components。

为了保持 tree shallow，通常使用 union by rank 或 union by size。 Union by rank 把较矮的 tree 挂到较高的 tree 下。 Union by size 把较小的 component 挂到较大的 component 下。

使用 path compression 加 union by rank 或 size 后， Union Find 操作几乎是 constant time，更严格地说是 amortized O(α(n))。

Union Find 常用于 connected components、undirected graph cycle detection、 redundant connection、number of provinces、accounts merge、 equality equations、dynamic island counting，以及 Kruskal minimum spanning tree。

对 connected components counting，我从 n 个 components 开始。每次 successful union， component count 减一。

对 undirected graph cycle detection，如果一条 edge 的两个 endpoints 已经有相同 root，加入这条 edge 就会形成 cycle。

对 Number of Islands II 这种 dynamic problem，每个新增 land cell 一开始是新的 component。然后和 neighboring lands union。每次 successful merge 都减少 island count。

Union Find 不是通用 graph traversal algorithm。它不能提供 shortest path、actual path、directed reachability 或 topological order。它最适合 undirected connectivity 和 group merging。

高级工程师解释 Union Find 时，我会讲清楚：每个 node 代表什么，两个 nodes 在 same set 中意味着什么， find 和 union 如何工作， component count 如何变化， path compression 和 union by rank 为什么高效，以及为什么这里 Union Find 比 DFS 或 BFS 更合适。

⭐ Final Insight

Union Find 的核心不是“parent array 模板”。

它的核心是动态维护 connected components。

Senior-level 的重点是：

node 代表什么， root 代表什么， find 如何找到 component representative， union 什么时候真正 merge， component count 什么时候减少， path compression 为什么有效，以及 Union Find 适合 connectivity，但不适合 shortest path 或 directed reachability。

能讲清楚这些， Union Find 就是一种处理 connected components、cycle detection、dynamic connectivity 和 group merging 的核心数据结构。