LeetCode Patterns - 17 Heap - Priority Queue

Post by ailswan June. 02, 2026

中文 ↓

🎯 Heap / Priority Queue

1️⃣ Core Framework

When discussing Heap / Priority Queue, I frame it as:

  1. What problem pattern heap solves
  2. Min-heap and max-heap
  3. Top K problems
  4. K-way merge
  5. Streaming median
  6. Scheduling problems
  7. Heap with lazy deletion
  8. Heap vs sorting
  9. Common edge cases
  10. Senior-level explanation

2️⃣ What Problem Are We Solving?

A heap is used when we repeatedly need the smallest or largest item efficiently.

It is commonly used for:

The core idea is:

Maintain a dynamic set of candidates,
and repeatedly extract the best candidate.

👉 Interview Answer

A heap, or priority queue, is used when we need repeated access to the minimum or maximum element.

It supports inserting elements and removing the highest-priority element efficiently.

In Python, heapq implements a min-heap.

For max-heap behavior, we usually store negative values.


3️⃣ Heap Basics


Min-Heap

In a min-heap:

smallest element is always at the top

In Python:

import heapq

Operations:

heapq.heappush(heap, value)
heapq.heappop(heap)
heap[0]

Time Complexity

Operation Complexity
Push O(log n)
Pop O(log n)
Peek min O(1)
Heapify O(n)

👉 Interview Answer

A heap gives efficient access to the smallest element.

Push and pop are O(log n), while peeking at the top is O(1).

Building a heap from an existing list using heapify takes O(n).


4️⃣ Max-Heap in Python


Python Has Min-Heap

Python heapq is a min-heap by default.

To simulate a max-heap:

heapq.heappush(heap, -value)
largest = -heapq.heappop(heap)

Example

import heapq

nums = [3, 1, 5, 2]
heap = []

for num in nums:
    heapq.heappush(heap, -num)

print(-heapq.heappop(heap))  # 5

👉 Interview Answer

Python’s heapq is a min-heap.

If I need max-heap behavior, I store negative values.

Then the smallest negative number represents the largest original value.


5️⃣ Heap vs Sorting


Sorting

If we sort everything:

O(n log n)

Good when:

we need full ordering

Heap

If we only need top k:

O(n log k)

Good when:

k is much smaller than n
streaming input
dynamic candidates
repeated min/max extraction

Key Difference

Need Better Choice
Full sorted order Sorting
Top K only Heap
Dynamic candidates Heap
Repeated min/max Heap

👉 Interview Answer

Sorting is good when we need the full order.

Heap is better when we only need the top k elements, or when candidates change dynamically.

For top k problems, maintaining a heap of size k gives O(n log k) time.


6️⃣ Top K Pattern


Problem Pattern

Find:

k largest
k smallest
k most frequent
k closest

Key Idea

Maintain a heap of size k.

For k largest:

use min-heap of size k

The heap stores the current k largest elements.

If a new element is larger than the smallest in heap, replace it.


Template

import heapq

def top_k_largest(nums, k):
    heap = []

    for num in nums:
        heapq.heappush(heap, num)

        if len(heap) > k:
            heapq.heappop(heap)

    return heap

👉 Interview Answer

For top k largest elements, I maintain a min-heap of size k.

The heap contains the best k candidates seen so far.

If the heap grows larger than k, I remove the smallest element.

At the end, the heap contains the k largest elements.


7️⃣ Kth Largest Element


Problem

Find the kth largest element in an array.


Heap Idea

Use a min-heap of size k.

After processing all numbers:

heap[0] is the kth largest

Code

import heapq

def find_kth_largest(nums, k):
    heap = []

    for num in nums:
        heapq.heappush(heap, num)

        if len(heap) > k:
            heapq.heappop(heap)

    return heap[0]

Complexity

Time: O(n log k)
Space: O(k)

👉 Interview Answer

For kth largest element, I keep a min-heap of size k.

The heap stores the k largest values seen so far.

The smallest value in that heap is the kth largest overall.

This gives O(n log k) time and O(k) space.


8️⃣ Top K Frequent Elements


Problem

Given nums, return the k most frequent elements.


Steps

1. Count frequency.
2. Push (frequency, number) into min-heap.
3. Keep heap size k.
4. Return elements in heap.

Code

from collections import Counter
import heapq

def top_k_frequent(nums, k):
    freq = Counter(nums)
    heap = []

    for num, count in freq.items():
        heapq.heappush(heap, (count, num))

        if len(heap) > k:
            heapq.heappop(heap)

    return [num for count, num in heap]

Complexity

Time: O(n log k)
Space: O(n)

👉 Interview Answer

For Top K Frequent Elements, I first count frequencies.

Then I maintain a min-heap of size k ordered by frequency.

If the heap exceeds size k, I remove the element with the lowest frequency.

At the end, the heap contains the k most frequent elements.


9️⃣ K Closest Points to Origin


Problem

Given points on a 2D plane, return k points closest to the origin.

Distance:

x² + y²

No need to use square root.


Min-Heap Approach

Push all points by distance, then pop k times.

import heapq

def k_closest(points, k):
    heap = []

    for x, y in points:
        dist = x * x + y * y
        heapq.heappush(heap, (dist, x, y))

    result = []

    for _ in range(k):
        dist, x, y = heapq.heappop(heap)
        result.append([x, y])

    return result

Size-k Max-Heap Approach

Better if k is small.

import heapq

def k_closest(points, k):
    heap = []

    for x, y in points:
        dist = x * x + y * y
        heapq.heappush(heap, (-dist, x, y))

        if len(heap) > k:
            heapq.heappop(heap)

    return [[x, y] for dist, x, y in heap]

👉 Interview Answer

For K Closest Points, I compare squared distance to avoid square root.

If I push all points into a min-heap, I can pop k closest points.

If k is much smaller than n, I can maintain a size-k max-heap using negative distances.


🔟 Merge K Sorted Lists


Problem

Given k sorted linked lists, merge them into one sorted list.


Heap Idea

Put the head of each list into a min-heap.

Repeatedly pop the smallest node, append it to result, then push its next node.


Code

import heapq

def merge_k_lists(lists):
    heap = []

    for i, node in enumerate(lists):
        if node:
            heapq.heappush(heap, (node.val, i, node))

    dummy = ListNode(0)
    current = dummy

    while heap:
        value, i, node = heapq.heappop(heap)

        current.next = node
        current = current.next

        if node.next:
            heapq.heappush(heap, (node.next.val, i, node.next))

    return dummy.next

Why Include i

If two nodes have the same value, Python may try to compare node objects directly.

Adding i prevents comparison errors.


👉 Interview Answer

For merging k sorted lists, I use a min-heap containing the current head of each list.

Each time I pop the smallest node, append it to the result, and push the next node from the same list.

The heap size is at most k, so each operation costs O(log k).


1️⃣1️⃣ K-Way Merge Pattern


Common Pattern

K-way merge appears when we have multiple sorted sources.

Examples:

merge k sorted lists
kth smallest element in sorted matrix
smallest range covering k lists
external merge sort

Template

import heapq

def k_way_merge(lists):
    heap = []

    for list_id, arr in enumerate(lists):
        if arr:
            heapq.heappush(heap, (arr[0], list_id, 0))

    result = []

    while heap:
        value, list_id, index = heapq.heappop(heap)
        result.append(value)

        next_index = index + 1

        if next_index < len(lists[list_id]):
            next_value = lists[list_id][next_index]
            heapq.heappush(heap, (next_value, list_id, next_index))

    return result

👉 Interview Answer

K-way merge uses a heap to repeatedly choose the smallest current candidate among k sorted sources.

Each heap entry stores the value and enough information to know where the next value comes from.

This keeps the heap size at most k.


1️⃣2️⃣ Kth Smallest in Sorted Matrix


Problem

Given an n x n matrix where rows and columns are sorted, return the kth smallest element.


Heap Idea

Treat each row as a sorted list.

Use k-way merge.


Code

import heapq

def kth_smallest(matrix, k):
    n = len(matrix)
    heap = []

    for r in range(min(n, k)):
        heapq.heappush(heap, (matrix[r][0], r, 0))

    for _ in range(k - 1):
        value, r, c = heapq.heappop(heap)

        if c + 1 < n:
            heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))

    return heapq.heappop(heap)[0]

👉 Interview Answer

For kth smallest in a sorted matrix, I can treat each row as a sorted list.

I initialize the heap with the first element of each row.

Then I pop the smallest element k times, pushing the next element from the same row whenever possible.


1️⃣3️⃣ Find Median From Data Stream


Problem

Numbers arrive one by one. Return the median at any time.


Heap Idea

Use two heaps:

max-heap for smaller half
min-heap for larger half

In Python:

small = max-heap using negative values
large = min-heap

Rules

Maintain:

len(small) == len(large)
or
len(small) == len(large) + 1

All values in small should be <= all values in large.


Code

import heapq

class MedianFinder:

    def __init__(self):
        self.small = []
        self.large = []

    def addNum(self, num):
        heapq.heappush(self.small, -num)

        if self.large and -self.small[0] > self.large[0]:
            value = -heapq.heappop(self.small)
            heapq.heappush(self.large, value)

        if len(self.small) > len(self.large) + 1:
            value = -heapq.heappop(self.small)
            heapq.heappush(self.large, value)

        if len(self.large) > len(self.small):
            value = heapq.heappop(self.large)
            heapq.heappush(self.small, -value)

    def findMedian(self):
        if len(self.small) > len(self.large):
            return -self.small[0]

        return (-self.small[0] + self.large[0]) / 2

👉 Interview Answer

For streaming median, I use two heaps.

The max-heap stores the smaller half, and the min-heap stores the larger half.

I keep the heaps balanced so their sizes differ by at most one.

If one heap has more elements, its top is the median.

If both heaps have equal size, the median is the average of the two tops.


1️⃣4️⃣ Meeting Rooms II


Problem

Given meeting intervals, return the minimum number of meeting rooms required.


Heap Idea

Sort meetings by start time.

Use a min-heap of end times.

For each meeting:

if earliest ending room is free,
reuse it
otherwise allocate new room

Code

import heapq

def min_meeting_rooms(intervals):
    if not intervals:
        return 0

    intervals.sort(key=lambda x: x[0])
    heap = []

    for start, end in intervals:
        if heap and heap[0] <= start:
            heapq.heappop(heap)

        heapq.heappush(heap, end)

    return len(heap)

👉 Interview Answer

For Meeting Rooms II, I sort meetings by start time.

I use a min-heap to track room end times.

If the earliest ending meeting has ended before the current meeting starts, I reuse that room.

Otherwise, I allocate a new room.

The maximum heap size represents the required room count.


1️⃣5️⃣ Task Scheduling With Heap


Problem Pattern

At each step, choose the highest-priority available task.

Examples:

most frequent task
earliest deadline
shortest processing time
highest profit

Heap Use

Use heap when:

available candidates change over time
priority changes which item should be processed next

Example: CPU Scheduling

sort tasks by enqueue time
push available tasks into heap by processing time
pop shortest task

👉 Interview Answer

Heap is useful in scheduling when the set of available tasks changes over time.

I usually sort events by time, push available tasks into a heap, and repeatedly process the best available task.

The heap represents the current candidate set.


1️⃣6️⃣ Reorganize String


Problem

Rearrange string so no two adjacent characters are the same.


Heap Idea

Always choose the most frequent available character, but hold the previously used character out for one round.


Code

from collections import Counter
import heapq

def reorganize_string(s):
    freq = Counter(s)
    heap = [(-count, ch) for ch, count in freq.items()]
    heapq.heapify(heap)

    result = []
    prev_count = 0
    prev_ch = ""

    while heap:
        count, ch = heapq.heappop(heap)
        result.append(ch)

        if prev_count < 0:
            heapq.heappush(heap, (prev_count, prev_ch))

        count += 1
        prev_count = count
        prev_ch = ch

    answer = "".join(result)

    if len(answer) != len(s):
        return ""

    return answer

👉 Interview Answer

Reorganize String uses a max-heap by frequency.

I always choose the character with the highest remaining count.

After using a character, I temporarily hold it back for one step, so it cannot be used twice in a row.


1️⃣7️⃣ Dijkstra’s Algorithm and Heap


Why Heap Is Used

Dijkstra repeatedly needs the unvisited node with the smallest known distance.

A min-heap provides this efficiently.


Pattern

heap entry = (distance, node)

Each time:

pop smallest distance
relax neighbors
push updated distances

Code Skeleton

import heapq

def dijkstra(graph, start):
    dist = {node: float("inf") for node in graph}
    dist[start] = 0

    heap = [(0, start)]

    while heap:
        current_dist, node = heapq.heappop(heap)

        if current_dist > dist[node]:
            continue

        for neighbor, weight in graph[node]:
            new_dist = current_dist + weight

            if new_dist < dist[neighbor]:
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))

    return dist

👉 Interview Answer

Dijkstra uses a min-heap to always process the node with the smallest known distance.

When a shorter path to a neighbor is found, we update its distance and push it into the heap.

If we later pop an outdated distance, we skip it.


1️⃣8️⃣ Lazy Deletion


Problem

Python heap does not support efficient removal of arbitrary elements.

This matters for:

sliding window median
frequency updates
removing expired tasks

Lazy Deletion Idea

Instead of removing immediately:

mark item as deleted
remove it later when it reaches heap top

Pattern

while heap and should_delete(heap[0]):
    heapq.heappop(heap)

👉 Interview Answer

Heap does not support efficient arbitrary deletion.

For problems that need deletion, I use lazy deletion.

I mark elements as invalid in a counter or set, and only remove them when they appear at the top of the heap.

This keeps heap operations efficient.


1️⃣9️⃣ Heap With Tuples


Python Tuple Ordering

Python heap compares tuples lexicographically.

Example:

(priority, value)

It first compares priority.

If priorities tie, it compares value.


Problem

If value is a custom object, Python may not know how to compare it.

Example:

(node.val, node)

If two nodes have same value, this can fail.


Fix

Add a unique tie-breaker:

(node.val, index, node)

👉 Interview Answer

Python heap compares tuple elements in order.

If the first values tie, it compares the next values.

When storing custom objects, I add a unique tie-breaker such as an index or counter, so Python never needs to compare the objects directly.


2️⃣0️⃣ Heap Complexity Analysis


Basic Heap

For n elements:

push: O(log n)
pop: O(log n)
peek: O(1)
heapify: O(n)

Top K

Time: O(n log k)
Space: O(k)

K-Way Merge

If total elements are N and k lists:

Time: O(N log k)
Space: O(k)

Streaming Median

Each insertion:

O(log n)

Find median:

O(1)

👉 Interview Answer

Heap operations are usually O(log n) for push and pop.

For top k problems, if we keep heap size k, the time is O(n log k).

For k-way merge, heap size is k, so total time is O(N log k).

For streaming median, insertion is O(log n), and median query is O(1).


2️⃣1️⃣ Common Edge Cases


Empty Input

nums = []
lists = []
intervals = []

k Is Zero

k = 0

Usually return empty result.


k Greater Than Input Size

Need decide whether:

return all elements
or
input is invalid

Duplicate Values

Heap handles duplicates, but tuple tie-breaking may matter.


Custom Objects

Need tie-breaker.


Negative Values

For max-heap simulation, be careful with negation.


👉 Interview Answer

Common heap edge cases include empty input, k equals zero, k larger than input size, duplicate priorities, custom objects requiring tie-breakers, and max-heap simulation with negative values.

I usually clarify what should happen when k is larger than the number of items.


2️⃣2️⃣ Common Mistakes


Mistake 1: Using Full Sorting for Dynamic Problems

Sorting once does not help if candidates change over time.


Mistake 2: Wrong Heap Size

For top k problems, heap should usually be capped at k.


Mistake 3: Wrong Sign for Max-Heap

For max-heap:

heappush(heap, -value)

and remember to negate when popping.


Mistake 4: No Tie-Breaker for Objects

Wrong:

(node.val, node)

Better:

(node.val, index, node)

Mistake 5: Forgetting Lazy Deletion

Trying to remove arbitrary elements directly from heap is inefficient.


Mistake 6: Returning Heap Without Ordering

A heap is not fully sorted.

If sorted output is required, sort the heap before returning.


👉 Interview Answer

Common heap mistakes include using sorting when candidates are dynamic, not maintaining the correct heap size, forgetting negative values for max-heap behavior, missing tuple tie-breakers, not using lazy deletion when deletion is needed, and assuming a heap is fully sorted.


2️⃣3️⃣ When Heap Applies


Good Fit

Use heap when:

need repeated min or max
need top k
need dynamic best candidate
need merge from multiple sorted sources
need streaming median
need scheduling by priority
need shortest path with changing distances

Problem Signals

Look for words like:

kth largest
top k
closest
smallest
largest
median stream
merge k sorted
priority
schedule
available task
minimum cost repeatedly

👉 Interview Answer

I consider heap when I need repeated access to the smallest or largest candidate, especially when the candidate set changes over time.

Common signals include top k, kth largest, merge k sorted lists, streaming median, scheduling by priority, and shortest-path problems like Dijkstra.


2️⃣4️⃣ When Heap Does Not Apply


Not Good Fit

Heap may not be ideal when:

need full sorted order once
need fast lookup by key
need arbitrary deletion without lazy deletion
need range query
need exact order among all elements

Better Alternatives

Problem Type Better Pattern
Full sorting Sort
Fast membership lookup Hash Set
Range query Segment Tree / BIT
Ordered search Balanced BST
Simple max/min once Single pass
Connectivity Union Find

👉 Interview Answer

Heap is not always the right tool.

If I only need to sort all elements once, sorting may be simpler.

If I need fast lookup by key, a hash map is better.

If I need range queries, a segment tree or BIT is more appropriate.

If I need arbitrary deletion, I may need lazy deletion or a different data structure.


2️⃣5️⃣ Standard Templates


Min-Heap Template

import heapq

heap = []

heapq.heappush(heap, value)
smallest = heapq.heappop(heap)
top = heap[0]

Max-Heap Template

import heapq

heap = []

heapq.heappush(heap, -value)
largest = -heapq.heappop(heap)
top = -heap[0]

Top K Template

import heapq

def top_k(nums, k):
    heap = []

    for num in nums:
        heapq.heappush(heap, num)

        if len(heap) > k:
            heapq.heappop(heap)

    return heap

K-Way Merge Template

import heapq

def k_way_merge(lists):
    heap = []

    for i, arr in enumerate(lists):
        if arr:
            heapq.heappush(heap, (arr[0], i, 0))

    result = []

    while heap:
        value, list_id, index = heapq.heappop(heap)
        result.append(value)

        next_index = index + 1

        if next_index < len(lists[list_id]):
            heapq.heappush(
                heap,
                (lists[list_id][next_index], list_id, next_index)
            )

    return result

Two-Heaps Median Template

import heapq

class MedianFinder:

    def __init__(self):
        self.small = []
        self.large = []

    def add_num(self, num):
        heapq.heappush(self.small, -num)

        if self.large and -self.small[0] > self.large[0]:
            heapq.heappush(self.large, -heapq.heappop(self.small))

        if len(self.small) > len(self.large) + 1:
            heapq.heappush(self.large, -heapq.heappop(self.small))

        if len(self.large) > len(self.small):
            heapq.heappush(self.small, -heapq.heappop(self.large))

    def find_median(self):
        if len(self.small) > len(self.large):
            return -self.small[0]

        return (-self.small[0] + self.large[0]) / 2

Lazy Deletion Template

import heapq
from collections import Counter

heap = []
deleted = Counter()

def prune():
    while heap and deleted[heap[0]] > 0:
        deleted[heap[0]] -= 1
        heapq.heappop(heap)

🧠 Staff-Level Answer Final


👉 Interview Answer Full Version

A heap, or priority queue, is used when we need repeated access to the smallest or largest element.

It supports push and pop in O(log n), peek in O(1), and heapify in O(n).

In Python, heapq provides a min-heap. For max-heap behavior, I usually store negative values.

Heap is especially useful when the candidate set changes dynamically. Sorting gives a full order once, but heap allows us to repeatedly extract the current best candidate as new candidates are added.

For top k problems, I usually maintain a heap of size k. For kth largest, a min-heap of size k stores the k largest elements seen so far, and the heap top is the kth largest.

For k-way merge, I put the current smallest candidate from each sorted source into the heap. Each heap entry must also store enough information to find the next element from the same source. The heap size is usually k, so the total complexity is O(N log k).

For streaming median, I use two heaps: a max-heap for the smaller half, and a min-heap for the larger half. I maintain balance between the two heaps, so the median can be returned in O(1).

For scheduling problems, a heap is useful when available candidates change over time. We sort events by time, push available candidates into a heap, and repeatedly process the best available candidate.

In Python, tuple ordering is important. If heap entries contain custom objects, I add a unique tie-breaker like an index or counter to avoid comparison errors.

Heap does not support efficient arbitrary deletion. For problems that require deletion, such as sliding window median, I use lazy deletion: mark elements as removed, and physically pop them only when they reach the heap top.

At a senior level, I would explain: what priority means, whether we need min-heap or max-heap, why heap is better than sorting, what the heap size is, what each heap entry stores, how ties are handled, and whether lazy deletion is needed.


⭐ Final Insight

Heap 的核心不是“会用 heapq”。

它的核心是维护一个 dynamic candidate set, 并高效取出当前 priority 最高的元素。

Senior-level 的重点是:

priority 是什么, min-heap 还是 max-heap, heap size 是否固定为 k, heap entry 需要存哪些信息, 为什么不用 full sorting, tie-breaker 怎么处理, 以及是否需要 lazy deletion。

能讲清楚这些, Heap 就是一种处理 Top K、K-way merge、streaming median、scheduling 和 shortest path 的核心数据结构。



中文部分


🎯 Heap / Priority Queue


1️⃣ 核心框架

讨论 Heap / Priority Queue 时,我通常从:

  1. Heap 解决什么问题
  2. Min-heap 和 max-heap
  3. Top K problems
  4. K-way merge
  5. Streaming median
  6. Scheduling problems
  7. Heap with lazy deletion
  8. Heap vs sorting
  9. 常见 edge cases
  10. 高级工程师解释方式

2️⃣ 要解决什么问题?

Heap 用于需要反复获取最小或最大元素的场景。

它常用于:

核心思想是:

维护一个动态 candidate set,
每次高效取出 priority 最高的 candidate。

👉 面试回答

Heap,也叫 priority queue, 用于需要反复访问 minimum 或 maximum element 的场景。

它支持高效插入元素, 也支持高效弹出当前最高 priority 的元素。

Python 的 heapq 默认是 min-heap。

如果需要 max-heap, 通常存 negative values。


3️⃣ Heap Basics


Min-Heap

Min-heap 中:

smallest element is always at the top

Python 中使用:

import heapq

Operations:

heapq.heappush(heap, value)
heapq.heappop(heap)
heap[0]

Time Complexity

Operation Complexity
Push O(log n)
Pop O(log n)
Peek min O(1)
Heapify O(n)

👉 面试回答

Heap 可以高效访问最小元素。

Push 和 pop 是 O(log n), peek top 是 O(1)。

对已有数组调用 heapify 建堆是 O(n)。


4️⃣ Python 中的 Max-Heap


Python 默认是 Min-Heap

Python heapq 默认是 min-heap。

如果要模拟 max-heap:

heapq.heappush(heap, -value)
largest = -heapq.heappop(heap)

Example

import heapq

nums = [3, 1, 5, 2]
heap = []

for num in nums:
    heapq.heappush(heap, -num)

print(-heapq.heappop(heap))  # 5

👉 面试回答

Python 的 heapq 是 min-heap。

如果需要 max-heap behavior, 我会存负数。

最小的负数对应原始值中最大的数字。


5️⃣ Heap vs Sorting


Sorting

如果把所有数据排序:

O(n log n)

适合:

需要完整排序结果

Heap

如果只需要 top k:

O(n log k)

适合:

k much smaller than n
streaming input
dynamic candidates
repeated min/max extraction

Key Difference

Need Better Choice
Full sorted order Sorting
Top K only Heap
Dynamic candidates Heap
Repeated min/max Heap

👉 面试回答

Sorting 适合需要完整顺序的场景。

Heap 更适合只需要 top k, 或 candidate set 会动态变化的场景。

对 top k 问题, 维护 size k 的 heap 可以做到 O(n log k)。


6️⃣ Top K Pattern


Problem Pattern

找:

k largest
k smallest
k most frequent
k closest

Key Idea

维护一个 size k 的 heap。

对于 k largest:

use min-heap of size k

Heap 中保存当前见过的 k 个最大元素。

如果新元素大于 heap 中最小值, 就替换掉它。


Template

import heapq

def top_k_largest(nums, k):
    heap = []

    for num in nums:
        heapq.heappush(heap, num)

        if len(heap) > k:
            heapq.heappop(heap)

    return heap

👉 面试回答

对 top k largest elements, 我维护一个 size k 的 min-heap。

Heap 中始终保存当前最好的 k 个 candidates。

如果 heap size 超过 k, 就 pop 掉最小的那个。

最后 heap 中剩下的就是 k largest elements。


7️⃣ Kth Largest Element


Problem

找数组中的 kth largest element。


Heap Idea

使用 size k 的 min-heap。

处理完所有 numbers 后:

heap[0] is the kth largest

Code

import heapq

def find_kth_largest(nums, k):
    heap = []

    for num in nums:
        heapq.heappush(heap, num)

        if len(heap) > k:
            heapq.heappop(heap)

    return heap[0]

Complexity

Time: O(n log k)
Space: O(k)

👉 面试回答

对 kth largest element, 我维护一个 size k 的 min-heap。

Heap 中保存目前见过的 k 个最大值。

这个 heap 的最小值就是 overall kth largest。

时间复杂度是 O(n log k),space 是 O(k)。


8️⃣ Top K Frequent Elements


Problem

给定 nums, 返回出现频率最高的 k 个 elements。


Steps

1. Count frequency.
2. Push (frequency, number) into min-heap.
3. Keep heap size k.
4. Return elements in heap.

Code

from collections import Counter
import heapq

def top_k_frequent(nums, k):
    freq = Counter(nums)
    heap = []

    for num, count in freq.items():
        heapq.heappush(heap, (count, num))

        if len(heap) > k:
            heapq.heappop(heap)

    return [num for count, num in heap]

Complexity

Time: O(n log k)
Space: O(n)

👉 面试回答

对 Top K Frequent Elements, 我先统计 frequency。

然后用 frequency 作为 priority, 维护 size k 的 min-heap。

如果 heap 超过 k, 就移除 frequency 最小的元素。

最后 heap 中就是 k 个最高频元素。


9️⃣ K Closest Points to Origin


Problem

给定 2D points, 返回距离 origin 最近的 k 个 points。

Distance:

x² + y²

不需要开平方。


Min-Heap Approach

把所有 points 放入 min-heap, 然后 pop k 次。

import heapq

def k_closest(points, k):
    heap = []

    for x, y in points:
        dist = x * x + y * y
        heapq.heappush(heap, (dist, x, y))

    result = []

    for _ in range(k):
        dist, x, y = heapq.heappop(heap)
        result.append([x, y])

    return result

Size-k Max-Heap Approach

如果 k 很小,这个更好。

import heapq

def k_closest(points, k):
    heap = []

    for x, y in points:
        dist = x * x + y * y
        heapq.heappush(heap, (-dist, x, y))

        if len(heap) > k:
            heapq.heappop(heap)

    return [[x, y] for dist, x, y in heap]

👉 面试回答

对 K Closest Points, 我比较 squared distance, 避免不必要的 square root。

如果把所有 points 放入 min-heap, 可以 pop k 次得到最近的 points。

如果 k 远小于 n, 可以使用 size-k max-heap, 用 negative distance 模拟 max-heap。


🔟 Merge K Sorted Lists


Problem

给定 k 个 sorted linked lists, 合并成一个 sorted list。


Heap Idea

把每个 list 的 head 放入 min-heap。

每次 pop 最小 node, append 到 result, 再把它的 next node 放入 heap。


Code

import heapq

def merge_k_lists(lists):
    heap = []

    for i, node in enumerate(lists):
        if node:
            heapq.heappush(heap, (node.val, i, node))

    dummy = ListNode(0)
    current = dummy

    while heap:
        value, i, node = heapq.heappop(heap)

        current.next = node
        current = current.next

        if node.next:
            heapq.heappush(heap, (node.next.val, i, node.next))

    return dummy.next

为什么需要 i

如果两个 node 的 value 相同, Python 可能尝试直接比较 node object。

加上 i 可以避免 comparison error。


👉 面试回答

对 merge k sorted lists, 我使用 min-heap 保存每个 list 当前的 head。

每次取出最小 node, 加入结果, 然后把同一个 list 的 next node 放回 heap。

Heap size 最多是 k, 所以每次操作是 O(log k)。


1️⃣1️⃣ K-Way Merge Pattern


Common Pattern

K-way merge 出现在多个 sorted sources 的合并问题中。

Examples:

merge k sorted lists
kth smallest element in sorted matrix
smallest range covering k lists
external merge sort

Template

import heapq

def k_way_merge(lists):
    heap = []

    for list_id, arr in enumerate(lists):
        if arr:
            heapq.heappush(heap, (arr[0], list_id, 0))

    result = []

    while heap:
        value, list_id, index = heapq.heappop(heap)
        result.append(value)

        next_index = index + 1

        if next_index < len(lists[list_id]):
            next_value = lists[list_id][next_index]
            heapq.heappush(heap, (next_value, list_id, next_index))

    return result

👉 面试回答

K-way merge 使用 heap, 在 k 个 sorted sources 中反复选择当前最小 candidate。

每个 heap entry 不只存 value, 还要存它来自哪个 source, 以及 source 中的 index。

这样 pop 后才能找到下一个 candidate。


1️⃣2️⃣ Kth Smallest in Sorted Matrix


Problem

给定 n x n matrix, 每一行和每一列都 sorted, 返回 kth smallest element。


Heap Idea

把每一行看成一个 sorted list。

使用 k-way merge。


Code

import heapq

def kth_smallest(matrix, k):
    n = len(matrix)
    heap = []

    for r in range(min(n, k)):
        heapq.heappush(heap, (matrix[r][0], r, 0))

    for _ in range(k - 1):
        value, r, c = heapq.heappop(heap)

        if c + 1 < n:
            heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))

    return heapq.heappop(heap)[0]

👉 面试回答

对 kth smallest in sorted matrix, 我可以把每一行看成一个 sorted list。

初始化 heap 时放入每一行的第一个元素。

然后 pop k 次最小元素, 每次从同一行推进到下一个元素。


1️⃣3️⃣ Find Median From Data Stream


Problem

数字一个一个到达。 需要随时返回 median。


Heap Idea

使用两个 heaps:

max-heap for smaller half
min-heap for larger half

Python 中:

small = max-heap using negative values
large = min-heap

Rules

维护:

len(small) == len(large)
or
len(small) == len(large) + 1

同时保证:

all values in small <= all values in large

Code

import heapq

class MedianFinder:

    def __init__(self):
        self.small = []
        self.large = []

    def addNum(self, num):
        heapq.heappush(self.small, -num)

        if self.large and -self.small[0] > self.large[0]:
            value = -heapq.heappop(self.small)
            heapq.heappush(self.large, value)

        if len(self.small) > len(self.large) + 1:
            value = -heapq.heappop(self.small)
            heapq.heappush(self.large, value)

        if len(self.large) > len(self.small):
            value = heapq.heappop(self.large)
            heapq.heappush(self.small, -value)

    def findMedian(self):
        if len(self.small) > len(self.large):
            return -self.small[0]

        return (-self.small[0] + self.large[0]) / 2

👉 面试回答

对 streaming median, 我使用两个 heaps。

Max-heap 保存较小的一半。

Min-heap 保存较大的一半。

我保持两个 heap 的 size 平衡, 并保证 small 中的所有元素都不大于 large 中的元素。

如果一个 heap 多一个元素, 它的 top 就是 median。

如果两个 heap 一样大, median 是两个 top 的平均值。


1️⃣4️⃣ Meeting Rooms II


Problem

给定 meeting intervals, 返回最少需要多少 meeting rooms。


Heap Idea

按 start time 排序。

用 min-heap 保存 room end times。

对于每个 meeting:

如果最早结束的 room 已经空出来,
就 reuse
否则 allocate new room

Code

import heapq

def min_meeting_rooms(intervals):
    if not intervals:
        return 0

    intervals.sort(key=lambda x: x[0])
    heap = []

    for start, end in intervals:
        if heap and heap[0] <= start:
            heapq.heappop(heap)

        heapq.heappush(heap, end)

    return len(heap)

👉 面试回答

对 Meeting Rooms II, 我先按 start time 排序 meetings。

然后用 min-heap 记录当前 rooms 的 end times。

如果最早结束的 meeting 已经在当前 start 前结束, 可以复用这个 room。

否则需要新 room。

Heap 的最大 size 就是需要的 room 数量。


1️⃣5️⃣ Task Scheduling With Heap


Problem Pattern

每一步都要选择 highest-priority available task。

Examples:

most frequent task
earliest deadline
shortest processing time
highest profit

Heap Use

当满足这些条件时用 heap:

available candidates change over time
priority decides next task

Example: CPU Scheduling

sort tasks by enqueue time
push available tasks into heap by processing time
pop shortest task

👉 面试回答

在 scheduling 问题中, 如果 available tasks 会随时间变化, heap 很有用。

常见做法是先按 time 排序 events, 然后把当前 available tasks 放入 heap。

每次从 heap 中取出当前最优 task。


1️⃣6️⃣ Reorganize String


Problem

重新排列 string, 让相邻 characters 不相同。


Heap Idea

每次选择 remaining frequency 最高的 character, 但把刚刚用过的 character 暂停一轮。


Code

from collections import Counter
import heapq

def reorganize_string(s):
    freq = Counter(s)
    heap = [(-count, ch) for ch, count in freq.items()]
    heapq.heapify(heap)

    result = []
    prev_count = 0
    prev_ch = ""

    while heap:
        count, ch = heapq.heappop(heap)
        result.append(ch)

        if prev_count < 0:
            heapq.heappush(heap, (prev_count, prev_ch))

        count += 1
        prev_count = count
        prev_ch = ch

    answer = "".join(result)

    if len(answer) != len(s):
        return ""

    return answer

👉 面试回答

Reorganize String 使用 max-heap by frequency。

我每次选择剩余次数最多的 character。

使用后,暂时把它 hold back 一轮, 防止马上再次使用导致相邻重复。


1️⃣7️⃣ Dijkstra’s Algorithm and Heap


为什么 Dijkstra 使用 Heap?

Dijkstra 需要反复取出当前 known distance 最小的 unvisited node。

Min-heap 可以高效完成这个操作。


Pattern

heap entry = (distance, node)

每次:

pop smallest distance
relax neighbors
push updated distances

Code Skeleton

import heapq

def dijkstra(graph, start):
    dist = {node: float("inf") for node in graph}
    dist[start] = 0

    heap = [(0, start)]

    while heap:
        current_dist, node = heapq.heappop(heap)

        if current_dist > dist[node]:
            continue

        for neighbor, weight in graph[node]:
            new_dist = current_dist + weight

            if new_dist < dist[neighbor]:
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))

    return dist

👉 面试回答

Dijkstra 使用 min-heap, 每次处理当前 known distance 最小的 node。

如果发现到 neighbor 的更短路径, 就更新 distance 并 push 到 heap。

如果之后 pop 出过期 distance, 直接 skip。


1️⃣8️⃣ Lazy Deletion


Problem

Python heap 不支持高效删除任意元素。

这在以下问题中很常见:

sliding window median
frequency updates
removing expired tasks

Lazy Deletion Idea

不要立刻从 heap 中删除。

而是:

mark item as deleted
等它到 heap top 时再真正 pop

Pattern

while heap and should_delete(heap[0]):
    heapq.heappop(heap)

👉 面试回答

Heap 不支持高效 arbitrary deletion。

如果题目需要删除非 top 元素, 我会使用 lazy deletion。

先用 counter 或 set 标记元素已经 invalid。

等这个元素到达 heap top 时, 再真正 pop 掉它。


1️⃣9️⃣ Heap With Tuples


Python Tuple Ordering

Python heap 会按照 tuple 的字典序比较。

Example:

(priority, value)

它先比较 priority。

如果 priority 相同, 再比较 value。


Problem

如果 value 是 custom object, Python 可能不知道如何比较它。

Example:

(node.val, node)

如果两个 node value 相同, 可能报错。


Fix

添加 unique tie-breaker:

(node.val, index, node)

👉 面试回答

Python heap 会按 tuple 元素顺序比较。

如果第一个 priority 相同, 它会比较第二个元素。

当 heap entry 中有 custom object 时, 我会加入一个 unique tie-breaker, 比如 index 或 counter, 避免 Python 直接比较 object。


2️⃣0️⃣ Heap Complexity Analysis


Basic Heap

对于 n 个元素:

push: O(log n)
pop: O(log n)
peek: O(1)
heapify: O(n)

Top K

Time: O(n log k)
Space: O(k)

K-Way Merge

如果总元素数是 N, 有 k 个 lists:

Time: O(N log k)
Space: O(k)

Streaming Median

每次 insertion:

O(log n)

查询 median:

O(1)

👉 面试回答

Heap 的 push 和 pop 通常是 O(log n)。

对 top k 问题, 如果 heap size 保持为 k, 时间复杂度是 O(n log k)。

对 k-way merge, heap size 是 k, 所以总时间是 O(N log k)。

对 streaming median, 插入是 O(log n), 查询 median 是 O(1)。


2️⃣1️⃣ 常见 Edge Cases


Empty Input

nums = []
lists = []
intervals = []

k Is Zero

k = 0

通常返回 empty result。


k Greater Than Input Size

需要判断:

return all elements
or
input is invalid

Duplicate Values

Heap 可以处理 duplicates, 但 tuple tie-breaking 可能很重要。


Custom Objects

需要 tie-breaker。


Negative Values

用 negative values 模拟 max-heap 时要小心。


👉 面试回答

Heap 常见 edge cases 包括 empty input、 k 等于 0、 k 大于 input size、 duplicate priorities、 custom objects 需要 tie-breaker、 以及用 negative values 模拟 max-heap。

我通常会确认 k 超过元素数量时应该返回什么。


2️⃣2️⃣ 常见错误


Mistake 1: Dynamic Problem 中使用一次 Full Sorting

如果 candidates 会动态变化, 一次 sorting 不够。


Mistake 2: Heap Size 错误

Top k 问题中, heap 通常要保持 size k。


Mistake 3: Max-Heap 符号错误

Max-heap:

heappush(heap, -value)

pop 时要记得取负号。


Mistake 4: Object 没有 Tie-Breaker

错误:

(node.val, node)

更安全:

(node.val, index, node)

Mistake 5: 忘记 Lazy Deletion

直接从 heap 中删除任意元素通常不高效。


Mistake 6: 以为 Heap 是 Fully Sorted

Heap 只保证 top 是最小值。

如果要求 sorted output, 需要额外 sort。


👉 面试回答

Heap 常见错误包括: 对动态 candidate 问题使用一次性 sorting、 heap size 维护错误、 max-heap 负号处理错误、 custom object 没有 tie-breaker、 忘记 lazy deletion、 以及误以为 heap 内部是完整 sorted 的。

Heap 只保证 top element 是最小或最高 priority。


2️⃣3️⃣ 什么时候 Heap 适用?


Good Fit

当需要:

repeated min or max
top k
dynamic best candidate
merge from multiple sorted sources
streaming median
scheduling by priority
shortest path with changing distances

Problem Signals

看到这些词想到 heap:

kth largest
top k
closest
smallest
largest
median stream
merge k sorted
priority
schedule
available task
minimum cost repeatedly

👉 面试回答

当我需要反复访问 smallest 或 largest candidate, 尤其是 candidate set 会动态变化时, 我会考虑 heap。

常见信号包括 top k、kth largest、merge k sorted lists、 streaming median、priority scheduling, 以及 Dijkstra 这种 shortest-path 问题。


2️⃣4️⃣ 什么时候 Heap 不适用?


Not Good Fit

Heap 不适合:

need full sorted order once
need fast lookup by key
need arbitrary deletion without lazy deletion
need range query
need exact order among all elements

Better Alternatives

Problem Type Better Pattern
Full sorting Sort
Fast membership lookup Hash Set
Range query Segment Tree / BIT
Ordered search Balanced BST
Simple max/min once Single pass
Connectivity Union Find

👉 面试回答

Heap 不是所有排序或 priority 问题的最佳工具。

如果只需要完整排序一次, sort 通常更简单。

如果需要按 key 快速查找, hash map 更合适。

如果需要 range query, segment tree 或 BIT 更合适。

如果需要 arbitrary deletion, 可能要 lazy deletion 或换数据结构。


2️⃣5️⃣ 标准模板


Min-Heap Template

import heapq

heap = []

heapq.heappush(heap, value)
smallest = heapq.heappop(heap)
top = heap[0]

Max-Heap Template

import heapq

heap = []

heapq.heappush(heap, -value)
largest = -heapq.heappop(heap)
top = -heap[0]

Top K Template

import heapq

def top_k(nums, k):
    heap = []

    for num in nums:
        heapq.heappush(heap, num)

        if len(heap) > k:
            heapq.heappop(heap)

    return heap

K-Way Merge Template

import heapq

def k_way_merge(lists):
    heap = []

    for i, arr in enumerate(lists):
        if arr:
            heapq.heappush(heap, (arr[0], i, 0))

    result = []

    while heap:
        value, list_id, index = heapq.heappop(heap)
        result.append(value)

        next_index = index + 1

        if next_index < len(lists[list_id]):
            heapq.heappush(
                heap,
                (lists[list_id][next_index], list_id, next_index)
            )

    return result

Two-Heaps Median Template

import heapq

class MedianFinder:

    def __init__(self):
        self.small = []
        self.large = []

    def add_num(self, num):
        heapq.heappush(self.small, -num)

        if self.large and -self.small[0] > self.large[0]:
            heapq.heappush(self.large, -heapq.heappop(self.small))

        if len(self.small) > len(self.large) + 1:
            heapq.heappush(self.large, -heapq.heappop(self.small))

        if len(self.large) > len(self.small):
            heapq.heappush(self.small, -heapq.heappop(self.large))

    def find_median(self):
        if len(self.small) > len(self.large):
            return -self.small[0]

        return (-self.small[0] + self.large[0]) / 2

Lazy Deletion Template

import heapq
from collections import Counter

heap = []
deleted = Counter()

def prune():
    while heap and deleted[heap[0]] > 0:
        deleted[heap[0]] -= 1
        heapq.heappop(heap)

🧠 Staff-Level Answer Final


👉 面试回答完整版本

Heap,也叫 priority queue, 用于需要反复访问 smallest 或 largest element 的场景。

它支持 O(log n) 的 push 和 pop, O(1) 的 peek, 以及 O(n) 的 heapify。

Python 的 heapq 默认是 min-heap。 如果需要 max-heap behavior, 我通常用 negative values。

Heap 特别适合 candidate set 会动态变化的问题。 Sorting 可以一次性得到完整顺序, 但 heap 可以在不断加入新 candidates 的同时, 反复取出当前 best candidate。

对 top k 问题, 我通常维护一个 size k 的 heap。 对 kth largest, size k 的 min-heap 保存当前见过的 k 个最大元素, heap top 就是 kth largest。

对 k-way merge, 我会把每个 sorted source 当前最小的 candidate 放入 heap。 每个 heap entry 必须保存足够信息, 让我们知道 pop 之后应该从哪个 source 推进。 Heap size 通常是 k, 所以总复杂度是 O(N log k)。

对 streaming median, 我使用两个 heaps: 一个 max-heap 保存较小的一半, 一个 min-heap 保存较大的一半。 通过维持两个 heaps 的 size balance, median 可以 O(1) 查询。

对 scheduling problems, 如果 available candidates 会随时间变化, heap 很有用。 通常先按 time 排序 events, 再把 currently available candidates 放入 heap, 每次处理当前 priority 最好的 candidate。

在 Python 中, tuple ordering 很重要。 如果 heap entry 包含 custom object, 我会加入 unique tie-breaker, 比如 index 或 counter, 避免 Python 直接比较 object。

Heap 不支持高效 arbitrary deletion。 如果题目需要删除元素, 比如 sliding window median, 我会使用 lazy deletion: 先标记元素 invalid, 等它到 heap top 时再真正 pop。

高级工程师解释 heap 时, 我会讲清楚: priority 是什么, 需要 min-heap 还是 max-heap, 为什么 heap 比 sorting 更合适, heap size 是多少, 每个 heap entry 存什么, ties 如何处理, 以及是否需要 lazy deletion。


⭐ Final Insight

Heap 的核心不是“会用 heapq”。

它的核心是维护 dynamic candidate set, 并高效取出当前 priority 最高的元素。

Senior-level 的重点是:

priority 是什么, min-heap 还是 max-heap, heap size 是否固定为 k, heap entry 需要存哪些信息, 为什么不用 full sorting, tie-breaker 怎么处理, 以及是否需要 lazy deletion。

能讲清楚这些, Heap 就是一种处理 Top K、K-way merge、streaming median、scheduling 和 shortest path 的核心数据结构。

Implement