🎯 Heap / Priority Queue
1️⃣ Core Framework
When discussing Heap / Priority Queue, I frame it as:
- What problem pattern heap solves
- Min-heap and max-heap
- Top K problems
- K-way merge
- Streaming median
- Scheduling problems
- Heap with lazy deletion
- Heap vs sorting
- Common edge cases
- Senior-level explanation
2️⃣ What Problem Are We Solving?
A heap is used when we repeatedly need the smallest or largest item efficiently.
It is commonly used for:
- Top K frequent elements
- Kth largest element
- K closest points
- Merge k sorted lists
- Find median from data stream
- Meeting rooms
- Task scheduler variants
- Dijkstra’s algorithm
- Sliding window median
- Reorganize string
- IPO / project selection
- CPU task scheduling
The core idea is:
Maintain a dynamic set of candidates,
and repeatedly extract the best candidate.
👉 Interview Answer
A heap, or priority queue, is used when we need repeated access to the minimum or maximum element.
It supports inserting elements and removing the highest-priority element efficiently.
In Python,
heapqimplements a min-heap.For max-heap behavior, we usually store negative values.
3️⃣ Heap Basics
Min-Heap
In a min-heap:
smallest element is always at the top
In Python:
import heapq
Operations:
heapq.heappush(heap, value)
heapq.heappop(heap)
heap[0]
Time Complexity
| Operation | Complexity |
|---|---|
| Push | O(log n) |
| Pop | O(log n) |
| Peek min | O(1) |
| Heapify | O(n) |
👉 Interview Answer
A heap gives efficient access to the smallest element.
Push and pop are O(log n), while peeking at the top is O(1).
Building a heap from an existing list using heapify takes O(n).
4️⃣ Max-Heap in Python
Python Has Min-Heap
Python heapq is a min-heap by default.
To simulate a max-heap:
heapq.heappush(heap, -value)
largest = -heapq.heappop(heap)
Example
import heapq
nums = [3, 1, 5, 2]
heap = []
for num in nums:
heapq.heappush(heap, -num)
print(-heapq.heappop(heap)) # 5
👉 Interview Answer
Python’s heapq is a min-heap.
If I need max-heap behavior, I store negative values.
Then the smallest negative number represents the largest original value.
5️⃣ Heap vs Sorting
Sorting
If we sort everything:
O(n log n)
Good when:
we need full ordering
Heap
If we only need top k:
O(n log k)
Good when:
k is much smaller than n
streaming input
dynamic candidates
repeated min/max extraction
Key Difference
| Need | Better Choice |
|---|---|
| Full sorted order | Sorting |
| Top K only | Heap |
| Dynamic candidates | Heap |
| Repeated min/max | Heap |
👉 Interview Answer
Sorting is good when we need the full order.
Heap is better when we only need the top k elements, or when candidates change dynamically.
For top k problems, maintaining a heap of size k gives O(n log k) time.
6️⃣ Top K Pattern
Problem Pattern
Find:
k largest
k smallest
k most frequent
k closest
Key Idea
Maintain a heap of size k.
For k largest:
use min-heap of size k
The heap stores the current k largest elements.
If a new element is larger than the smallest in heap, replace it.
Template
import heapq
def top_k_largest(nums, k):
heap = []
for num in nums:
heapq.heappush(heap, num)
if len(heap) > k:
heapq.heappop(heap)
return heap
👉 Interview Answer
For top k largest elements, I maintain a min-heap of size k.
The heap contains the best k candidates seen so far.
If the heap grows larger than k, I remove the smallest element.
At the end, the heap contains the k largest elements.
7️⃣ Kth Largest Element
Problem
Find the kth largest element in an array.
Heap Idea
Use a min-heap of size k.
After processing all numbers:
heap[0] is the kth largest
Code
import heapq
def find_kth_largest(nums, k):
heap = []
for num in nums:
heapq.heappush(heap, num)
if len(heap) > k:
heapq.heappop(heap)
return heap[0]
Complexity
Time: O(n log k)
Space: O(k)
👉 Interview Answer
For kth largest element, I keep a min-heap of size k.
The heap stores the k largest values seen so far.
The smallest value in that heap is the kth largest overall.
This gives O(n log k) time and O(k) space.
8️⃣ Top K Frequent Elements
Problem
Given nums, return the k most frequent elements.
Steps
1. Count frequency.
2. Push (frequency, number) into min-heap.
3. Keep heap size k.
4. Return elements in heap.
Code
from collections import Counter
import heapq
def top_k_frequent(nums, k):
freq = Counter(nums)
heap = []
for num, count in freq.items():
heapq.heappush(heap, (count, num))
if len(heap) > k:
heapq.heappop(heap)
return [num for count, num in heap]
Complexity
Time: O(n log k)
Space: O(n)
👉 Interview Answer
For Top K Frequent Elements, I first count frequencies.
Then I maintain a min-heap of size k ordered by frequency.
If the heap exceeds size k, I remove the element with the lowest frequency.
At the end, the heap contains the k most frequent elements.
9️⃣ K Closest Points to Origin
Problem
Given points on a 2D plane, return k points closest to the origin.
Distance:
x² + y²
No need to use square root.
Min-Heap Approach
Push all points by distance, then pop k times.
import heapq
def k_closest(points, k):
heap = []
for x, y in points:
dist = x * x + y * y
heapq.heappush(heap, (dist, x, y))
result = []
for _ in range(k):
dist, x, y = heapq.heappop(heap)
result.append([x, y])
return result
Size-k Max-Heap Approach
Better if k is small.
import heapq
def k_closest(points, k):
heap = []
for x, y in points:
dist = x * x + y * y
heapq.heappush(heap, (-dist, x, y))
if len(heap) > k:
heapq.heappop(heap)
return [[x, y] for dist, x, y in heap]
👉 Interview Answer
For K Closest Points, I compare squared distance to avoid square root.
If I push all points into a min-heap, I can pop k closest points.
If k is much smaller than n, I can maintain a size-k max-heap using negative distances.
🔟 Merge K Sorted Lists
Problem
Given k sorted linked lists, merge them into one sorted list.
Heap Idea
Put the head of each list into a min-heap.
Repeatedly pop the smallest node, append it to result, then push its next node.
Code
import heapq
def merge_k_lists(lists):
heap = []
for i, node in enumerate(lists):
if node:
heapq.heappush(heap, (node.val, i, node))
dummy = ListNode(0)
current = dummy
while heap:
value, i, node = heapq.heappop(heap)
current.next = node
current = current.next
if node.next:
heapq.heappush(heap, (node.next.val, i, node.next))
return dummy.next
Why Include i
If two nodes have the same value, Python may try to compare node objects directly.
Adding i prevents comparison errors.
👉 Interview Answer
For merging k sorted lists, I use a min-heap containing the current head of each list.
Each time I pop the smallest node, append it to the result, and push the next node from the same list.
The heap size is at most k, so each operation costs O(log k).
1️⃣1️⃣ K-Way Merge Pattern
Common Pattern
K-way merge appears when we have multiple sorted sources.
Examples:
merge k sorted lists
kth smallest element in sorted matrix
smallest range covering k lists
external merge sort
Template
import heapq
def k_way_merge(lists):
heap = []
for list_id, arr in enumerate(lists):
if arr:
heapq.heappush(heap, (arr[0], list_id, 0))
result = []
while heap:
value, list_id, index = heapq.heappop(heap)
result.append(value)
next_index = index + 1
if next_index < len(lists[list_id]):
next_value = lists[list_id][next_index]
heapq.heappush(heap, (next_value, list_id, next_index))
return result
👉 Interview Answer
K-way merge uses a heap to repeatedly choose the smallest current candidate among k sorted sources.
Each heap entry stores the value and enough information to know where the next value comes from.
This keeps the heap size at most k.
1️⃣2️⃣ Kth Smallest in Sorted Matrix
Problem
Given an n x n matrix where rows and columns are sorted, return the kth smallest element.
Heap Idea
Treat each row as a sorted list.
Use k-way merge.
Code
import heapq
def kth_smallest(matrix, k):
n = len(matrix)
heap = []
for r in range(min(n, k)):
heapq.heappush(heap, (matrix[r][0], r, 0))
for _ in range(k - 1):
value, r, c = heapq.heappop(heap)
if c + 1 < n:
heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))
return heapq.heappop(heap)[0]
👉 Interview Answer
For kth smallest in a sorted matrix, I can treat each row as a sorted list.
I initialize the heap with the first element of each row.
Then I pop the smallest element k times, pushing the next element from the same row whenever possible.
1️⃣3️⃣ Find Median From Data Stream
Problem
Numbers arrive one by one. Return the median at any time.
Heap Idea
Use two heaps:
max-heap for smaller half
min-heap for larger half
In Python:
small = max-heap using negative values
large = min-heap
Rules
Maintain:
len(small) == len(large)
or
len(small) == len(large) + 1
All values in small should be <= all values in large.
Code
import heapq
class MedianFinder:
def __init__(self):
self.small = []
self.large = []
def addNum(self, num):
heapq.heappush(self.small, -num)
if self.large and -self.small[0] > self.large[0]:
value = -heapq.heappop(self.small)
heapq.heappush(self.large, value)
if len(self.small) > len(self.large) + 1:
value = -heapq.heappop(self.small)
heapq.heappush(self.large, value)
if len(self.large) > len(self.small):
value = heapq.heappop(self.large)
heapq.heappush(self.small, -value)
def findMedian(self):
if len(self.small) > len(self.large):
return -self.small[0]
return (-self.small[0] + self.large[0]) / 2
👉 Interview Answer
For streaming median, I use two heaps.
The max-heap stores the smaller half, and the min-heap stores the larger half.
I keep the heaps balanced so their sizes differ by at most one.
If one heap has more elements, its top is the median.
If both heaps have equal size, the median is the average of the two tops.
1️⃣4️⃣ Meeting Rooms II
Problem
Given meeting intervals, return the minimum number of meeting rooms required.
Heap Idea
Sort meetings by start time.
Use a min-heap of end times.
For each meeting:
if earliest ending room is free,
reuse it
otherwise allocate new room
Code
import heapq
def min_meeting_rooms(intervals):
if not intervals:
return 0
intervals.sort(key=lambda x: x[0])
heap = []
for start, end in intervals:
if heap and heap[0] <= start:
heapq.heappop(heap)
heapq.heappush(heap, end)
return len(heap)
👉 Interview Answer
For Meeting Rooms II, I sort meetings by start time.
I use a min-heap to track room end times.
If the earliest ending meeting has ended before the current meeting starts, I reuse that room.
Otherwise, I allocate a new room.
The maximum heap size represents the required room count.
1️⃣5️⃣ Task Scheduling With Heap
Problem Pattern
At each step, choose the highest-priority available task.
Examples:
most frequent task
earliest deadline
shortest processing time
highest profit
Heap Use
Use heap when:
available candidates change over time
priority changes which item should be processed next
Example: CPU Scheduling
sort tasks by enqueue time
push available tasks into heap by processing time
pop shortest task
👉 Interview Answer
Heap is useful in scheduling when the set of available tasks changes over time.
I usually sort events by time, push available tasks into a heap, and repeatedly process the best available task.
The heap represents the current candidate set.
1️⃣6️⃣ Reorganize String
Problem
Rearrange string so no two adjacent characters are the same.
Heap Idea
Always choose the most frequent available character, but hold the previously used character out for one round.
Code
from collections import Counter
import heapq
def reorganize_string(s):
freq = Counter(s)
heap = [(-count, ch) for ch, count in freq.items()]
heapq.heapify(heap)
result = []
prev_count = 0
prev_ch = ""
while heap:
count, ch = heapq.heappop(heap)
result.append(ch)
if prev_count < 0:
heapq.heappush(heap, (prev_count, prev_ch))
count += 1
prev_count = count
prev_ch = ch
answer = "".join(result)
if len(answer) != len(s):
return ""
return answer
👉 Interview Answer
Reorganize String uses a max-heap by frequency.
I always choose the character with the highest remaining count.
After using a character, I temporarily hold it back for one step, so it cannot be used twice in a row.
1️⃣7️⃣ Dijkstra’s Algorithm and Heap
Why Heap Is Used
Dijkstra repeatedly needs the unvisited node with the smallest known distance.
A min-heap provides this efficiently.
Pattern
heap entry = (distance, node)
Each time:
pop smallest distance
relax neighbors
push updated distances
Code Skeleton
import heapq
def dijkstra(graph, start):
dist = {node: float("inf") for node in graph}
dist[start] = 0
heap = [(0, start)]
while heap:
current_dist, node = heapq.heappop(heap)
if current_dist > dist[node]:
continue
for neighbor, weight in graph[node]:
new_dist = current_dist + weight
if new_dist < dist[neighbor]:
dist[neighbor] = new_dist
heapq.heappush(heap, (new_dist, neighbor))
return dist
👉 Interview Answer
Dijkstra uses a min-heap to always process the node with the smallest known distance.
When a shorter path to a neighbor is found, we update its distance and push it into the heap.
If we later pop an outdated distance, we skip it.
1️⃣8️⃣ Lazy Deletion
Problem
Python heap does not support efficient removal of arbitrary elements.
This matters for:
sliding window median
frequency updates
removing expired tasks
Lazy Deletion Idea
Instead of removing immediately:
mark item as deleted
remove it later when it reaches heap top
Pattern
while heap and should_delete(heap[0]):
heapq.heappop(heap)
👉 Interview Answer
Heap does not support efficient arbitrary deletion.
For problems that need deletion, I use lazy deletion.
I mark elements as invalid in a counter or set, and only remove them when they appear at the top of the heap.
This keeps heap operations efficient.
1️⃣9️⃣ Heap With Tuples
Python Tuple Ordering
Python heap compares tuples lexicographically.
Example:
(priority, value)
It first compares priority.
If priorities tie, it compares value.
Problem
If value is a custom object, Python may not know how to compare it.
Example:
(node.val, node)
If two nodes have same value, this can fail.
Fix
Add a unique tie-breaker:
(node.val, index, node)
👉 Interview Answer
Python heap compares tuple elements in order.
If the first values tie, it compares the next values.
When storing custom objects, I add a unique tie-breaker such as an index or counter, so Python never needs to compare the objects directly.
2️⃣0️⃣ Heap Complexity Analysis
Basic Heap
For n elements:
push: O(log n)
pop: O(log n)
peek: O(1)
heapify: O(n)
Top K
Time: O(n log k)
Space: O(k)
K-Way Merge
If total elements are N and k lists:
Time: O(N log k)
Space: O(k)
Streaming Median
Each insertion:
O(log n)
Find median:
O(1)
👉 Interview Answer
Heap operations are usually O(log n) for push and pop.
For top k problems, if we keep heap size k, the time is O(n log k).
For k-way merge, heap size is k, so total time is O(N log k).
For streaming median, insertion is O(log n), and median query is O(1).
2️⃣1️⃣ Common Edge Cases
Empty Input
nums = []
lists = []
intervals = []
k Is Zero
k = 0
Usually return empty result.
k Greater Than Input Size
Need decide whether:
return all elements
or
input is invalid
Duplicate Values
Heap handles duplicates, but tuple tie-breaking may matter.
Custom Objects
Need tie-breaker.
Negative Values
For max-heap simulation, be careful with negation.
👉 Interview Answer
Common heap edge cases include empty input, k equals zero, k larger than input size, duplicate priorities, custom objects requiring tie-breakers, and max-heap simulation with negative values.
I usually clarify what should happen when k is larger than the number of items.
2️⃣2️⃣ Common Mistakes
Mistake 1: Using Full Sorting for Dynamic Problems
Sorting once does not help if candidates change over time.
Mistake 2: Wrong Heap Size
For top k problems, heap should usually be capped at k.
Mistake 3: Wrong Sign for Max-Heap
For max-heap:
heappush(heap, -value)
and remember to negate when popping.
Mistake 4: No Tie-Breaker for Objects
Wrong:
(node.val, node)
Better:
(node.val, index, node)
Mistake 5: Forgetting Lazy Deletion
Trying to remove arbitrary elements directly from heap is inefficient.
Mistake 6: Returning Heap Without Ordering
A heap is not fully sorted.
If sorted output is required, sort the heap before returning.
👉 Interview Answer
Common heap mistakes include using sorting when candidates are dynamic, not maintaining the correct heap size, forgetting negative values for max-heap behavior, missing tuple tie-breakers, not using lazy deletion when deletion is needed, and assuming a heap is fully sorted.
2️⃣3️⃣ When Heap Applies
Good Fit
Use heap when:
need repeated min or max
need top k
need dynamic best candidate
need merge from multiple sorted sources
need streaming median
need scheduling by priority
need shortest path with changing distances
Problem Signals
Look for words like:
kth largest
top k
closest
smallest
largest
median stream
merge k sorted
priority
schedule
available task
minimum cost repeatedly
👉 Interview Answer
I consider heap when I need repeated access to the smallest or largest candidate, especially when the candidate set changes over time.
Common signals include top k, kth largest, merge k sorted lists, streaming median, scheduling by priority, and shortest-path problems like Dijkstra.
2️⃣4️⃣ When Heap Does Not Apply
Not Good Fit
Heap may not be ideal when:
need full sorted order once
need fast lookup by key
need arbitrary deletion without lazy deletion
need range query
need exact order among all elements
Better Alternatives
| Problem Type | Better Pattern |
|---|---|
| Full sorting | Sort |
| Fast membership lookup | Hash Set |
| Range query | Segment Tree / BIT |
| Ordered search | Balanced BST |
| Simple max/min once | Single pass |
| Connectivity | Union Find |
👉 Interview Answer
Heap is not always the right tool.
If I only need to sort all elements once, sorting may be simpler.
If I need fast lookup by key, a hash map is better.
If I need range queries, a segment tree or BIT is more appropriate.
If I need arbitrary deletion, I may need lazy deletion or a different data structure.
2️⃣5️⃣ Standard Templates
Min-Heap Template
import heapq
heap = []
heapq.heappush(heap, value)
smallest = heapq.heappop(heap)
top = heap[0]
Max-Heap Template
import heapq
heap = []
heapq.heappush(heap, -value)
largest = -heapq.heappop(heap)
top = -heap[0]
Top K Template
import heapq
def top_k(nums, k):
heap = []
for num in nums:
heapq.heappush(heap, num)
if len(heap) > k:
heapq.heappop(heap)
return heap
K-Way Merge Template
import heapq
def k_way_merge(lists):
heap = []
for i, arr in enumerate(lists):
if arr:
heapq.heappush(heap, (arr[0], i, 0))
result = []
while heap:
value, list_id, index = heapq.heappop(heap)
result.append(value)
next_index = index + 1
if next_index < len(lists[list_id]):
heapq.heappush(
heap,
(lists[list_id][next_index], list_id, next_index)
)
return result
Two-Heaps Median Template
import heapq
class MedianFinder:
def __init__(self):
self.small = []
self.large = []
def add_num(self, num):
heapq.heappush(self.small, -num)
if self.large and -self.small[0] > self.large[0]:
heapq.heappush(self.large, -heapq.heappop(self.small))
if len(self.small) > len(self.large) + 1:
heapq.heappush(self.large, -heapq.heappop(self.small))
if len(self.large) > len(self.small):
heapq.heappush(self.small, -heapq.heappop(self.large))
def find_median(self):
if len(self.small) > len(self.large):
return -self.small[0]
return (-self.small[0] + self.large[0]) / 2
Lazy Deletion Template
import heapq
from collections import Counter
heap = []
deleted = Counter()
def prune():
while heap and deleted[heap[0]] > 0:
deleted[heap[0]] -= 1
heapq.heappop(heap)
🧠 Staff-Level Answer Final
👉 Interview Answer Full Version
A heap, or priority queue, is used when we need repeated access to the smallest or largest element.
It supports push and pop in O(log n), peek in O(1), and heapify in O(n).
In Python, heapq provides a min-heap. For max-heap behavior, I usually store negative values.
Heap is especially useful when the candidate set changes dynamically. Sorting gives a full order once, but heap allows us to repeatedly extract the current best candidate as new candidates are added.
For top k problems, I usually maintain a heap of size k. For kth largest, a min-heap of size k stores the k largest elements seen so far, and the heap top is the kth largest.
For k-way merge, I put the current smallest candidate from each sorted source into the heap. Each heap entry must also store enough information to find the next element from the same source. The heap size is usually k, so the total complexity is O(N log k).
For streaming median, I use two heaps: a max-heap for the smaller half, and a min-heap for the larger half. I maintain balance between the two heaps, so the median can be returned in O(1).
For scheduling problems, a heap is useful when available candidates change over time. We sort events by time, push available candidates into a heap, and repeatedly process the best available candidate.
In Python, tuple ordering is important. If heap entries contain custom objects, I add a unique tie-breaker like an index or counter to avoid comparison errors.
Heap does not support efficient arbitrary deletion. For problems that require deletion, such as sliding window median, I use lazy deletion: mark elements as removed, and physically pop them only when they reach the heap top.
At a senior level, I would explain: what priority means, whether we need min-heap or max-heap, why heap is better than sorting, what the heap size is, what each heap entry stores, how ties are handled, and whether lazy deletion is needed.
⭐ Final Insight
Heap 的核心不是“会用 heapq”。
它的核心是维护一个 dynamic candidate set, 并高效取出当前 priority 最高的元素。
Senior-level 的重点是:
priority 是什么, min-heap 还是 max-heap, heap size 是否固定为 k, heap entry 需要存哪些信息, 为什么不用 full sorting, tie-breaker 怎么处理, 以及是否需要 lazy deletion。
能讲清楚这些, Heap 就是一种处理 Top K、K-way merge、streaming median、scheduling 和 shortest path 的核心数据结构。
中文部分
🎯 Heap / Priority Queue
1️⃣ 核心框架
讨论 Heap / Priority Queue 时,我通常从:
- Heap 解决什么问题
- Min-heap 和 max-heap
- Top K problems
- K-way merge
- Streaming median
- Scheduling problems
- Heap with lazy deletion
- Heap vs sorting
- 常见 edge cases
- 高级工程师解释方式
2️⃣ 要解决什么问题?
Heap 用于需要反复获取最小或最大元素的场景。
它常用于:
- Top K frequent elements
- Kth largest element
- K closest points
- Merge k sorted lists
- Find median from data stream
- Meeting rooms
- Task scheduler variants
- Dijkstra’s algorithm
- Sliding window median
- Reorganize string
- IPO / project selection
- CPU task scheduling
核心思想是:
维护一个动态 candidate set,
每次高效取出 priority 最高的 candidate。
👉 面试回答
Heap,也叫 priority queue, 用于需要反复访问 minimum 或 maximum element 的场景。
它支持高效插入元素, 也支持高效弹出当前最高 priority 的元素。
Python 的 heapq 默认是 min-heap。
如果需要 max-heap, 通常存 negative values。
3️⃣ Heap Basics
Min-Heap
Min-heap 中:
smallest element is always at the top
Python 中使用:
import heapq
Operations:
heapq.heappush(heap, value)
heapq.heappop(heap)
heap[0]
Time Complexity
| Operation | Complexity |
|---|---|
| Push | O(log n) |
| Pop | O(log n) |
| Peek min | O(1) |
| Heapify | O(n) |
👉 面试回答
Heap 可以高效访问最小元素。
Push 和 pop 是 O(log n), peek top 是 O(1)。
对已有数组调用 heapify 建堆是 O(n)。
4️⃣ Python 中的 Max-Heap
Python 默认是 Min-Heap
Python heapq 默认是 min-heap。
如果要模拟 max-heap:
heapq.heappush(heap, -value)
largest = -heapq.heappop(heap)
Example
import heapq
nums = [3, 1, 5, 2]
heap = []
for num in nums:
heapq.heappush(heap, -num)
print(-heapq.heappop(heap)) # 5
👉 面试回答
Python 的 heapq 是 min-heap。
如果需要 max-heap behavior, 我会存负数。
最小的负数对应原始值中最大的数字。
5️⃣ Heap vs Sorting
Sorting
如果把所有数据排序:
O(n log n)
适合:
需要完整排序结果
Heap
如果只需要 top k:
O(n log k)
适合:
k much smaller than n
streaming input
dynamic candidates
repeated min/max extraction
Key Difference
| Need | Better Choice |
|---|---|
| Full sorted order | Sorting |
| Top K only | Heap |
| Dynamic candidates | Heap |
| Repeated min/max | Heap |
👉 面试回答
Sorting 适合需要完整顺序的场景。
Heap 更适合只需要 top k, 或 candidate set 会动态变化的场景。
对 top k 问题, 维护 size k 的 heap 可以做到 O(n log k)。
6️⃣ Top K Pattern
Problem Pattern
找:
k largest
k smallest
k most frequent
k closest
Key Idea
维护一个 size k 的 heap。
对于 k largest:
use min-heap of size k
Heap 中保存当前见过的 k 个最大元素。
如果新元素大于 heap 中最小值, 就替换掉它。
Template
import heapq
def top_k_largest(nums, k):
heap = []
for num in nums:
heapq.heappush(heap, num)
if len(heap) > k:
heapq.heappop(heap)
return heap
👉 面试回答
对 top k largest elements, 我维护一个 size k 的 min-heap。
Heap 中始终保存当前最好的 k 个 candidates。
如果 heap size 超过 k, 就 pop 掉最小的那个。
最后 heap 中剩下的就是 k largest elements。
7️⃣ Kth Largest Element
Problem
找数组中的 kth largest element。
Heap Idea
使用 size k 的 min-heap。
处理完所有 numbers 后:
heap[0] is the kth largest
Code
import heapq
def find_kth_largest(nums, k):
heap = []
for num in nums:
heapq.heappush(heap, num)
if len(heap) > k:
heapq.heappop(heap)
return heap[0]
Complexity
Time: O(n log k)
Space: O(k)
👉 面试回答
对 kth largest element, 我维护一个 size k 的 min-heap。
Heap 中保存目前见过的 k 个最大值。
这个 heap 的最小值就是 overall kth largest。
时间复杂度是 O(n log k),space 是 O(k)。
8️⃣ Top K Frequent Elements
Problem
给定 nums, 返回出现频率最高的 k 个 elements。
Steps
1. Count frequency.
2. Push (frequency, number) into min-heap.
3. Keep heap size k.
4. Return elements in heap.
Code
from collections import Counter
import heapq
def top_k_frequent(nums, k):
freq = Counter(nums)
heap = []
for num, count in freq.items():
heapq.heappush(heap, (count, num))
if len(heap) > k:
heapq.heappop(heap)
return [num for count, num in heap]
Complexity
Time: O(n log k)
Space: O(n)
👉 面试回答
对 Top K Frequent Elements, 我先统计 frequency。
然后用 frequency 作为 priority, 维护 size k 的 min-heap。
如果 heap 超过 k, 就移除 frequency 最小的元素。
最后 heap 中就是 k 个最高频元素。
9️⃣ K Closest Points to Origin
Problem
给定 2D points, 返回距离 origin 最近的 k 个 points。
Distance:
x² + y²
不需要开平方。
Min-Heap Approach
把所有 points 放入 min-heap, 然后 pop k 次。
import heapq
def k_closest(points, k):
heap = []
for x, y in points:
dist = x * x + y * y
heapq.heappush(heap, (dist, x, y))
result = []
for _ in range(k):
dist, x, y = heapq.heappop(heap)
result.append([x, y])
return result
Size-k Max-Heap Approach
如果 k 很小,这个更好。
import heapq
def k_closest(points, k):
heap = []
for x, y in points:
dist = x * x + y * y
heapq.heappush(heap, (-dist, x, y))
if len(heap) > k:
heapq.heappop(heap)
return [[x, y] for dist, x, y in heap]
👉 面试回答
对 K Closest Points, 我比较 squared distance, 避免不必要的 square root。
如果把所有 points 放入 min-heap, 可以 pop k 次得到最近的 points。
如果 k 远小于 n, 可以使用 size-k max-heap, 用 negative distance 模拟 max-heap。
🔟 Merge K Sorted Lists
Problem
给定 k 个 sorted linked lists, 合并成一个 sorted list。
Heap Idea
把每个 list 的 head 放入 min-heap。
每次 pop 最小 node, append 到 result, 再把它的 next node 放入 heap。
Code
import heapq
def merge_k_lists(lists):
heap = []
for i, node in enumerate(lists):
if node:
heapq.heappush(heap, (node.val, i, node))
dummy = ListNode(0)
current = dummy
while heap:
value, i, node = heapq.heappop(heap)
current.next = node
current = current.next
if node.next:
heapq.heappush(heap, (node.next.val, i, node.next))
return dummy.next
为什么需要 i
如果两个 node 的 value 相同, Python 可能尝试直接比较 node object。
加上 i 可以避免 comparison error。
👉 面试回答
对 merge k sorted lists, 我使用 min-heap 保存每个 list 当前的 head。
每次取出最小 node, 加入结果, 然后把同一个 list 的 next node 放回 heap。
Heap size 最多是 k, 所以每次操作是 O(log k)。
1️⃣1️⃣ K-Way Merge Pattern
Common Pattern
K-way merge 出现在多个 sorted sources 的合并问题中。
Examples:
merge k sorted lists
kth smallest element in sorted matrix
smallest range covering k lists
external merge sort
Template
import heapq
def k_way_merge(lists):
heap = []
for list_id, arr in enumerate(lists):
if arr:
heapq.heappush(heap, (arr[0], list_id, 0))
result = []
while heap:
value, list_id, index = heapq.heappop(heap)
result.append(value)
next_index = index + 1
if next_index < len(lists[list_id]):
next_value = lists[list_id][next_index]
heapq.heappush(heap, (next_value, list_id, next_index))
return result
👉 面试回答
K-way merge 使用 heap, 在 k 个 sorted sources 中反复选择当前最小 candidate。
每个 heap entry 不只存 value, 还要存它来自哪个 source, 以及 source 中的 index。
这样 pop 后才能找到下一个 candidate。
1️⃣2️⃣ Kth Smallest in Sorted Matrix
Problem
给定 n x n matrix, 每一行和每一列都 sorted, 返回 kth smallest element。
Heap Idea
把每一行看成一个 sorted list。
使用 k-way merge。
Code
import heapq
def kth_smallest(matrix, k):
n = len(matrix)
heap = []
for r in range(min(n, k)):
heapq.heappush(heap, (matrix[r][0], r, 0))
for _ in range(k - 1):
value, r, c = heapq.heappop(heap)
if c + 1 < n:
heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))
return heapq.heappop(heap)[0]
👉 面试回答
对 kth smallest in sorted matrix, 我可以把每一行看成一个 sorted list。
初始化 heap 时放入每一行的第一个元素。
然后 pop k 次最小元素, 每次从同一行推进到下一个元素。
1️⃣3️⃣ Find Median From Data Stream
Problem
数字一个一个到达。 需要随时返回 median。
Heap Idea
使用两个 heaps:
max-heap for smaller half
min-heap for larger half
Python 中:
small = max-heap using negative values
large = min-heap
Rules
维护:
len(small) == len(large)
or
len(small) == len(large) + 1
同时保证:
all values in small <= all values in large
Code
import heapq
class MedianFinder:
def __init__(self):
self.small = []
self.large = []
def addNum(self, num):
heapq.heappush(self.small, -num)
if self.large and -self.small[0] > self.large[0]:
value = -heapq.heappop(self.small)
heapq.heappush(self.large, value)
if len(self.small) > len(self.large) + 1:
value = -heapq.heappop(self.small)
heapq.heappush(self.large, value)
if len(self.large) > len(self.small):
value = heapq.heappop(self.large)
heapq.heappush(self.small, -value)
def findMedian(self):
if len(self.small) > len(self.large):
return -self.small[0]
return (-self.small[0] + self.large[0]) / 2
👉 面试回答
对 streaming median, 我使用两个 heaps。
Max-heap 保存较小的一半。
Min-heap 保存较大的一半。
我保持两个 heap 的 size 平衡, 并保证 small 中的所有元素都不大于 large 中的元素。
如果一个 heap 多一个元素, 它的 top 就是 median。
如果两个 heap 一样大, median 是两个 top 的平均值。
1️⃣4️⃣ Meeting Rooms II
Problem
给定 meeting intervals, 返回最少需要多少 meeting rooms。
Heap Idea
按 start time 排序。
用 min-heap 保存 room end times。
对于每个 meeting:
如果最早结束的 room 已经空出来,
就 reuse
否则 allocate new room
Code
import heapq
def min_meeting_rooms(intervals):
if not intervals:
return 0
intervals.sort(key=lambda x: x[0])
heap = []
for start, end in intervals:
if heap and heap[0] <= start:
heapq.heappop(heap)
heapq.heappush(heap, end)
return len(heap)
👉 面试回答
对 Meeting Rooms II, 我先按 start time 排序 meetings。
然后用 min-heap 记录当前 rooms 的 end times。
如果最早结束的 meeting 已经在当前 start 前结束, 可以复用这个 room。
否则需要新 room。
Heap 的最大 size 就是需要的 room 数量。
1️⃣5️⃣ Task Scheduling With Heap
Problem Pattern
每一步都要选择 highest-priority available task。
Examples:
most frequent task
earliest deadline
shortest processing time
highest profit
Heap Use
当满足这些条件时用 heap:
available candidates change over time
priority decides next task
Example: CPU Scheduling
sort tasks by enqueue time
push available tasks into heap by processing time
pop shortest task
👉 面试回答
在 scheduling 问题中, 如果 available tasks 会随时间变化, heap 很有用。
常见做法是先按 time 排序 events, 然后把当前 available tasks 放入 heap。
每次从 heap 中取出当前最优 task。
1️⃣6️⃣ Reorganize String
Problem
重新排列 string, 让相邻 characters 不相同。
Heap Idea
每次选择 remaining frequency 最高的 character, 但把刚刚用过的 character 暂停一轮。
Code
from collections import Counter
import heapq
def reorganize_string(s):
freq = Counter(s)
heap = [(-count, ch) for ch, count in freq.items()]
heapq.heapify(heap)
result = []
prev_count = 0
prev_ch = ""
while heap:
count, ch = heapq.heappop(heap)
result.append(ch)
if prev_count < 0:
heapq.heappush(heap, (prev_count, prev_ch))
count += 1
prev_count = count
prev_ch = ch
answer = "".join(result)
if len(answer) != len(s):
return ""
return answer
👉 面试回答
Reorganize String 使用 max-heap by frequency。
我每次选择剩余次数最多的 character。
使用后,暂时把它 hold back 一轮, 防止马上再次使用导致相邻重复。
1️⃣7️⃣ Dijkstra’s Algorithm and Heap
为什么 Dijkstra 使用 Heap?
Dijkstra 需要反复取出当前 known distance 最小的 unvisited node。
Min-heap 可以高效完成这个操作。
Pattern
heap entry = (distance, node)
每次:
pop smallest distance
relax neighbors
push updated distances
Code Skeleton
import heapq
def dijkstra(graph, start):
dist = {node: float("inf") for node in graph}
dist[start] = 0
heap = [(0, start)]
while heap:
current_dist, node = heapq.heappop(heap)
if current_dist > dist[node]:
continue
for neighbor, weight in graph[node]:
new_dist = current_dist + weight
if new_dist < dist[neighbor]:
dist[neighbor] = new_dist
heapq.heappush(heap, (new_dist, neighbor))
return dist
👉 面试回答
Dijkstra 使用 min-heap, 每次处理当前 known distance 最小的 node。
如果发现到 neighbor 的更短路径, 就更新 distance 并 push 到 heap。
如果之后 pop 出过期 distance, 直接 skip。
1️⃣8️⃣ Lazy Deletion
Problem
Python heap 不支持高效删除任意元素。
这在以下问题中很常见:
sliding window median
frequency updates
removing expired tasks
Lazy Deletion Idea
不要立刻从 heap 中删除。
而是:
mark item as deleted
等它到 heap top 时再真正 pop
Pattern
while heap and should_delete(heap[0]):
heapq.heappop(heap)
👉 面试回答
Heap 不支持高效 arbitrary deletion。
如果题目需要删除非 top 元素, 我会使用 lazy deletion。
先用 counter 或 set 标记元素已经 invalid。
等这个元素到达 heap top 时, 再真正 pop 掉它。
1️⃣9️⃣ Heap With Tuples
Python Tuple Ordering
Python heap 会按照 tuple 的字典序比较。
Example:
(priority, value)
它先比较 priority。
如果 priority 相同, 再比较 value。
Problem
如果 value 是 custom object, Python 可能不知道如何比较它。
Example:
(node.val, node)
如果两个 node value 相同, 可能报错。
Fix
添加 unique tie-breaker:
(node.val, index, node)
👉 面试回答
Python heap 会按 tuple 元素顺序比较。
如果第一个 priority 相同, 它会比较第二个元素。
当 heap entry 中有 custom object 时, 我会加入一个 unique tie-breaker, 比如 index 或 counter, 避免 Python 直接比较 object。
2️⃣0️⃣ Heap Complexity Analysis
Basic Heap
对于 n 个元素:
push: O(log n)
pop: O(log n)
peek: O(1)
heapify: O(n)
Top K
Time: O(n log k)
Space: O(k)
K-Way Merge
如果总元素数是 N, 有 k 个 lists:
Time: O(N log k)
Space: O(k)
Streaming Median
每次 insertion:
O(log n)
查询 median:
O(1)
👉 面试回答
Heap 的 push 和 pop 通常是 O(log n)。
对 top k 问题, 如果 heap size 保持为 k, 时间复杂度是 O(n log k)。
对 k-way merge, heap size 是 k, 所以总时间是 O(N log k)。
对 streaming median, 插入是 O(log n), 查询 median 是 O(1)。
2️⃣1️⃣ 常见 Edge Cases
Empty Input
nums = []
lists = []
intervals = []
k Is Zero
k = 0
通常返回 empty result。
k Greater Than Input Size
需要判断:
return all elements
or
input is invalid
Duplicate Values
Heap 可以处理 duplicates, 但 tuple tie-breaking 可能很重要。
Custom Objects
需要 tie-breaker。
Negative Values
用 negative values 模拟 max-heap 时要小心。
👉 面试回答
Heap 常见 edge cases 包括 empty input、 k 等于 0、 k 大于 input size、 duplicate priorities、 custom objects 需要 tie-breaker、 以及用 negative values 模拟 max-heap。
我通常会确认 k 超过元素数量时应该返回什么。
2️⃣2️⃣ 常见错误
Mistake 1: Dynamic Problem 中使用一次 Full Sorting
如果 candidates 会动态变化, 一次 sorting 不够。
Mistake 2: Heap Size 错误
Top k 问题中, heap 通常要保持 size k。
Mistake 3: Max-Heap 符号错误
Max-heap:
heappush(heap, -value)
pop 时要记得取负号。
Mistake 4: Object 没有 Tie-Breaker
错误:
(node.val, node)
更安全:
(node.val, index, node)
Mistake 5: 忘记 Lazy Deletion
直接从 heap 中删除任意元素通常不高效。
Mistake 6: 以为 Heap 是 Fully Sorted
Heap 只保证 top 是最小值。
如果要求 sorted output, 需要额外 sort。
👉 面试回答
Heap 常见错误包括: 对动态 candidate 问题使用一次性 sorting、 heap size 维护错误、 max-heap 负号处理错误、 custom object 没有 tie-breaker、 忘记 lazy deletion、 以及误以为 heap 内部是完整 sorted 的。
Heap 只保证 top element 是最小或最高 priority。
2️⃣3️⃣ 什么时候 Heap 适用?
Good Fit
当需要:
repeated min or max
top k
dynamic best candidate
merge from multiple sorted sources
streaming median
scheduling by priority
shortest path with changing distances
Problem Signals
看到这些词想到 heap:
kth largest
top k
closest
smallest
largest
median stream
merge k sorted
priority
schedule
available task
minimum cost repeatedly
👉 面试回答
当我需要反复访问 smallest 或 largest candidate, 尤其是 candidate set 会动态变化时, 我会考虑 heap。
常见信号包括 top k、kth largest、merge k sorted lists、 streaming median、priority scheduling, 以及 Dijkstra 这种 shortest-path 问题。
2️⃣4️⃣ 什么时候 Heap 不适用?
Not Good Fit
Heap 不适合:
need full sorted order once
need fast lookup by key
need arbitrary deletion without lazy deletion
need range query
need exact order among all elements
Better Alternatives
| Problem Type | Better Pattern |
|---|---|
| Full sorting | Sort |
| Fast membership lookup | Hash Set |
| Range query | Segment Tree / BIT |
| Ordered search | Balanced BST |
| Simple max/min once | Single pass |
| Connectivity | Union Find |
👉 面试回答
Heap 不是所有排序或 priority 问题的最佳工具。
如果只需要完整排序一次, sort 通常更简单。
如果需要按 key 快速查找, hash map 更合适。
如果需要 range query, segment tree 或 BIT 更合适。
如果需要 arbitrary deletion, 可能要 lazy deletion 或换数据结构。
2️⃣5️⃣ 标准模板
Min-Heap Template
import heapq
heap = []
heapq.heappush(heap, value)
smallest = heapq.heappop(heap)
top = heap[0]
Max-Heap Template
import heapq
heap = []
heapq.heappush(heap, -value)
largest = -heapq.heappop(heap)
top = -heap[0]
Top K Template
import heapq
def top_k(nums, k):
heap = []
for num in nums:
heapq.heappush(heap, num)
if len(heap) > k:
heapq.heappop(heap)
return heap
K-Way Merge Template
import heapq
def k_way_merge(lists):
heap = []
for i, arr in enumerate(lists):
if arr:
heapq.heappush(heap, (arr[0], i, 0))
result = []
while heap:
value, list_id, index = heapq.heappop(heap)
result.append(value)
next_index = index + 1
if next_index < len(lists[list_id]):
heapq.heappush(
heap,
(lists[list_id][next_index], list_id, next_index)
)
return result
Two-Heaps Median Template
import heapq
class MedianFinder:
def __init__(self):
self.small = []
self.large = []
def add_num(self, num):
heapq.heappush(self.small, -num)
if self.large and -self.small[0] > self.large[0]:
heapq.heappush(self.large, -heapq.heappop(self.small))
if len(self.small) > len(self.large) + 1:
heapq.heappush(self.large, -heapq.heappop(self.small))
if len(self.large) > len(self.small):
heapq.heappush(self.small, -heapq.heappop(self.large))
def find_median(self):
if len(self.small) > len(self.large):
return -self.small[0]
return (-self.small[0] + self.large[0]) / 2
Lazy Deletion Template
import heapq
from collections import Counter
heap = []
deleted = Counter()
def prune():
while heap and deleted[heap[0]] > 0:
deleted[heap[0]] -= 1
heapq.heappop(heap)
🧠 Staff-Level Answer Final
👉 面试回答完整版本
Heap,也叫 priority queue, 用于需要反复访问 smallest 或 largest element 的场景。
它支持 O(log n) 的 push 和 pop, O(1) 的 peek, 以及 O(n) 的 heapify。
Python 的 heapq 默认是 min-heap。 如果需要 max-heap behavior, 我通常用 negative values。
Heap 特别适合 candidate set 会动态变化的问题。 Sorting 可以一次性得到完整顺序, 但 heap 可以在不断加入新 candidates 的同时, 反复取出当前 best candidate。
对 top k 问题, 我通常维护一个 size k 的 heap。 对 kth largest, size k 的 min-heap 保存当前见过的 k 个最大元素, heap top 就是 kth largest。
对 k-way merge, 我会把每个 sorted source 当前最小的 candidate 放入 heap。 每个 heap entry 必须保存足够信息, 让我们知道 pop 之后应该从哪个 source 推进。 Heap size 通常是 k, 所以总复杂度是 O(N log k)。
对 streaming median, 我使用两个 heaps: 一个 max-heap 保存较小的一半, 一个 min-heap 保存较大的一半。 通过维持两个 heaps 的 size balance, median 可以 O(1) 查询。
对 scheduling problems, 如果 available candidates 会随时间变化, heap 很有用。 通常先按 time 排序 events, 再把 currently available candidates 放入 heap, 每次处理当前 priority 最好的 candidate。
在 Python 中, tuple ordering 很重要。 如果 heap entry 包含 custom object, 我会加入 unique tie-breaker, 比如 index 或 counter, 避免 Python 直接比较 object。
Heap 不支持高效 arbitrary deletion。 如果题目需要删除元素, 比如 sliding window median, 我会使用 lazy deletion: 先标记元素 invalid, 等它到 heap top 时再真正 pop。
高级工程师解释 heap 时, 我会讲清楚: priority 是什么, 需要 min-heap 还是 max-heap, 为什么 heap 比 sorting 更合适, heap size 是多少, 每个 heap entry 存什么, ties 如何处理, 以及是否需要 lazy deletion。
⭐ Final Insight
Heap 的核心不是“会用 heapq”。
它的核心是维护 dynamic candidate set, 并高效取出当前 priority 最高的元素。
Senior-level 的重点是:
priority 是什么, min-heap 还是 max-heap, heap size 是否固定为 k, heap entry 需要存哪些信息, 为什么不用 full sorting, tie-breaker 怎么处理, 以及是否需要 lazy deletion。
能讲清楚这些, Heap 就是一种处理 Top K、K-way merge、streaming median、scheduling 和 shortest path 的核心数据结构。
Implement