🎯 Design Data Structures
1️⃣ Core Framework
When discussing Design Data Structures, I frame it as:
- What data structure design problems are
- How to identify required operations
- Time complexity targets
- HashMap-based design
- Stack and queue design
- Heap-based design
- Linked list + HashMap design
- Trie-based design
- Randomized data structures
- Senior-level explanation
2️⃣ What Problem Are We Solving?
Design data structure problems ask us to implement a custom class that supports a set of operations efficiently.
They are commonly used for:
- LRU Cache
- LFU Cache
- Min Stack
- Max Stack
- Queue using stacks
- Stack using queues
- Design HashMap
- Design HashSet
- Insert Delete GetRandom O(1)
- Median Finder
- Time-Based Key-Value Store
- Trie
- Word Dictionary
- All O(1) Data Structure
- Hit Counter
- Moving Average
- Snapshot Array
The core idea is:
Data structure design is about matching operations
to the right combination of underlying structures.
👉 Interview Answer
Design data structure problems are about implementing a class with specific operations and complexity requirements.
I first list the required operations, then identify the time complexity target for each operation.
Most solutions combine multiple basic data structures, such as HashMap, linked list, heap, stack, queue, Trie, or balanced tree.
The key is to make each operation efficient while keeping internal state consistent.
3️⃣ How to Approach Design Problems
Step 1: List Operations
Example:
put(key, value)
get(key)
delete(key)
Step 2: Identify Complexity Requirements
The problem may ask for:
O(1)
O(log n)
O(k)
where k might be word length or number of returned items.
Step 3: Choose Data Structures
Ask:
Need fast lookup? -> HashMap
Need ordering? -> linked list / heap / tree
Need min or max? -> heap / monotonic structure
Need prefix search? -> Trie
Need random access? -> array
Need fast deletion? -> HashMap + index / linked list
Step 4: Maintain Invariants
Every design has internal rules.
Example:
LRU cache:
head = most recent
tail = least recent
map key -> linked list node
👉 Interview Answer
My design process is: first list the operations, then write the target complexity for each operation, then choose data structures that support those operations.
After that, I define invariants that must always remain true after every method call.
4️⃣ Common Building Blocks
HashMap
Use for:
O(1) key lookup
key to value
key to node
value to index
frequency to nodes
Array / List
Use for:
index access
random selection
compact storage
append and pop
Linked List
Use for:
O(1) remove node
O(1) move node
maintain order
LRU order
Heap
Use for:
min value
max value
top k
median
priority ordering
Trie
Use for:
prefix search
dictionary
word matching
autocomplete
👉 Interview Answer
Most design problems are solved by combining basic structures.
HashMap gives fast lookup. Array gives index access and random selection. Linked list gives O(1) reordering. Heap gives efficient min or max retrieval. Trie gives prefix search.
The trick is choosing the right combination.
5️⃣ Design HashSet
Problem
Implement:
add(key)
remove(key)
contains(key)
Simple Approach
Use buckets.
Each bucket stores values that hash to the same index.
Code
class MyHashSet:
def __init__(self):
self.size = 1000
self.buckets = [[] for _ in range(self.size)]
def _hash(self, key):
return key % self.size
def add(self, key):
index = self._hash(key)
if key not in self.buckets[index]:
self.buckets[index].append(key)
def remove(self, key):
index = self._hash(key)
if key in self.buckets[index]:
self.buckets[index].remove(key)
def contains(self, key):
index = self._hash(key)
return key in self.buckets[index]
👉 Interview Answer
A HashSet can be implemented with buckets.
I hash the key to find a bucket.
The bucket handles collisions by storing multiple values.
Add, remove, and contains are efficient on average, but collision handling matters.
6️⃣ Design HashMap
Problem
Implement:
put(key, value)
get(key)
remove(key)
Code
class MyHashMap:
def __init__(self):
self.size = 1000
self.buckets = [[] for _ in range(self.size)]
def _hash(self, key):
return key % self.size
def put(self, key, value):
index = self._hash(key)
bucket = self.buckets[index]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value)
return
bucket.append((key, value))
def get(self, key):
index = self._hash(key)
for k, v in self.buckets[index]:
if k == key:
return v
return -1
def remove(self, key):
index = self._hash(key)
bucket = self.buckets[index]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket.pop(i)
return
👉 Interview Answer
A HashMap stores key-value pairs.
I use a hash function to locate a bucket, then search within that bucket for the key.
Collisions are handled by chaining, where each bucket stores multiple key-value pairs.
7️⃣ Min Stack
Problem
Design a stack that supports:
push
pop
top
getMin
All in O(1).
Key Idea
Use two stacks:
main stack
min stack
The min stack keeps track of the minimum value at each level.
Code
class MinStack:
def __init__(self):
self.stack = []
self.min_stack = []
def push(self, val):
self.stack.append(val)
if not self.min_stack:
self.min_stack.append(val)
else:
self.min_stack.append(min(val, self.min_stack[-1]))
def pop(self):
self.stack.pop()
self.min_stack.pop()
def top(self):
return self.stack[-1]
def getMin(self):
return self.min_stack[-1]
👉 Interview Answer
Min Stack uses an auxiliary stack to track the minimum value at each stack depth.
Every time I push a value, I also push the current minimum.
When I pop, both stacks pop together.
This makes getMin O(1).
8️⃣ Queue Using Stacks
Problem
Implement queue operations using stacks:
push
pop
peek
empty
Key Idea
Use two stacks:
input stack
output stack
Push to input.
Pop from output.
If output is empty, move all input elements to output.
Code
class MyQueue:
def __init__(self):
self.input = []
self.output = []
def push(self, x):
self.input.append(x)
def _move(self):
if not self.output:
while self.input:
self.output.append(self.input.pop())
def pop(self):
self._move()
return self.output.pop()
def peek(self):
self._move()
return self.output[-1]
def empty(self):
return not self.input and not self.output
👉 Interview Answer
Queue using stacks uses two stacks.
New elements go into the input stack.
When we need to pop or peek, we move elements to the output stack only if output is empty.
This reverses order and gives FIFO behavior.
Each element moves at most twice, so operations are amortized O(1).
9️⃣ Stack Using Queues
Problem
Implement stack operations using queues:
push
pop
top
empty
One Queue Approach
When pushing a new element, rotate the queue so the newest element becomes front.
Code
from collections import deque
class MyStack:
def __init__(self):
self.queue = deque()
def push(self, x):
self.queue.append(x)
for _ in range(len(self.queue) - 1):
self.queue.append(self.queue.popleft())
def pop(self):
return self.queue.popleft()
def top(self):
return self.queue[0]
def empty(self):
return not self.queue
👉 Interview Answer
Stack using one queue works by rotating the queue after each push.
After inserting a new value, I move all previous elements behind it.
This makes the newest element appear at the front, so pop and top behave like a stack.
🔟 LRU Cache
Problem
Design a cache with:
get(key)
put(key, value)
Both in O(1).
When capacity is exceeded, remove the least recently used key.
Required Structures
Use:
HashMap: key -> node
Doubly linked list: usage order
Linked list order:
head = most recently used
tail = least recently used
👉 Interview Answer
LRU Cache requires O(1) lookup and O(1) update of usage order.
A HashMap gives O(1) access from key to node.
A doubly linked list gives O(1) removal and insertion.
On every get or put, I move the accessed node to the front.
When capacity is exceeded, I remove the node at the tail.
1️⃣1️⃣ LRU Cache Code
Code
class Node:
def __init__(self, key=0, value=0):
self.key = key
self.value = value
self.prev = None
self.next = None
class LRUCache:
def __init__(self, capacity):
self.capacity = capacity
self.cache = {}
self.head = Node()
self.tail = Node()
self.head.next = self.tail
self.tail.prev = self.head
def _remove(self, node):
prev_node = node.prev
next_node = node.next
prev_node.next = next_node
next_node.prev = prev_node
def _add_to_front(self, node):
first = self.head.next
node.prev = self.head
node.next = first
self.head.next = node
first.prev = node
def get(self, key):
if key not in self.cache:
return -1
node = self.cache[key]
self._remove(node)
self._add_to_front(node)
return node.value
def put(self, key, value):
if key in self.cache:
node = self.cache[key]
node.value = value
self._remove(node)
self._add_to_front(node)
return
node = Node(key, value)
self.cache[key] = node
self._add_to_front(node)
if len(self.cache) > self.capacity:
lru = self.tail.prev
self._remove(lru)
del self.cache[lru.key]
👉 Interview Answer
In LRU Cache, the HashMap maps keys to linked-list nodes.
The linked list stores recency order.
Whenever a key is accessed, I remove its node from its current position and add it to the front.
If the cache exceeds capacity, I remove the node before the tail, which is the least recently used node.
1️⃣2️⃣ Insert Delete GetRandom O(1)
Problem
Support:
insert(val)
remove(val)
getRandom()
All average O(1).
Required Structures
Use:
array for random access
HashMap value -> index
Remove Trick
To remove a value in O(1):
swap it with the last element
pop last element
update moved element's index
Code
import random
class RandomizedSet:
def __init__(self):
self.values = []
self.index = {}
def insert(self, val):
if val in self.index:
return False
self.index[val] = len(self.values)
self.values.append(val)
return True
def remove(self, val):
if val not in self.index:
return False
remove_index = self.index[val]
last_value = self.values[-1]
self.values[remove_index] = last_value
self.index[last_value] = remove_index
self.values.pop()
del self.index[val]
return True
def getRandom(self):
return random.choice(self.values)
👉 Interview Answer
RandomizedSet combines an array and a HashMap.
The array allows O(1) random access.
The HashMap maps each value to its index.
To delete in O(1), I swap the element with the last element, pop the last element, and update the moved element’s index.
1️⃣3️⃣ Time-Based Key-Value Store
Problem
Support:
set(key, value, timestamp)
get(key, timestamp)
Return the value with the largest timestamp less than or equal to the given timestamp.
Required Structures
Use:
HashMap key -> list of (timestamp, value)
Because timestamps are increasing, each list is sorted.
Use binary search during get.
Code
from collections import defaultdict
import bisect
class TimeMap:
def __init__(self):
self.store = defaultdict(list)
def set(self, key, value, timestamp):
self.store[key].append((timestamp, value))
def get(self, key, timestamp):
arr = self.store[key]
index = bisect.bisect_right(arr, (timestamp, chr(255)))
if index == 0:
return ""
return arr[index - 1][1]
👉 Interview Answer
TimeMap uses a HashMap from key to a sorted list of timestamp-value pairs.
Since timestamps are inserted in increasing order, set is O(1).
For get, I binary search for the largest timestamp less than or equal to the query timestamp.
That gives O(log n) lookup per key.
1️⃣4️⃣ Median Finder
Problem
Design a data structure that supports:
addNum(num)
findMedian()
Required Structures
Use two heaps:
max heap for smaller half
min heap for larger half
In Python, max heap is simulated using negative values.
Invariant
Maintain:
len(small) == len(large)
or
len(small) == len(large) + 1
And:
all values in small <= all values in large
Code
import heapq
class MedianFinder:
def __init__(self):
self.small = []
self.large = []
def addNum(self, num):
heapq.heappush(self.small, -num)
if self.large and -self.small[0] > self.large[0]:
value = -heapq.heappop(self.small)
heapq.heappush(self.large, value)
if len(self.small) > len(self.large) + 1:
value = -heapq.heappop(self.small)
heapq.heappush(self.large, value)
if len(self.large) > len(self.small):
value = heapq.heappop(self.large)
heapq.heappush(self.small, -value)
def findMedian(self):
if len(self.small) > len(self.large):
return -self.small[0]
return (-self.small[0] + self.large[0]) / 2
👉 Interview Answer
Median Finder uses two heaps.
The max heap stores the smaller half, and the min heap stores the larger half.
I rebalance so their sizes differ by at most one.
If the total count is odd, the median is the top of the larger side.
If even, it is the average of both heap tops.
1️⃣5️⃣ Design Trie
Problem
Support:
insert(word)
search(word)
startsWith(prefix)
Required Structure
Use Trie nodes.
Each node stores:
children map
is_word flag
Code
class TrieNode:
def __init__(self):
self.children = {}
self.is_word = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word):
node = self.root
for ch in word:
if ch not in node.children:
node.children[ch] = TrieNode()
node = node.children[ch]
node.is_word = True
def search(self, word):
node = self._find_node(word)
return node is not None and node.is_word
def startsWith(self, prefix):
return self._find_node(prefix) is not None
def _find_node(self, s):
node = self.root
for ch in s:
if ch not in node.children:
return None
node = node.children[ch]
return node
👉 Interview Answer
Trie is designed for prefix-based string operations.
Each node represents a prefix.
Insert walks through characters and creates missing nodes.
Search returns true only if the final node marks a complete word.
startsWith only needs the prefix path to exist.
1️⃣6️⃣ Word Dictionary
Problem
Support:
addWord(word)
search(word)
Search may contain wildcard:
.
where . matches any character.
Required Structure
Use Trie + DFS.
When character is .:
try all children
Code
class WordDictionary:
def __init__(self):
self.root = TrieNode()
def addWord(self, word):
node = self.root
for ch in word:
if ch not in node.children:
node.children[ch] = TrieNode()
node = node.children[ch]
node.is_word = True
def search(self, word):
def dfs(index, node):
if index == len(word):
return node.is_word
ch = word[index]
if ch == ".":
for child in node.children.values():
if dfs(index + 1, child):
return True
return False
if ch not in node.children:
return False
return dfs(index + 1, node.children[ch])
return dfs(0, self.root)
👉 Interview Answer
WordDictionary extends Trie with wildcard search.
Normal characters follow one child path.
When the character is
., I try all possible children using DFS.The final result is true only if a full word ends at the final position.
1️⃣7️⃣ Hit Counter
Problem
Count hits in the past 5 minutes.
Support:
hit(timestamp)
getHits(timestamp)
Assume 5 minutes means:
300 seconds
Queue Approach
Store timestamps in a queue.
Remove timestamps older than:
timestamp - 300
Code
from collections import deque
class HitCounter:
def __init__(self):
self.queue = deque()
def hit(self, timestamp):
self.queue.append(timestamp)
def getHits(self, timestamp):
while self.queue and self.queue[0] <= timestamp - 300:
self.queue.popleft()
return len(self.queue)
👉 Interview Answer
Hit Counter can be implemented with a queue.
Each hit timestamp is appended.
When querying, I remove timestamps outside the last 300 seconds.
The queue then contains only valid hits.
1️⃣8️⃣ Moving Average From Data Stream
Problem
Maintain average of the last size values.
Required Structures
Use:
queue
running sum
Code
from collections import deque
class MovingAverage:
def __init__(self, size):
self.size = size
self.queue = deque()
self.total = 0
def next(self, val):
self.queue.append(val)
self.total += val
if len(self.queue) > self.size:
self.total -= self.queue.popleft()
return self.total / len(self.queue)
👉 Interview Answer
Moving Average uses a queue and a running sum.
The queue stores the current window.
When the window exceeds the target size, I remove the oldest value and subtract it from the sum.
This gives O(1) update time.
1️⃣9️⃣ Snapshot Array
Problem
Support:
set(index, val)
snap()
get(index, snap_id)
Required Structure
For each index, store history:
list of (snap_id, value)
Use binary search for get.
Code
import bisect
class SnapshotArray:
def __init__(self, length):
self.snap_id = 0
self.history = [[(0, 0)] for _ in range(length)]
def set(self, index, val):
arr = self.history[index]
if arr[-1][0] == self.snap_id:
arr[-1] = (self.snap_id, val)
else:
arr.append((self.snap_id, val))
def snap(self):
self.snap_id += 1
return self.snap_id - 1
def get(self, index, snap_id):
arr = self.history[index]
i = bisect.bisect_right(arr, (snap_id, float("inf")))
return arr[i - 1][1]
👉 Interview Answer
SnapshotArray stores version history per index.
Instead of copying the whole array on every snapshot, I only store changes.
For get, I binary search the latest value whose snap id is less than or equal to the requested snap id.
This makes snapshots cheap and lookup logarithmic.
2️⃣0️⃣ LFU Cache
Problem
Design cache with:
get(key)
put(key, value)
When capacity is exceeded, evict the least frequently used key.
If there is a tie, evict the least recently used among them.
Required Structures
Use:
key -> node
frequency -> ordered dictionary of keys
min_frequency
In Python,
OrderedDict helps maintain recency within each frequency.
Code
from collections import defaultdict, OrderedDict
class LFUCache:
def __init__(self, capacity):
self.capacity = capacity
self.key_to_val_freq = {}
self.freq_to_keys = defaultdict(OrderedDict)
self.min_freq = 0
def _update_freq(self, key):
value, freq = self.key_to_val_freq[key]
del self.freq_to_keys[freq][key]
if not self.freq_to_keys[freq]:
del self.freq_to_keys[freq]
if self.min_freq == freq:
self.min_freq += 1
new_freq = freq + 1
self.freq_to_keys[new_freq][key] = None
self.key_to_val_freq[key] = (value, new_freq)
def get(self, key):
if key not in self.key_to_val_freq:
return -1
self._update_freq(key)
return self.key_to_val_freq[key][0]
def put(self, key, value):
if self.capacity == 0:
return
if key in self.key_to_val_freq:
self.key_to_val_freq[key] = (value, self.key_to_val_freq[key][1])
self._update_freq(key)
return
if len(self.key_to_val_freq) >= self.capacity:
evict_key, _ = self.freq_to_keys[self.min_freq].popitem(last=False)
del self.key_to_val_freq[evict_key]
self.key_to_val_freq[key] = (value, 1)
self.freq_to_keys[1][key] = None
self.min_freq = 1
👉 Interview Answer
LFU Cache needs to track both frequency and recency.
I keep a map from key to value and frequency.
I also keep a map from frequency to an ordered set of keys.
min_freqtells me which frequency bucket to evict from.Within that bucket, I evict the least recently used key.
2️⃣1️⃣ All O(1) Data Structure
Problem
Support:
inc(key)
dec(key)
getMaxKey()
getMinKey()
All in O(1).
High-Level Idea
Use buckets ordered by count.
Each bucket stores keys with the same count.
Need:
key -> bucket
doubly linked list of count buckets
Conceptual Structure
head <-> count 1 <-> count 2 <-> count 5 <-> tail
Each bucket has:
count
set of keys
prev
next
👉 Interview Answer
The All O(1) Data Structure is built from a doubly linked list of count buckets.
Each bucket stores all keys with the same count.
A HashMap maps each key to its current bucket.
Increment and decrement move a key to an adjacent count bucket.
The head side gives the minimum key, and the tail side gives the maximum key.
2️⃣2️⃣ Design Patterns by Requirement
Requirement Mapping
| Requirement | Common Structure |
|---|---|
| O(1) lookup by key | HashMap |
| O(1) random element | Array |
| O(1) remove arbitrary element | HashMap + array swap |
| O(1) reorder by recency | HashMap + doubly linked list |
| Get min/max quickly | Heap |
| Maintain median | Two heaps |
| Prefix search | Trie |
| Timestamp query | HashMap + sorted list + binary search |
| Version history | List of changes + binary search |
| Sliding window count | Queue + running state |
👉 Interview Answer
I map requirements to structures.
O(1) lookup usually means HashMap.
Random selection usually means array.
O(1) deletion from array requires index mapping and swap-with-last.
Recency ordering usually needs a doubly linked list.
Median requires two heaps.
Prefix search requires Trie.
Timestamp queries usually require sorted history and binary search.
2️⃣3️⃣ Complexity Analysis
Always State Complexity Per Operation
For design problems, do not only state total complexity.
State per method:
get: O(1)
put: O(1)
remove: O(1)
findMedian: O(1)
addNum: O(log n)
Also State Space
Usually:
Space: O(n)
where n is number of stored items.
For Trie:
Space: O(total characters)
For SnapshotArray:
Space: O(number of set calls)
👉 Interview Answer
For data structure design, I analyze complexity per operation.
Each public method should have a clear time complexity.
I also describe space complexity based on the number of stored elements, total characters, or total versions depending on the design.
2️⃣4️⃣ Common Edge Cases
Empty Structure
Examples:
pop from empty stack
get from empty cache
getRandom from empty set
Usually problem guarantees valid calls, but check if not guaranteed.
Existing Key
For cache or map:
put existing key should update value
Capacity Zero
Important for cache problems.
if capacity == 0:
return
Duplicate Values
Need decide:
set disallows duplicates
multiset allows duplicates
Timestamp Boundaries
For TimeMap:
query before first timestamp
query exactly at timestamp
query after latest timestamp
Recency Updates
For LRU:
get also updates recency
put existing key also updates recency
👉 Interview Answer
Common edge cases include empty structures, updating existing keys, zero capacity, duplicate values, timestamp boundary queries, and whether reads update recency.
In design problems, edge cases are often about keeping internal state consistent after every operation.
2️⃣5️⃣ Common Mistakes
Mistake 1: Using Only One Data Structure
Many design problems require a combination.
Example:
LRU needs both HashMap and linked list.
Mistake 2: Forgetting to Update Both Structures
If using HashMap + array, every insert/delete must update both.
Mistake 3: Not Handling Existing Keys
For cache:
put existing key should update value and recency/frequency.
Mistake 4: Wrong Eviction Rule
LRU evicts least recently used.
LFU evicts least frequently used, then least recently used among ties.
Mistake 5: Not Maintaining Invariants
Example:
MedianFinder heaps must stay balanced.
Mistake 6: Ignoring Amortized Complexity
Queue using stacks has amortized O(1), not worst-case O(1) per pop.
👉 Interview Answer
Common mistakes include trying to solve the problem with only one structure, forgetting to update all internal structures, mishandling existing keys, applying the wrong eviction rule, breaking invariants, and confusing amortized complexity with worst-case complexity.
2️⃣6️⃣ When Design Data Structures Applies
Good Fit
This pattern applies when the problem asks:
Design a class
Implement operations
Support get/put/add/remove
O(1) operations
O(log n) operations
Data stream operations
Cache behavior
Versioned array
Randomized structure
Prefix dictionary
Problem Signals
Look for words like:
design
implement
class
constructor
method
support
data structure
cache
stream
random
timestamp
snapshot
prefix
👉 Interview Answer
I recognize design data structure problems when the prompt asks me to implement a class with several operations.
The key is not just making each operation correct, but meeting the required time complexity.
I usually solve these by combining basic data structures and carefully maintaining invariants.
2️⃣7️⃣ When Design Data Structures Does Not Apply
Not Good Fit
This pattern may not be enough when:
problem asks for one-time computation
problem is mainly graph traversal
problem is pure dynamic programming
problem asks for sorting or searching only
problem has no persistent state
Better Alternatives
| Problem Type | Better Pattern |
|---|---|
| One-time array scan | Two Pointers / Sliding Window |
| Shortest path | BFS / Dijkstra |
| Dependency ordering | Topological Sort |
| Optimization over states | Dynamic Programming |
| Range query with updates | Segment Tree / Fenwick Tree |
| String matching | KMP / Rolling Hash |
| Connectivity | Union Find |
👉 Interview Answer
Data structure design is most useful when the problem has persistent state and multiple operations.
If the problem is a one-time computation, then a standard algorithmic pattern like sliding window, DP, graph traversal, or sorting may be more appropriate.
2️⃣8️⃣ Standard Templates
HashMap + Array Template
class Structure:
def __init__(self):
self.arr = []
self.index = {}
def add(self, val):
if val in self.index:
return False
self.index[val] = len(self.arr)
self.arr.append(val)
return True
def remove(self, val):
if val not in self.index:
return False
i = self.index[val]
last = self.arr[-1]
self.arr[i] = last
self.index[last] = i
self.arr.pop()
del self.index[val]
return True
Doubly Linked List Node Template
class Node:
def __init__(self, key=0, value=0):
self.key = key
self.value = value
self.prev = None
self.next = None
LRU Helper Template
def remove(node):
node.prev.next = node.next
node.next.prev = node.prev
def add_to_front(node):
first = head.next
node.prev = head
node.next = first
head.next = node
first.prev = node
Two Heaps Template
import heapq
small = [] # max heap using negative values
large = [] # min heap
def add(num):
heapq.heappush(small, -num)
if large and -small[0] > large[0]:
heapq.heappush(large, -heapq.heappop(small))
if len(small) > len(large) + 1:
heapq.heappush(large, -heapq.heappop(small))
if len(large) > len(small):
heapq.heappush(small, -heapq.heappop(large))
Timestamp Binary Search Template
import bisect
def get_value(history, timestamp):
i = bisect.bisect_right(history, (timestamp, chr(255)))
if i == 0:
return ""
return history[i - 1][1]
Trie Template
class TrieNode:
def __init__(self):
self.children = {}
self.is_word = False
🧠 Staff-Level Answer Final
👉 Interview Answer Full Version
Design data structure problems ask us to implement a class that supports a set of operations efficiently.
I approach these problems by first listing every operation and its required time complexity. Then I map each requirement to an underlying data structure.
If I need O(1) lookup by key, I usually need a HashMap. If I need O(1) random access, I usually need an array. If I need O(1) deletion from an array, I combine the array with a HashMap from value to index and use the swap-with-last trick.
If I need recency ordering, such as in LRU Cache, I combine a HashMap with a doubly linked list. The HashMap gives O(1) access to nodes, and the linked list allows O(1) remove and move-to-front.
If I need median in a stream, I use two heaps: a max heap for the smaller half and a min heap for the larger half. I keep the heaps balanced so the median is always available from the heap tops.
If I need timestamp-based lookup, I use a HashMap from key to a sorted list of timestamp-value pairs, then use binary search during query.
If I need prefix search, I use a Trie. Each Trie node represents a prefix, and the path through the Trie represents a word or prefix.
For cache problems, the eviction rule is critical. LRU evicts the least recently used item. LFU evicts the least frequently used item, and if there is a tie, it evicts the least recently used item among that frequency.
The most important part of data structure design is maintaining invariants. For example, in LRU, the linked list order must always match recency. In MedianFinder, the two heaps must remain balanced. In RandomizedSet, the array and index map must always agree.
At a senior level, I would explain: what operations are required, what complexity each operation needs, what internal data structures are used, what invariant each structure maintains, how each method updates all internal state, and which edge cases can break the design.
⭐ Final Insight
Design Data Structures 的核心不是“会用 HashMap”。
它的核心是把 operation requirements 映射到 data structure combination。
Senior-level 的重点是:
每个 method 的复杂度是多少, 为什么单个 structure 不够, HashMap 负责什么, array / heap / linked list / Trie 负责什么, 每次操作后 invariant 是否仍然成立, existing key、capacity zero、duplicate values、timestamp boundary 怎么处理, 以及 amortized O(1) 和 worst-case O(1) 的区别。
能讲清楚这些, Design Data Structures 就是一类处理 custom class、multi-operation state、cache、stream、random access 和 prefix lookup 的核心模式。
中文部分
🎯 Design Data Structures
1️⃣ 核心框架
讨论 Design Data Structures 时,我通常从:
- Data structure design problems 是什么
- 如何识别 required operations
- Time complexity targets
- HashMap-based design
- Stack and queue design
- Heap-based design
- Linked list + HashMap design
- Trie-based design
- Randomized data structures
- 高级工程师解释方式
2️⃣ 要解决什么问题?
Design data structure problems 要求我们实现一个 custom class, 并且高效支持一组 operations。
它常用于:
- LRU Cache
- LFU Cache
- Min Stack
- Max Stack
- Queue using stacks
- Stack using queues
- Design HashMap
- Design HashSet
- Insert Delete GetRandom O(1)
- Median Finder
- Time-Based Key-Value Store
- Trie
- Word Dictionary
- All O(1) Data Structure
- Hit Counter
- Moving Average
- Snapshot Array
核心思想是:
Data structure design 是把 operations
匹配到合适的底层 data structures 组合。
👉 面试回答
Design data structure problems 是实现一个 class, 让它支持特定 operations 和复杂度要求。
我会先列出所有 required operations, 然后明确每个 operation 的 time complexity target。
大多数解法会组合多个基础结构, 例如 HashMap、linked list、heap、stack、queue、Trie 或 balanced tree。
关键是让每个 operation 高效, 同时保证内部状态一致。
3️⃣ How to Approach Design Problems
Step 1: List Operations
Example:
put(key, value)
get(key)
delete(key)
Step 2: Identify Complexity Requirements
题目可能要求:
O(1)
O(log n)
O(k)
其中 k 可能是 word length 或返回元素数量。
Step 3: Choose Data Structures
问自己:
Need fast lookup? -> HashMap
Need ordering? -> linked list / heap / tree
Need min or max? -> heap / monotonic structure
Need prefix search? -> Trie
Need random access? -> array
Need fast deletion? -> HashMap + index / linked list
Step 4: Maintain Invariants
每个设计都有内部规则。
Example:
LRU cache:
head = most recent
tail = least recent
map key -> linked list node
👉 面试回答
我的设计流程是: 先列出 operations, 然后写出每个 operation 的目标复杂度, 再选择能支持这些 operations 的 data structures。
最后我会定义 invariants, 并确保每次 method call 后这些规则仍然成立。
4️⃣ Common Building Blocks
HashMap
用于:
O(1) key lookup
key to value
key to node
value to index
frequency to nodes
Array / List
用于:
index access
random selection
compact storage
append and pop
Linked List
用于:
O(1) remove node
O(1) move node
maintain order
LRU order
Heap
用于:
min value
max value
top k
median
priority ordering
Trie
用于:
prefix search
dictionary
word matching
autocomplete
👉 面试回答
大多数 design problems 都是基础结构的组合。
HashMap 提供快速 lookup。 Array 提供 index access 和 random selection。 Linked list 提供 O(1) reordering。 Heap 提供高效 min 或 max retrieval。 Trie 提供 prefix search。
难点在于选择正确组合。
5️⃣ Design HashSet
Problem
实现:
add(key)
remove(key)
contains(key)
Simple Approach
使用 buckets。
每个 bucket 存储 hash 到同一个 index 的 values。
Code
class MyHashSet:
def __init__(self):
self.size = 1000
self.buckets = [[] for _ in range(self.size)]
def _hash(self, key):
return key % self.size
def add(self, key):
index = self._hash(key)
if key not in self.buckets[index]:
self.buckets[index].append(key)
def remove(self, key):
index = self._hash(key)
if key in self.buckets[index]:
self.buckets[index].remove(key)
def contains(self, key):
index = self._hash(key)
return key in self.buckets[index]
👉 面试回答
HashSet 可以用 buckets 实现。
我用 hash function 找到 key 对应的 bucket。
Bucket 通过存多个 values 处理 collisions。
Add、remove 和 contains 平均情况下比较高效, 但 collision handling 很重要。
6️⃣ Design HashMap
Problem
实现:
put(key, value)
get(key)
remove(key)
Code
class MyHashMap:
def __init__(self):
self.size = 1000
self.buckets = [[] for _ in range(self.size)]
def _hash(self, key):
return key % self.size
def put(self, key, value):
index = self._hash(key)
bucket = self.buckets[index]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value)
return
bucket.append((key, value))
def get(self, key):
index = self._hash(key)
for k, v in self.buckets[index]:
if k == key:
return v
return -1
def remove(self, key):
index = self._hash(key)
bucket = self.buckets[index]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket.pop(i)
return
👉 面试回答
HashMap 存储 key-value pairs。
我用 hash function 找到 bucket, 然后在 bucket 中查找 key。
Collision 通过 chaining 处理, 也就是每个 bucket 可以存多个 key-value pairs。
7️⃣ Min Stack
Problem
设计一个 stack 支持:
push
pop
top
getMin
全部 O(1)。
Key Idea
使用两个 stacks:
main stack
min stack
Min stack 记录每一层的当前 minimum。
Code
class MinStack:
def __init__(self):
self.stack = []
self.min_stack = []
def push(self, val):
self.stack.append(val)
if not self.min_stack:
self.min_stack.append(val)
else:
self.min_stack.append(min(val, self.min_stack[-1]))
def pop(self):
self.stack.pop()
self.min_stack.pop()
def top(self):
return self.stack[-1]
def getMin(self):
return self.min_stack[-1]
👉 面试回答
Min Stack 使用 auxiliary stack 记录每个 stack depth 的 minimum。
每 push 一个 value, 同时 push 当前 minimum。
Pop 时两个 stacks 一起 pop。
这样 getMin 就是 O(1)。
8️⃣ Queue Using Stacks
Problem
使用 stacks 实现 queue operations:
push
pop
peek
empty
Key Idea
使用两个 stacks:
input stack
output stack
Push 到 input。
Pop 从 output。
如果 output 为空, 把 input 全部倒入 output。
Code
class MyQueue:
def __init__(self):
self.input = []
self.output = []
def push(self, x):
self.input.append(x)
def _move(self):
if not self.output:
while self.input:
self.output.append(self.input.pop())
def pop(self):
self._move()
return self.output.pop()
def peek(self):
self._move()
return self.output[-1]
def empty(self):
return not self.input and not self.output
👉 面试回答
Queue using stacks 使用两个 stacks。
新元素进入 input stack。
当需要 pop 或 peek 时, 如果 output stack 为空, 就把 input 中的元素全部移动到 output。
这样会反转顺序, 实现 FIFO 行为。
每个元素最多移动两次, 所以 operations 是 amortized O(1)。
9️⃣ Stack Using Queues
Problem
使用 queues 实现 stack operations:
push
pop
top
empty
One Queue Approach
Push 新元素后, 旋转 queue, 让 newest element 到 front。
Code
from collections import deque
class MyStack:
def __init__(self):
self.queue = deque()
def push(self, x):
self.queue.append(x)
for _ in range(len(self.queue) - 1):
self.queue.append(self.queue.popleft())
def pop(self):
return self.queue.popleft()
def top(self):
return self.queue[0]
def empty(self):
return not self.queue
👉 面试回答
Stack using one queue 的核心是在 push 后旋转 queue。
插入新值后, 把之前的元素移动到它后面。
这样 newest element 会出现在 front, pop 和 top 就表现得像 stack。
🔟 LRU Cache
Problem
设计 cache 支持:
get(key)
put(key, value)
都要求 O(1)。
当超过 capacity 时, 移除 least recently used key。
Required Structures
使用:
HashMap: key -> node
Doubly linked list: usage order
Linked list order:
head = most recently used
tail = least recently used
👉 面试回答
LRU Cache 需要 O(1) lookup, 也需要 O(1) 更新 usage order。
HashMap 可以从 key O(1) 找到 node。
Doubly linked list 可以 O(1) remove 和 insert。
每次 get 或 put, 都把访问过的 node 移到 front。
超过 capacity 时, 删除 tail 位置的 node。
1️⃣1️⃣ LRU Cache Code
Code
class Node:
def __init__(self, key=0, value=0):
self.key = key
self.value = value
self.prev = None
self.next = None
class LRUCache:
def __init__(self, capacity):
self.capacity = capacity
self.cache = {}
self.head = Node()
self.tail = Node()
self.head.next = self.tail
self.tail.prev = self.head
def _remove(self, node):
prev_node = node.prev
next_node = node.next
prev_node.next = next_node
next_node.prev = prev_node
def _add_to_front(self, node):
first = self.head.next
node.prev = self.head
node.next = first
self.head.next = node
first.prev = node
def get(self, key):
if key not in self.cache:
return -1
node = self.cache[key]
self._remove(node)
self._add_to_front(node)
return node.value
def put(self, key, value):
if key in self.cache:
node = self.cache[key]
node.value = value
self._remove(node)
self._add_to_front(node)
return
node = Node(key, value)
self.cache[key] = node
self._add_to_front(node)
if len(self.cache) > self.capacity:
lru = self.tail.prev
self._remove(lru)
del self.cache[lru.key]
👉 面试回答
在 LRU Cache 中, HashMap 把 key 映射到 linked-list node。
Linked list 保存 recency order。
每次 key 被访问, 我把它从当前位置移除, 再加入 front。
如果 cache 超过 capacity, 删除 tail 前面的 node, 也就是 least recently used node。
1️⃣2️⃣ Insert Delete GetRandom O(1)
Problem
支持:
insert(val)
remove(val)
getRandom()
全部 average O(1)。
Required Structures
使用:
array for random access
HashMap value -> index
Remove Trick
为了 O(1) 删除 value:
和最后一个 element 交换
pop 最后一个 element
更新被移动 element 的 index
Code
import random
class RandomizedSet:
def __init__(self):
self.values = []
self.index = {}
def insert(self, val):
if val in self.index:
return False
self.index[val] = len(self.values)
self.values.append(val)
return True
def remove(self, val):
if val not in self.index:
return False
remove_index = self.index[val]
last_value = self.values[-1]
self.values[remove_index] = last_value
self.index[last_value] = remove_index
self.values.pop()
del self.index[val]
return True
def getRandom(self):
return random.choice(self.values)
👉 面试回答
RandomizedSet 组合 array 和 HashMap。
Array 支持 O(1) random access。
HashMap 把每个 value 映射到它的 index。
删除时, 我把要删除的元素和最后一个元素交换, 然后 pop 最后一个, 并更新被移动元素的 index。
1️⃣3️⃣ Time-Based Key-Value Store
Problem
支持:
set(key, value, timestamp)
get(key, timestamp)
返回 timestamp 小于等于 query timestamp 的最大 timestamp 对应的 value。
Required Structures
使用:
HashMap key -> list of (timestamp, value)
因为 timestamps 是递增插入, 每个 list 是 sorted。
Get 时使用 binary search。
Code
from collections import defaultdict
import bisect
class TimeMap:
def __init__(self):
self.store = defaultdict(list)
def set(self, key, value, timestamp):
self.store[key].append((timestamp, value))
def get(self, key, timestamp):
arr = self.store[key]
index = bisect.bisect_right(arr, (timestamp, chr(255)))
if index == 0:
return ""
return arr[index - 1][1]
👉 面试回答
TimeMap 使用 HashMap, 每个 key 对应一个 sorted list of timestamp-value pairs。
因为 timestamps 按递增顺序插入, set 是 O(1)。
get 时, 我 binary search 找到小于等于 query timestamp 的最大 timestamp。
所以每个 key 的查询是 O(log n)。
1️⃣4️⃣ Median Finder
Problem
设计结构支持:
addNum(num)
findMedian()
Required Structures
使用两个 heaps:
max heap for smaller half
min heap for larger half
Python 中, max heap 用 negative values 模拟。
Invariant
保持:
len(small) == len(large)
or
len(small) == len(large) + 1
并且:
all values in small <= all values in large
Code
import heapq
class MedianFinder:
def __init__(self):
self.small = []
self.large = []
def addNum(self, num):
heapq.heappush(self.small, -num)
if self.large and -self.small[0] > self.large[0]:
value = -heapq.heappop(self.small)
heapq.heappush(self.large, value)
if len(self.small) > len(self.large) + 1:
value = -heapq.heappop(self.small)
heapq.heappush(self.large, value)
if len(self.large) > len(self.small):
value = heapq.heappop(self.large)
heapq.heappush(self.small, -value)
def findMedian(self):
if len(self.small) > len(self.large):
return -self.small[0]
return (-self.small[0] + self.large[0]) / 2
👉 面试回答
Median Finder 使用 two heaps。
Max heap 存 smaller half, min heap 存 larger half。
我会 rebalance, 保证两个 heaps 的 size 最多差 1。
如果总数是 odd, median 是较大一边的 heap top。
如果是 even, median 是两个 heap tops 的平均值。
1️⃣5️⃣ Design Trie
Problem
支持:
insert(word)
search(word)
startsWith(prefix)
Required Structure
使用 Trie nodes。
每个 node 存:
children map
is_word flag
Code
class TrieNode:
def __init__(self):
self.children = {}
self.is_word = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word):
node = self.root
for ch in word:
if ch not in node.children:
node.children[ch] = TrieNode()
node = node.children[ch]
node.is_word = True
def search(self, word):
node = self._find_node(word)
return node is not None and node.is_word
def startsWith(self, prefix):
return self._find_node(prefix) is not None
def _find_node(self, s):
node = self.root
for ch in s:
if ch not in node.children:
return None
node = node.children[ch]
return node
👉 面试回答
Trie 适合 prefix-based string operations。
每个 node 代表一个 prefix。
Insert 会沿着 characters 往下走, 并创建不存在的 nodes。
Search 只有在最后 node 标记为 complete word 时才返回 true。
startsWith 只需要 prefix path 存在。
1️⃣6️⃣ Word Dictionary
Problem
支持:
addWord(word)
search(word)
Search 可以包含 wildcard:
.
其中 . 可以匹配任意 character。
Required Structure
使用 Trie + DFS。
当 character 是 . 时:
try all children
Code
class WordDictionary:
def __init__(self):
self.root = TrieNode()
def addWord(self, word):
node = self.root
for ch in word:
if ch not in node.children:
node.children[ch] = TrieNode()
node = node.children[ch]
node.is_word = True
def search(self, word):
def dfs(index, node):
if index == len(word):
return node.is_word
ch = word[index]
if ch == ".":
for child in node.children.values():
if dfs(index + 1, child):
return True
return False
if ch not in node.children:
return False
return dfs(index + 1, node.children[ch])
return dfs(0, self.root)
👉 面试回答
WordDictionary 是 Trie 加 wildcard DFS。
普通 character 只沿着一个 child path 走。
当 character 是
.时, 我会尝试当前 node 的所有 children。只有在最后位置落在 complete word 上, 才返回 true。
1️⃣7️⃣ Hit Counter
Problem
统计过去 5 分钟的 hits。
支持:
hit(timestamp)
getHits(timestamp)
假设 5 分钟是:
300 seconds
Queue Approach
用 queue 存 timestamps。
移除早于:
timestamp - 300
的 timestamps。
Code
from collections import deque
class HitCounter:
def __init__(self):
self.queue = deque()
def hit(self, timestamp):
self.queue.append(timestamp)
def getHits(self, timestamp):
while self.queue and self.queue[0] <= timestamp - 300:
self.queue.popleft()
return len(self.queue)
👉 面试回答
Hit Counter 可以用 queue 实现。
每个 hit timestamp append 到 queue。
查询时, 移除不在最近 300 秒窗口内的 timestamps。
剩下的 queue size 就是有效 hits 数量。
1️⃣8️⃣ Moving Average From Data Stream
Problem
维护最近 size 个 values 的 average。
Required Structures
使用:
queue
running sum
Code
from collections import deque
class MovingAverage:
def __init__(self, size):
self.size = size
self.queue = deque()
self.total = 0
def next(self, val):
self.queue.append(val)
self.total += val
if len(self.queue) > self.size:
self.total -= self.queue.popleft()
return self.total / len(self.queue)
👉 面试回答
Moving Average 使用 queue 和 running sum。
Queue 存当前 window。
当 window 超过目标 size 时, 移除最旧的 value, 并从 sum 中减掉。
这样每次 update 是 O(1)。
1️⃣9️⃣ Snapshot Array
Problem
支持:
set(index, val)
snap()
get(index, snap_id)
Required Structure
对每个 index, 保存 history:
list of (snap_id, value)
Get 时使用 binary search。
Code
import bisect
class SnapshotArray:
def __init__(self, length):
self.snap_id = 0
self.history = [[(0, 0)] for _ in range(length)]
def set(self, index, val):
arr = self.history[index]
if arr[-1][0] == self.snap_id:
arr[-1] = (self.snap_id, val)
else:
arr.append((self.snap_id, val))
def snap(self):
self.snap_id += 1
return self.snap_id - 1
def get(self, index, snap_id):
arr = self.history[index]
i = bisect.bisect_right(arr, (snap_id, float("inf")))
return arr[i - 1][1]
👉 面试回答
SnapshotArray 对每个 index 保存 version history。
我不会每次 snapshot 都复制整个 array, 而是只记录 changed values。
get 时, binary search 找到 snap_id 小于等于 requested snap_id 的最新 value。
这样 snapshot 很便宜, lookup 是 logarithmic。
2️⃣0️⃣ LFU Cache
Problem
设计 cache 支持:
get(key)
put(key, value)
当超过 capacity 时, evict least frequently used key。
如果 frequency 相同, evict 其中 least recently used 的 key。
Required Structures
使用:
key -> node
frequency -> ordered dictionary of keys
min_frequency
Python 中,
OrderedDict 可以维护同 frequency 内的 recency。
Code
from collections import defaultdict, OrderedDict
class LFUCache:
def __init__(self, capacity):
self.capacity = capacity
self.key_to_val_freq = {}
self.freq_to_keys = defaultdict(OrderedDict)
self.min_freq = 0
def _update_freq(self, key):
value, freq = self.key_to_val_freq[key]
del self.freq_to_keys[freq][key]
if not self.freq_to_keys[freq]:
del self.freq_to_keys[freq]
if self.min_freq == freq:
self.min_freq += 1
new_freq = freq + 1
self.freq_to_keys[new_freq][key] = None
self.key_to_val_freq[key] = (value, new_freq)
def get(self, key):
if key not in self.key_to_val_freq:
return -1
self._update_freq(key)
return self.key_to_val_freq[key][0]
def put(self, key, value):
if self.capacity == 0:
return
if key in self.key_to_val_freq:
self.key_to_val_freq[key] = (value, self.key_to_val_freq[key][1])
self._update_freq(key)
return
if len(self.key_to_val_freq) >= self.capacity:
evict_key, _ = self.freq_to_keys[self.min_freq].popitem(last=False)
del self.key_to_val_freq[evict_key]
self.key_to_val_freq[key] = (value, 1)
self.freq_to_keys[1][key] = None
self.min_freq = 1
👉 面试回答
LFU Cache 需要同时追踪 frequency 和 recency。
我用一个 map 保存 key 到 value 和 frequency。
再用一个 map 保存 frequency 到 ordered set of keys。
min_freq告诉我应该从哪个 frequency bucket evict。在这个 bucket 内, evict least recently used key。
2️⃣1️⃣ All O(1) Data Structure
Problem
支持:
inc(key)
dec(key)
getMaxKey()
getMinKey()
全部 O(1)。
High-Level Idea
使用按 count 排序的 buckets。
每个 bucket 存同样 count 的 keys。
需要:
key -> bucket
doubly linked list of count buckets
Conceptual Structure
head <-> count 1 <-> count 2 <-> count 5 <-> tail
每个 bucket 有:
count
set of keys
prev
next
👉 面试回答
All O(1) Data Structure 使用 count buckets 的 doubly linked list。
每个 bucket 保存相同 count 的所有 keys。
HashMap 把 key 映射到当前 bucket。
Increment 和 decrement 会把 key 移动到相邻 count bucket。
Head 侧可以得到 minimum key, tail 侧可以得到 maximum key。
2️⃣2️⃣ Design Patterns by Requirement
Requirement Mapping
| Requirement | Common Structure |
|---|---|
| O(1) lookup by key | HashMap |
| O(1) random element | Array |
| O(1) remove arbitrary element | HashMap + array swap |
| O(1) reorder by recency | HashMap + doubly linked list |
| Get min/max quickly | Heap |
| Maintain median | Two heaps |
| Prefix search | Trie |
| Timestamp query | HashMap + sorted list + binary search |
| Version history | List of changes + binary search |
| Sliding window count | Queue + running state |
👉 面试回答
我会把 requirements 映射到 structures。
O(1) lookup 通常意味着 HashMap。
Random selection 通常意味着 array。
Array 中 O(1) deletion 需要 index mapping 和 swap-with-last。
Recency ordering 通常需要 doubly linked list。
Median 需要 two heaps。
Prefix search 需要 Trie。
Timestamp query 通常需要 sorted history 和 binary search。
2️⃣3️⃣ Complexity Analysis
Always State Complexity Per Operation
对 design problems, 不要只说整体复杂度。
要按 method 说:
get: O(1)
put: O(1)
remove: O(1)
findMedian: O(1)
addNum: O(log n)
Also State Space
通常:
Space: O(n)
其中 n 是 stored items 数量。
对 Trie:
Space: O(total characters)
对 SnapshotArray:
Space: O(number of set calls)
👉 面试回答
对 data structure design, 我会按 operation 分析复杂度。
每个 public method 都应该有明确的 time complexity。
Space complexity 要根据 stored elements、 total characters 或 total versions 来分析。
2️⃣4️⃣ 常见 Edge Cases
Empty Structure
Examples:
pop from empty stack
get from empty cache
getRandom from empty set
通常题目会保证 valid calls, 但如果不保证就要处理。
Existing Key
对 cache 或 map:
put existing key should update value
Capacity Zero
Cache problems 中很重要。
if capacity == 0:
return
Duplicate Values
要明确:
set disallows duplicates
multiset allows duplicates
Timestamp Boundaries
对 TimeMap:
query before first timestamp
query exactly at timestamp
query after latest timestamp
Recency Updates
对 LRU:
get also updates recency
put existing key also updates recency
👉 面试回答
常见 edge cases 包括 empty structures、 更新 existing keys、 zero capacity、 duplicate values、 timestamp boundary queries, 以及 reads 是否更新 recency。
Design problems 中, 很多 edge cases 本质上都是在考每次操作后内部状态是否仍然一致。
2️⃣5️⃣ 常见错误
Mistake 1: 只用一个 Data Structure
很多 design problems 需要组合结构。
Example:
LRU needs both HashMap and linked list.
Mistake 2: 忘记同时更新多个 Structures
如果使用 HashMap + array, 每次 insert/delete 都要更新两个结构。
Mistake 3: 没处理 Existing Keys
对 cache:
put existing key should update value and recency/frequency.
Mistake 4: Eviction Rule 错误
LRU evicts least recently used。
LFU evicts least frequently used, tie 时 evict least recently used。
Mistake 5: 没维护 Invariants
Example:
MedianFinder heaps must stay balanced.
Mistake 6: 忽略 Amortized Complexity
Queue using stacks 是 amortized O(1), 不是每次 pop 都 worst-case O(1)。
👉 面试回答
常见错误包括: 试图只用一个 structure 解决所有问题、 忘记更新所有内部 structures、 mishandle existing keys、 使用错误 eviction rule、 破坏 invariants, 以及混淆 amortized complexity 和 worst-case complexity。
2️⃣6️⃣ 什么时候 Design Data Structures 适用?
Good Fit
当题目要求:
Design a class
Implement operations
Support get/put/add/remove
O(1) operations
O(log n) operations
Data stream operations
Cache behavior
Versioned array
Randomized structure
Prefix dictionary
Problem Signals
看到这些关键词想到 data structure design:
design
implement
class
constructor
method
support
data structure
cache
stream
random
timestamp
snapshot
prefix
👉 面试回答
当 prompt 要求实现一个包含多个 operations 的 class 时, 我会识别为 design data structure problem。
关键不是只让 operation correct, 还要满足要求的 time complexity。
我通常通过组合基础 data structures, 并维护清晰 invariants 来解决。
2️⃣7️⃣ 什么时候 Design Data Structures 不适用?
Not Good Fit
这个 pattern 不一定适合:
problem asks for one-time computation
problem is mainly graph traversal
problem is pure dynamic programming
problem asks for sorting or searching only
problem has no persistent state
Better Alternatives
| Problem Type | Better Pattern |
|---|---|
| One-time array scan | Two Pointers / Sliding Window |
| Shortest path | BFS / Dijkstra |
| Dependency ordering | Topological Sort |
| Optimization over states | Dynamic Programming |
| Range query with updates | Segment Tree / Fenwick Tree |
| String matching | KMP / Rolling Hash |
| Connectivity | Union Find |
👉 面试回答
Data structure design 最适合有 persistent state 和 multiple operations 的问题。
如果题目只是一次性计算, 那么 sliding window、DP、graph traversal、 sorting 之类的标准 algorithmic pattern 可能更合适。
2️⃣8️⃣ 标准模板
HashMap + Array Template
class Structure:
def __init__(self):
self.arr = []
self.index = {}
def add(self, val):
if val in self.index:
return False
self.index[val] = len(self.arr)
self.arr.append(val)
return True
def remove(self, val):
if val not in self.index:
return False
i = self.index[val]
last = self.arr[-1]
self.arr[i] = last
self.index[last] = i
self.arr.pop()
del self.index[val]
return True
Doubly Linked List Node Template
class Node:
def __init__(self, key=0, value=0):
self.key = key
self.value = value
self.prev = None
self.next = None
LRU Helper Template
def remove(node):
node.prev.next = node.next
node.next.prev = node.prev
def add_to_front(node):
first = head.next
node.prev = head
node.next = first
head.next = node
first.prev = node
Two Heaps Template
import heapq
small = [] # max heap using negative values
large = [] # min heap
def add(num):
heapq.heappush(small, -num)
if large and -small[0] > large[0]:
heapq.heappush(large, -heapq.heappop(small))
if len(small) > len(large) + 1:
heapq.heappush(large, -heapq.heappop(small))
if len(large) > len(small):
heapq.heappush(small, -heapq.heappop(large))
Timestamp Binary Search Template
import bisect
def get_value(history, timestamp):
i = bisect.bisect_right(history, (timestamp, chr(255)))
if i == 0:
return ""
return history[i - 1][1]
Trie Template
class TrieNode:
def __init__(self):
self.children = {}
self.is_word = False
🧠 Staff-Level Answer Final
👉 面试回答完整版本
Design data structure problems 要求我们实现一个 class, 并让它高效支持一组 operations。
我的第一步是列出每个 operation, 并明确它的 required time complexity。 然后我把每个 requirement 映射到底层 data structure。
如果需要 O(1) key lookup, 通常需要 HashMap。 如果需要 O(1) random access, 通常需要 array。 如果需要在 array 中 O(1) 删除任意元素, 就需要 array 加 HashMap from value to index, 并使用 swap-with-last trick。
如果需要 recency ordering, 例如 LRU Cache, 我会组合 HashMap 和 doubly linked list。 HashMap 让 key 到 node 是 O(1), linked list 让 remove 和 move-to-front 是 O(1)。
如果需要维护 data stream median, 我会使用 two heaps: max heap 存 smaller half, min heap 存 larger half。 保持两个 heaps 平衡后, median 总能从 heap tops 得到。
如果需要 timestamp-based lookup, 我会用 HashMap, 把 key 映射到 sorted list of timestamp-value pairs, query 时使用 binary search。
如果需要 prefix search, 我会使用 Trie。 每个 Trie node 代表一个 prefix, Trie 中的一条 path 代表一个 word 或 prefix。
对 cache problems, eviction rule 非常关键。 LRU 删除 least recently used item。 LFU 删除 least frequently used item, 如果 frequency 相同, 删除该 frequency 中 least recently used item。
Data structure design 最重要的是维护 invariants。 例如 LRU 中, linked list order 必须永远等于 recency order。 MedianFinder 中, two heaps 必须保持平衡。 RandomizedSet 中, array 和 index map 必须永远一致。
高级工程师解释 design data structures 时, 我会讲清楚: required operations 是什么, 每个 operation 的 complexity 是什么, 内部使用哪些 data structures, 每个 structure 维护什么 invariant, 每个 method 如何更新所有内部状态, 以及哪些 edge cases 可能破坏设计。
⭐ Final Insight
Design Data Structures 的核心不是“会用 HashMap”。
它的核心是把 operation requirements 映射到 data structure combination。
Senior-level 的重点是:
每个 method 的复杂度是多少, 为什么单个 structure 不够, HashMap 负责什么, array / heap / linked list / Trie 负责什么, 每次操作后 invariant 是否仍然成立, existing key、capacity zero、duplicate values、timestamp boundary 怎么处理, 以及 amortized O(1) 和 worst-case O(1) 的区别。
能讲清楚这些, Design Data Structures 就是一类处理 custom class、multi-operation state、cache、stream、random access 和 prefix lookup 的核心模式。
Implement