LeetCode Patterns - 30 Design Data Structures

Post by ailswan June. 02, 2026

中文 ↓

🎯 Design Data Structures

1️⃣ Core Framework

When discussing Design Data Structures, I frame it as:

  1. What data structure design problems are
  2. How to identify required operations
  3. Time complexity targets
  4. HashMap-based design
  5. Stack and queue design
  6. Heap-based design
  7. Linked list + HashMap design
  8. Trie-based design
  9. Randomized data structures
  10. Senior-level explanation

2️⃣ What Problem Are We Solving?

Design data structure problems ask us to implement a custom class that supports a set of operations efficiently.

They are commonly used for:

The core idea is:

Data structure design is about matching operations
to the right combination of underlying structures.

👉 Interview Answer

Design data structure problems are about implementing a class with specific operations and complexity requirements.

I first list the required operations, then identify the time complexity target for each operation.

Most solutions combine multiple basic data structures, such as HashMap, linked list, heap, stack, queue, Trie, or balanced tree.

The key is to make each operation efficient while keeping internal state consistent.


3️⃣ How to Approach Design Problems


Step 1: List Operations

Example:

put(key, value)
get(key)
delete(key)

Step 2: Identify Complexity Requirements

The problem may ask for:

O(1)
O(log n)
O(k)

where k might be word length or number of returned items.


Step 3: Choose Data Structures

Ask:

Need fast lookup?        -> HashMap
Need ordering?           -> linked list / heap / tree
Need min or max?         -> heap / monotonic structure
Need prefix search?      -> Trie
Need random access?      -> array
Need fast deletion?      -> HashMap + index / linked list

Step 4: Maintain Invariants

Every design has internal rules.

Example:

LRU cache:
head = most recent
tail = least recent
map key -> linked list node

👉 Interview Answer

My design process is: first list the operations, then write the target complexity for each operation, then choose data structures that support those operations.

After that, I define invariants that must always remain true after every method call.


4️⃣ Common Building Blocks


HashMap

Use for:

O(1) key lookup
key to value
key to node
value to index
frequency to nodes

Array / List

Use for:

index access
random selection
compact storage
append and pop

Linked List

Use for:

O(1) remove node
O(1) move node
maintain order
LRU order

Heap

Use for:

min value
max value
top k
median
priority ordering

Trie

Use for:

prefix search
dictionary
word matching
autocomplete

👉 Interview Answer

Most design problems are solved by combining basic structures.

HashMap gives fast lookup. Array gives index access and random selection. Linked list gives O(1) reordering. Heap gives efficient min or max retrieval. Trie gives prefix search.

The trick is choosing the right combination.


5️⃣ Design HashSet


Problem

Implement:

add(key)
remove(key)
contains(key)

Simple Approach

Use buckets.

Each bucket stores values that hash to the same index.


Code

class MyHashSet:
    def __init__(self):
        self.size = 1000
        self.buckets = [[] for _ in range(self.size)]

    def _hash(self, key):
        return key % self.size

    def add(self, key):
        index = self._hash(key)

        if key not in self.buckets[index]:
            self.buckets[index].append(key)

    def remove(self, key):
        index = self._hash(key)

        if key in self.buckets[index]:
            self.buckets[index].remove(key)

    def contains(self, key):
        index = self._hash(key)

        return key in self.buckets[index]

👉 Interview Answer

A HashSet can be implemented with buckets.

I hash the key to find a bucket.

The bucket handles collisions by storing multiple values.

Add, remove, and contains are efficient on average, but collision handling matters.


6️⃣ Design HashMap


Problem

Implement:

put(key, value)
get(key)
remove(key)

Code

class MyHashMap:
    def __init__(self):
        self.size = 1000
        self.buckets = [[] for _ in range(self.size)]

    def _hash(self, key):
        return key % self.size

    def put(self, key, value):
        index = self._hash(key)
        bucket = self.buckets[index]

        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return

        bucket.append((key, value))

    def get(self, key):
        index = self._hash(key)

        for k, v in self.buckets[index]:
            if k == key:
                return v

        return -1

    def remove(self, key):
        index = self._hash(key)
        bucket = self.buckets[index]

        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                return

👉 Interview Answer

A HashMap stores key-value pairs.

I use a hash function to locate a bucket, then search within that bucket for the key.

Collisions are handled by chaining, where each bucket stores multiple key-value pairs.


7️⃣ Min Stack


Problem

Design a stack that supports:

push
pop
top
getMin

All in O(1).


Key Idea

Use two stacks:

main stack
min stack

The min stack keeps track of the minimum value at each level.


Code

class MinStack:
    def __init__(self):
        self.stack = []
        self.min_stack = []

    def push(self, val):
        self.stack.append(val)

        if not self.min_stack:
            self.min_stack.append(val)
        else:
            self.min_stack.append(min(val, self.min_stack[-1]))

    def pop(self):
        self.stack.pop()
        self.min_stack.pop()

    def top(self):
        return self.stack[-1]

    def getMin(self):
        return self.min_stack[-1]

👉 Interview Answer

Min Stack uses an auxiliary stack to track the minimum value at each stack depth.

Every time I push a value, I also push the current minimum.

When I pop, both stacks pop together.

This makes getMin O(1).


8️⃣ Queue Using Stacks


Problem

Implement queue operations using stacks:

push
pop
peek
empty

Key Idea

Use two stacks:

input stack
output stack

Push to input.

Pop from output.

If output is empty, move all input elements to output.


Code

class MyQueue:
    def __init__(self):
        self.input = []
        self.output = []

    def push(self, x):
        self.input.append(x)

    def _move(self):
        if not self.output:
            while self.input:
                self.output.append(self.input.pop())

    def pop(self):
        self._move()
        return self.output.pop()

    def peek(self):
        self._move()
        return self.output[-1]

    def empty(self):
        return not self.input and not self.output

👉 Interview Answer

Queue using stacks uses two stacks.

New elements go into the input stack.

When we need to pop or peek, we move elements to the output stack only if output is empty.

This reverses order and gives FIFO behavior.

Each element moves at most twice, so operations are amortized O(1).


9️⃣ Stack Using Queues


Problem

Implement stack operations using queues:

push
pop
top
empty

One Queue Approach

When pushing a new element, rotate the queue so the newest element becomes front.


Code

from collections import deque

class MyStack:
    def __init__(self):
        self.queue = deque()

    def push(self, x):
        self.queue.append(x)

        for _ in range(len(self.queue) - 1):
            self.queue.append(self.queue.popleft())

    def pop(self):
        return self.queue.popleft()

    def top(self):
        return self.queue[0]

    def empty(self):
        return not self.queue

👉 Interview Answer

Stack using one queue works by rotating the queue after each push.

After inserting a new value, I move all previous elements behind it.

This makes the newest element appear at the front, so pop and top behave like a stack.


🔟 LRU Cache


Problem

Design a cache with:

get(key)
put(key, value)

Both in O(1).

When capacity is exceeded, remove the least recently used key.


Required Structures

Use:

HashMap: key -> node
Doubly linked list: usage order

Linked list order:

head = most recently used
tail = least recently used

👉 Interview Answer

LRU Cache requires O(1) lookup and O(1) update of usage order.

A HashMap gives O(1) access from key to node.

A doubly linked list gives O(1) removal and insertion.

On every get or put, I move the accessed node to the front.

When capacity is exceeded, I remove the node at the tail.


1️⃣1️⃣ LRU Cache Code


Code

class Node:
    def __init__(self, key=0, value=0):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}

        self.head = Node()
        self.tail = Node()

        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        prev_node = node.prev
        next_node = node.next

        prev_node.next = next_node
        next_node.prev = prev_node

    def _add_to_front(self, node):
        first = self.head.next

        node.prev = self.head
        node.next = first

        self.head.next = node
        first.prev = node

    def get(self, key):
        if key not in self.cache:
            return -1

        node = self.cache[key]
        self._remove(node)
        self._add_to_front(node)

        return node.value

    def put(self, key, value):
        if key in self.cache:
            node = self.cache[key]
            node.value = value

            self._remove(node)
            self._add_to_front(node)
            return

        node = Node(key, value)
        self.cache[key] = node
        self._add_to_front(node)

        if len(self.cache) > self.capacity:
            lru = self.tail.prev
            self._remove(lru)
            del self.cache[lru.key]

👉 Interview Answer

In LRU Cache, the HashMap maps keys to linked-list nodes.

The linked list stores recency order.

Whenever a key is accessed, I remove its node from its current position and add it to the front.

If the cache exceeds capacity, I remove the node before the tail, which is the least recently used node.


1️⃣2️⃣ Insert Delete GetRandom O(1)


Problem

Support:

insert(val)
remove(val)
getRandom()

All average O(1).


Required Structures

Use:

array for random access
HashMap value -> index

Remove Trick

To remove a value in O(1):

swap it with the last element
pop last element
update moved element's index

Code

import random

class RandomizedSet:
    def __init__(self):
        self.values = []
        self.index = {}

    def insert(self, val):
        if val in self.index:
            return False

        self.index[val] = len(self.values)
        self.values.append(val)

        return True

    def remove(self, val):
        if val not in self.index:
            return False

        remove_index = self.index[val]
        last_value = self.values[-1]

        self.values[remove_index] = last_value
        self.index[last_value] = remove_index

        self.values.pop()
        del self.index[val]

        return True

    def getRandom(self):
        return random.choice(self.values)

👉 Interview Answer

RandomizedSet combines an array and a HashMap.

The array allows O(1) random access.

The HashMap maps each value to its index.

To delete in O(1), I swap the element with the last element, pop the last element, and update the moved element’s index.


1️⃣3️⃣ Time-Based Key-Value Store


Problem

Support:

set(key, value, timestamp)
get(key, timestamp)

Return the value with the largest timestamp less than or equal to the given timestamp.


Required Structures

Use:

HashMap key -> list of (timestamp, value)

Because timestamps are increasing, each list is sorted.

Use binary search during get.


Code

from collections import defaultdict
import bisect

class TimeMap:
    def __init__(self):
        self.store = defaultdict(list)

    def set(self, key, value, timestamp):
        self.store[key].append((timestamp, value))

    def get(self, key, timestamp):
        arr = self.store[key]

        index = bisect.bisect_right(arr, (timestamp, chr(255)))

        if index == 0:
            return ""

        return arr[index - 1][1]

👉 Interview Answer

TimeMap uses a HashMap from key to a sorted list of timestamp-value pairs.

Since timestamps are inserted in increasing order, set is O(1).

For get, I binary search for the largest timestamp less than or equal to the query timestamp.

That gives O(log n) lookup per key.


1️⃣4️⃣ Median Finder


Problem

Design a data structure that supports:

addNum(num)
findMedian()

Required Structures

Use two heaps:

max heap for smaller half
min heap for larger half

In Python, max heap is simulated using negative values.


Invariant

Maintain:

len(small) == len(large)
or
len(small) == len(large) + 1

And:

all values in small <= all values in large

Code

import heapq

class MedianFinder:
    def __init__(self):
        self.small = []
        self.large = []

    def addNum(self, num):
        heapq.heappush(self.small, -num)

        if self.large and -self.small[0] > self.large[0]:
            value = -heapq.heappop(self.small)
            heapq.heappush(self.large, value)

        if len(self.small) > len(self.large) + 1:
            value = -heapq.heappop(self.small)
            heapq.heappush(self.large, value)

        if len(self.large) > len(self.small):
            value = heapq.heappop(self.large)
            heapq.heappush(self.small, -value)

    def findMedian(self):
        if len(self.small) > len(self.large):
            return -self.small[0]

        return (-self.small[0] + self.large[0]) / 2

👉 Interview Answer

Median Finder uses two heaps.

The max heap stores the smaller half, and the min heap stores the larger half.

I rebalance so their sizes differ by at most one.

If the total count is odd, the median is the top of the larger side.

If even, it is the average of both heap tops.


1️⃣5️⃣ Design Trie


Problem

Support:

insert(word)
search(word)
startsWith(prefix)

Required Structure

Use Trie nodes.

Each node stores:

children map
is_word flag

Code

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root

        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()

            node = node.children[ch]

        node.is_word = True

    def search(self, word):
        node = self._find_node(word)

        return node is not None and node.is_word

    def startsWith(self, prefix):
        return self._find_node(prefix) is not None

    def _find_node(self, s):
        node = self.root

        for ch in s:
            if ch not in node.children:
                return None

            node = node.children[ch]

        return node

👉 Interview Answer

Trie is designed for prefix-based string operations.

Each node represents a prefix.

Insert walks through characters and creates missing nodes.

Search returns true only if the final node marks a complete word.

startsWith only needs the prefix path to exist.


1️⃣6️⃣ Word Dictionary


Problem

Support:

addWord(word)
search(word)

Search may contain wildcard:

.

where . matches any character.


Required Structure

Use Trie + DFS.

When character is .:

try all children

Code

class WordDictionary:
    def __init__(self):
        self.root = TrieNode()

    def addWord(self, word):
        node = self.root

        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()

            node = node.children[ch]

        node.is_word = True

    def search(self, word):
        def dfs(index, node):
            if index == len(word):
                return node.is_word

            ch = word[index]

            if ch == ".":
                for child in node.children.values():
                    if dfs(index + 1, child):
                        return True

                return False

            if ch not in node.children:
                return False

            return dfs(index + 1, node.children[ch])

        return dfs(0, self.root)

👉 Interview Answer

WordDictionary extends Trie with wildcard search.

Normal characters follow one child path.

When the character is ., I try all possible children using DFS.

The final result is true only if a full word ends at the final position.


1️⃣7️⃣ Hit Counter


Problem

Count hits in the past 5 minutes.

Support:

hit(timestamp)
getHits(timestamp)

Assume 5 minutes means:

300 seconds

Queue Approach

Store timestamps in a queue.

Remove timestamps older than:

timestamp - 300

Code

from collections import deque

class HitCounter:
    def __init__(self):
        self.queue = deque()

    def hit(self, timestamp):
        self.queue.append(timestamp)

    def getHits(self, timestamp):
        while self.queue and self.queue[0] <= timestamp - 300:
            self.queue.popleft()

        return len(self.queue)

👉 Interview Answer

Hit Counter can be implemented with a queue.

Each hit timestamp is appended.

When querying, I remove timestamps outside the last 300 seconds.

The queue then contains only valid hits.


1️⃣8️⃣ Moving Average From Data Stream


Problem

Maintain average of the last size values.


Required Structures

Use:

queue
running sum

Code

from collections import deque

class MovingAverage:
    def __init__(self, size):
        self.size = size
        self.queue = deque()
        self.total = 0

    def next(self, val):
        self.queue.append(val)
        self.total += val

        if len(self.queue) > self.size:
            self.total -= self.queue.popleft()

        return self.total / len(self.queue)

👉 Interview Answer

Moving Average uses a queue and a running sum.

The queue stores the current window.

When the window exceeds the target size, I remove the oldest value and subtract it from the sum.

This gives O(1) update time.


1️⃣9️⃣ Snapshot Array


Problem

Support:

set(index, val)
snap()
get(index, snap_id)

Required Structure

For each index, store history:

list of (snap_id, value)

Use binary search for get.


Code

import bisect

class SnapshotArray:
    def __init__(self, length):
        self.snap_id = 0
        self.history = [[(0, 0)] for _ in range(length)]

    def set(self, index, val):
        arr = self.history[index]

        if arr[-1][0] == self.snap_id:
            arr[-1] = (self.snap_id, val)
        else:
            arr.append((self.snap_id, val))

    def snap(self):
        self.snap_id += 1
        return self.snap_id - 1

    def get(self, index, snap_id):
        arr = self.history[index]

        i = bisect.bisect_right(arr, (snap_id, float("inf")))

        return arr[i - 1][1]

👉 Interview Answer

SnapshotArray stores version history per index.

Instead of copying the whole array on every snapshot, I only store changes.

For get, I binary search the latest value whose snap id is less than or equal to the requested snap id.

This makes snapshots cheap and lookup logarithmic.


2️⃣0️⃣ LFU Cache


Problem

Design cache with:

get(key)
put(key, value)

When capacity is exceeded, evict the least frequently used key.

If there is a tie, evict the least recently used among them.


Required Structures

Use:

key -> node
frequency -> ordered dictionary of keys
min_frequency

In Python, OrderedDict helps maintain recency within each frequency.


Code

from collections import defaultdict, OrderedDict

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.key_to_val_freq = {}
        self.freq_to_keys = defaultdict(OrderedDict)
        self.min_freq = 0

    def _update_freq(self, key):
        value, freq = self.key_to_val_freq[key]

        del self.freq_to_keys[freq][key]

        if not self.freq_to_keys[freq]:
            del self.freq_to_keys[freq]

            if self.min_freq == freq:
                self.min_freq += 1

        new_freq = freq + 1
        self.freq_to_keys[new_freq][key] = None
        self.key_to_val_freq[key] = (value, new_freq)

    def get(self, key):
        if key not in self.key_to_val_freq:
            return -1

        self._update_freq(key)
        return self.key_to_val_freq[key][0]

    def put(self, key, value):
        if self.capacity == 0:
            return

        if key in self.key_to_val_freq:
            self.key_to_val_freq[key] = (value, self.key_to_val_freq[key][1])
            self._update_freq(key)
            return

        if len(self.key_to_val_freq) >= self.capacity:
            evict_key, _ = self.freq_to_keys[self.min_freq].popitem(last=False)
            del self.key_to_val_freq[evict_key]

        self.key_to_val_freq[key] = (value, 1)
        self.freq_to_keys[1][key] = None
        self.min_freq = 1

👉 Interview Answer

LFU Cache needs to track both frequency and recency.

I keep a map from key to value and frequency.

I also keep a map from frequency to an ordered set of keys.

min_freq tells me which frequency bucket to evict from.

Within that bucket, I evict the least recently used key.


2️⃣1️⃣ All O(1) Data Structure


Problem

Support:

inc(key)
dec(key)
getMaxKey()
getMinKey()

All in O(1).


High-Level Idea

Use buckets ordered by count.

Each bucket stores keys with the same count.

Need:

key -> bucket
doubly linked list of count buckets

Conceptual Structure

head <-> count 1 <-> count 2 <-> count 5 <-> tail

Each bucket has:

count
set of keys
prev
next

👉 Interview Answer

The All O(1) Data Structure is built from a doubly linked list of count buckets.

Each bucket stores all keys with the same count.

A HashMap maps each key to its current bucket.

Increment and decrement move a key to an adjacent count bucket.

The head side gives the minimum key, and the tail side gives the maximum key.


2️⃣2️⃣ Design Patterns by Requirement


Requirement Mapping

Requirement Common Structure
O(1) lookup by key HashMap
O(1) random element Array
O(1) remove arbitrary element HashMap + array swap
O(1) reorder by recency HashMap + doubly linked list
Get min/max quickly Heap
Maintain median Two heaps
Prefix search Trie
Timestamp query HashMap + sorted list + binary search
Version history List of changes + binary search
Sliding window count Queue + running state

👉 Interview Answer

I map requirements to structures.

O(1) lookup usually means HashMap.

Random selection usually means array.

O(1) deletion from array requires index mapping and swap-with-last.

Recency ordering usually needs a doubly linked list.

Median requires two heaps.

Prefix search requires Trie.

Timestamp queries usually require sorted history and binary search.


2️⃣3️⃣ Complexity Analysis


Always State Complexity Per Operation

For design problems, do not only state total complexity.

State per method:

get: O(1)
put: O(1)
remove: O(1)
findMedian: O(1)
addNum: O(log n)

Also State Space

Usually:

Space: O(n)

where n is number of stored items.

For Trie:

Space: O(total characters)

For SnapshotArray:

Space: O(number of set calls)

👉 Interview Answer

For data structure design, I analyze complexity per operation.

Each public method should have a clear time complexity.

I also describe space complexity based on the number of stored elements, total characters, or total versions depending on the design.


2️⃣4️⃣ Common Edge Cases


Empty Structure

Examples:

pop from empty stack
get from empty cache
getRandom from empty set

Usually problem guarantees valid calls, but check if not guaranteed.


Existing Key

For cache or map:

put existing key should update value

Capacity Zero

Important for cache problems.

if capacity == 0:
    return

Duplicate Values

Need decide:

set disallows duplicates
multiset allows duplicates

Timestamp Boundaries

For TimeMap:

query before first timestamp
query exactly at timestamp
query after latest timestamp

Recency Updates

For LRU:

get also updates recency
put existing key also updates recency

👉 Interview Answer

Common edge cases include empty structures, updating existing keys, zero capacity, duplicate values, timestamp boundary queries, and whether reads update recency.

In design problems, edge cases are often about keeping internal state consistent after every operation.


2️⃣5️⃣ Common Mistakes


Mistake 1: Using Only One Data Structure

Many design problems require a combination.

Example:

LRU needs both HashMap and linked list.

Mistake 2: Forgetting to Update Both Structures

If using HashMap + array, every insert/delete must update both.


Mistake 3: Not Handling Existing Keys

For cache:

put existing key should update value and recency/frequency.

Mistake 4: Wrong Eviction Rule

LRU evicts least recently used.

LFU evicts least frequently used, then least recently used among ties.


Mistake 5: Not Maintaining Invariants

Example:

MedianFinder heaps must stay balanced.

Mistake 6: Ignoring Amortized Complexity

Queue using stacks has amortized O(1), not worst-case O(1) per pop.


👉 Interview Answer

Common mistakes include trying to solve the problem with only one structure, forgetting to update all internal structures, mishandling existing keys, applying the wrong eviction rule, breaking invariants, and confusing amortized complexity with worst-case complexity.


2️⃣6️⃣ When Design Data Structures Applies


Good Fit

This pattern applies when the problem asks:

Design a class
Implement operations
Support get/put/add/remove
O(1) operations
O(log n) operations
Data stream operations
Cache behavior
Versioned array
Randomized structure
Prefix dictionary

Problem Signals

Look for words like:

design
implement
class
constructor
method
support
data structure
cache
stream
random
timestamp
snapshot
prefix

👉 Interview Answer

I recognize design data structure problems when the prompt asks me to implement a class with several operations.

The key is not just making each operation correct, but meeting the required time complexity.

I usually solve these by combining basic data structures and carefully maintaining invariants.


2️⃣7️⃣ When Design Data Structures Does Not Apply


Not Good Fit

This pattern may not be enough when:

problem asks for one-time computation
problem is mainly graph traversal
problem is pure dynamic programming
problem asks for sorting or searching only
problem has no persistent state

Better Alternatives

Problem Type Better Pattern
One-time array scan Two Pointers / Sliding Window
Shortest path BFS / Dijkstra
Dependency ordering Topological Sort
Optimization over states Dynamic Programming
Range query with updates Segment Tree / Fenwick Tree
String matching KMP / Rolling Hash
Connectivity Union Find

👉 Interview Answer

Data structure design is most useful when the problem has persistent state and multiple operations.

If the problem is a one-time computation, then a standard algorithmic pattern like sliding window, DP, graph traversal, or sorting may be more appropriate.


2️⃣8️⃣ Standard Templates


HashMap + Array Template

class Structure:
    def __init__(self):
        self.arr = []
        self.index = {}

    def add(self, val):
        if val in self.index:
            return False

        self.index[val] = len(self.arr)
        self.arr.append(val)
        return True

    def remove(self, val):
        if val not in self.index:
            return False

        i = self.index[val]
        last = self.arr[-1]

        self.arr[i] = last
        self.index[last] = i

        self.arr.pop()
        del self.index[val]

        return True

Doubly Linked List Node Template

class Node:
    def __init__(self, key=0, value=0):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

LRU Helper Template

def remove(node):
    node.prev.next = node.next
    node.next.prev = node.prev

def add_to_front(node):
    first = head.next

    node.prev = head
    node.next = first

    head.next = node
    first.prev = node

Two Heaps Template

import heapq

small = []  # max heap using negative values
large = []  # min heap

def add(num):
    heapq.heappush(small, -num)

    if large and -small[0] > large[0]:
        heapq.heappush(large, -heapq.heappop(small))

    if len(small) > len(large) + 1:
        heapq.heappush(large, -heapq.heappop(small))

    if len(large) > len(small):
        heapq.heappush(small, -heapq.heappop(large))

Timestamp Binary Search Template

import bisect

def get_value(history, timestamp):
    i = bisect.bisect_right(history, (timestamp, chr(255)))

    if i == 0:
        return ""

    return history[i - 1][1]

Trie Template

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

🧠 Staff-Level Answer Final


👉 Interview Answer Full Version

Design data structure problems ask us to implement a class that supports a set of operations efficiently.

I approach these problems by first listing every operation and its required time complexity. Then I map each requirement to an underlying data structure.

If I need O(1) lookup by key, I usually need a HashMap. If I need O(1) random access, I usually need an array. If I need O(1) deletion from an array, I combine the array with a HashMap from value to index and use the swap-with-last trick.

If I need recency ordering, such as in LRU Cache, I combine a HashMap with a doubly linked list. The HashMap gives O(1) access to nodes, and the linked list allows O(1) remove and move-to-front.

If I need median in a stream, I use two heaps: a max heap for the smaller half and a min heap for the larger half. I keep the heaps balanced so the median is always available from the heap tops.

If I need timestamp-based lookup, I use a HashMap from key to a sorted list of timestamp-value pairs, then use binary search during query.

If I need prefix search, I use a Trie. Each Trie node represents a prefix, and the path through the Trie represents a word or prefix.

For cache problems, the eviction rule is critical. LRU evicts the least recently used item. LFU evicts the least frequently used item, and if there is a tie, it evicts the least recently used item among that frequency.

The most important part of data structure design is maintaining invariants. For example, in LRU, the linked list order must always match recency. In MedianFinder, the two heaps must remain balanced. In RandomizedSet, the array and index map must always agree.

At a senior level, I would explain: what operations are required, what complexity each operation needs, what internal data structures are used, what invariant each structure maintains, how each method updates all internal state, and which edge cases can break the design.


⭐ Final Insight

Design Data Structures 的核心不是“会用 HashMap”。

它的核心是把 operation requirements 映射到 data structure combination。

Senior-level 的重点是:

每个 method 的复杂度是多少, 为什么单个 structure 不够, HashMap 负责什么, array / heap / linked list / Trie 负责什么, 每次操作后 invariant 是否仍然成立, existing key、capacity zero、duplicate values、timestamp boundary 怎么处理, 以及 amortized O(1) 和 worst-case O(1) 的区别。

能讲清楚这些, Design Data Structures 就是一类处理 custom class、multi-operation state、cache、stream、random access 和 prefix lookup 的核心模式。



中文部分


🎯 Design Data Structures


1️⃣ 核心框架

讨论 Design Data Structures 时,我通常从:

  1. Data structure design problems 是什么
  2. 如何识别 required operations
  3. Time complexity targets
  4. HashMap-based design
  5. Stack and queue design
  6. Heap-based design
  7. Linked list + HashMap design
  8. Trie-based design
  9. Randomized data structures
  10. 高级工程师解释方式

2️⃣ 要解决什么问题?

Design data structure problems 要求我们实现一个 custom class, 并且高效支持一组 operations。

它常用于:

核心思想是:

Data structure design 是把 operations
匹配到合适的底层 data structures 组合。

👉 面试回答

Design data structure problems 是实现一个 class, 让它支持特定 operations 和复杂度要求。

我会先列出所有 required operations, 然后明确每个 operation 的 time complexity target。

大多数解法会组合多个基础结构, 例如 HashMap、linked list、heap、stack、queue、Trie 或 balanced tree。

关键是让每个 operation 高效, 同时保证内部状态一致。


3️⃣ How to Approach Design Problems


Step 1: List Operations

Example:

put(key, value)
get(key)
delete(key)

Step 2: Identify Complexity Requirements

题目可能要求:

O(1)
O(log n)
O(k)

其中 k 可能是 word length 或返回元素数量。


Step 3: Choose Data Structures

问自己:

Need fast lookup?        -> HashMap
Need ordering?           -> linked list / heap / tree
Need min or max?         -> heap / monotonic structure
Need prefix search?      -> Trie
Need random access?      -> array
Need fast deletion?      -> HashMap + index / linked list

Step 4: Maintain Invariants

每个设计都有内部规则。

Example:

LRU cache:
head = most recent
tail = least recent
map key -> linked list node

👉 面试回答

我的设计流程是: 先列出 operations, 然后写出每个 operation 的目标复杂度, 再选择能支持这些 operations 的 data structures。

最后我会定义 invariants, 并确保每次 method call 后这些规则仍然成立。


4️⃣ Common Building Blocks


HashMap

用于:

O(1) key lookup
key to value
key to node
value to index
frequency to nodes

Array / List

用于:

index access
random selection
compact storage
append and pop

Linked List

用于:

O(1) remove node
O(1) move node
maintain order
LRU order

Heap

用于:

min value
max value
top k
median
priority ordering

Trie

用于:

prefix search
dictionary
word matching
autocomplete

👉 面试回答

大多数 design problems 都是基础结构的组合。

HashMap 提供快速 lookup。 Array 提供 index access 和 random selection。 Linked list 提供 O(1) reordering。 Heap 提供高效 min 或 max retrieval。 Trie 提供 prefix search。

难点在于选择正确组合。


5️⃣ Design HashSet


Problem

实现:

add(key)
remove(key)
contains(key)

Simple Approach

使用 buckets。

每个 bucket 存储 hash 到同一个 index 的 values。


Code

class MyHashSet:
    def __init__(self):
        self.size = 1000
        self.buckets = [[] for _ in range(self.size)]

    def _hash(self, key):
        return key % self.size

    def add(self, key):
        index = self._hash(key)

        if key not in self.buckets[index]:
            self.buckets[index].append(key)

    def remove(self, key):
        index = self._hash(key)

        if key in self.buckets[index]:
            self.buckets[index].remove(key)

    def contains(self, key):
        index = self._hash(key)

        return key in self.buckets[index]

👉 面试回答

HashSet 可以用 buckets 实现。

我用 hash function 找到 key 对应的 bucket。

Bucket 通过存多个 values 处理 collisions。

Add、remove 和 contains 平均情况下比较高效, 但 collision handling 很重要。


6️⃣ Design HashMap


Problem

实现:

put(key, value)
get(key)
remove(key)

Code

class MyHashMap:
    def __init__(self):
        self.size = 1000
        self.buckets = [[] for _ in range(self.size)]

    def _hash(self, key):
        return key % self.size

    def put(self, key, value):
        index = self._hash(key)
        bucket = self.buckets[index]

        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return

        bucket.append((key, value))

    def get(self, key):
        index = self._hash(key)

        for k, v in self.buckets[index]:
            if k == key:
                return v

        return -1

    def remove(self, key):
        index = self._hash(key)
        bucket = self.buckets[index]

        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                return

👉 面试回答

HashMap 存储 key-value pairs。

我用 hash function 找到 bucket, 然后在 bucket 中查找 key。

Collision 通过 chaining 处理, 也就是每个 bucket 可以存多个 key-value pairs。


7️⃣ Min Stack


Problem

设计一个 stack 支持:

push
pop
top
getMin

全部 O(1)。


Key Idea

使用两个 stacks:

main stack
min stack

Min stack 记录每一层的当前 minimum。


Code

class MinStack:
    def __init__(self):
        self.stack = []
        self.min_stack = []

    def push(self, val):
        self.stack.append(val)

        if not self.min_stack:
            self.min_stack.append(val)
        else:
            self.min_stack.append(min(val, self.min_stack[-1]))

    def pop(self):
        self.stack.pop()
        self.min_stack.pop()

    def top(self):
        return self.stack[-1]

    def getMin(self):
        return self.min_stack[-1]

👉 面试回答

Min Stack 使用 auxiliary stack 记录每个 stack depth 的 minimum。

每 push 一个 value, 同时 push 当前 minimum。

Pop 时两个 stacks 一起 pop。

这样 getMin 就是 O(1)。


8️⃣ Queue Using Stacks


Problem

使用 stacks 实现 queue operations:

push
pop
peek
empty

Key Idea

使用两个 stacks:

input stack
output stack

Push 到 input。

Pop 从 output。

如果 output 为空, 把 input 全部倒入 output。


Code

class MyQueue:
    def __init__(self):
        self.input = []
        self.output = []

    def push(self, x):
        self.input.append(x)

    def _move(self):
        if not self.output:
            while self.input:
                self.output.append(self.input.pop())

    def pop(self):
        self._move()
        return self.output.pop()

    def peek(self):
        self._move()
        return self.output[-1]

    def empty(self):
        return not self.input and not self.output

👉 面试回答

Queue using stacks 使用两个 stacks。

新元素进入 input stack。

当需要 pop 或 peek 时, 如果 output stack 为空, 就把 input 中的元素全部移动到 output。

这样会反转顺序, 实现 FIFO 行为。

每个元素最多移动两次, 所以 operations 是 amortized O(1)。


9️⃣ Stack Using Queues


Problem

使用 queues 实现 stack operations:

push
pop
top
empty

One Queue Approach

Push 新元素后, 旋转 queue, 让 newest element 到 front。


Code

from collections import deque

class MyStack:
    def __init__(self):
        self.queue = deque()

    def push(self, x):
        self.queue.append(x)

        for _ in range(len(self.queue) - 1):
            self.queue.append(self.queue.popleft())

    def pop(self):
        return self.queue.popleft()

    def top(self):
        return self.queue[0]

    def empty(self):
        return not self.queue

👉 面试回答

Stack using one queue 的核心是在 push 后旋转 queue。

插入新值后, 把之前的元素移动到它后面。

这样 newest element 会出现在 front, pop 和 top 就表现得像 stack。


🔟 LRU Cache


Problem

设计 cache 支持:

get(key)
put(key, value)

都要求 O(1)。

当超过 capacity 时, 移除 least recently used key。


Required Structures

使用:

HashMap: key -> node
Doubly linked list: usage order

Linked list order:

head = most recently used
tail = least recently used

👉 面试回答

LRU Cache 需要 O(1) lookup, 也需要 O(1) 更新 usage order。

HashMap 可以从 key O(1) 找到 node。

Doubly linked list 可以 O(1) remove 和 insert。

每次 get 或 put, 都把访问过的 node 移到 front。

超过 capacity 时, 删除 tail 位置的 node。


1️⃣1️⃣ LRU Cache Code


Code

class Node:
    def __init__(self, key=0, value=0):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}

        self.head = Node()
        self.tail = Node()

        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        prev_node = node.prev
        next_node = node.next

        prev_node.next = next_node
        next_node.prev = prev_node

    def _add_to_front(self, node):
        first = self.head.next

        node.prev = self.head
        node.next = first

        self.head.next = node
        first.prev = node

    def get(self, key):
        if key not in self.cache:
            return -1

        node = self.cache[key]
        self._remove(node)
        self._add_to_front(node)

        return node.value

    def put(self, key, value):
        if key in self.cache:
            node = self.cache[key]
            node.value = value

            self._remove(node)
            self._add_to_front(node)
            return

        node = Node(key, value)
        self.cache[key] = node
        self._add_to_front(node)

        if len(self.cache) > self.capacity:
            lru = self.tail.prev
            self._remove(lru)
            del self.cache[lru.key]

👉 面试回答

在 LRU Cache 中, HashMap 把 key 映射到 linked-list node。

Linked list 保存 recency order。

每次 key 被访问, 我把它从当前位置移除, 再加入 front。

如果 cache 超过 capacity, 删除 tail 前面的 node, 也就是 least recently used node。


1️⃣2️⃣ Insert Delete GetRandom O(1)


Problem

支持:

insert(val)
remove(val)
getRandom()

全部 average O(1)。


Required Structures

使用:

array for random access
HashMap value -> index

Remove Trick

为了 O(1) 删除 value:

和最后一个 element 交换
pop 最后一个 element
更新被移动 element 的 index

Code

import random

class RandomizedSet:
    def __init__(self):
        self.values = []
        self.index = {}

    def insert(self, val):
        if val in self.index:
            return False

        self.index[val] = len(self.values)
        self.values.append(val)

        return True

    def remove(self, val):
        if val not in self.index:
            return False

        remove_index = self.index[val]
        last_value = self.values[-1]

        self.values[remove_index] = last_value
        self.index[last_value] = remove_index

        self.values.pop()
        del self.index[val]

        return True

    def getRandom(self):
        return random.choice(self.values)

👉 面试回答

RandomizedSet 组合 array 和 HashMap。

Array 支持 O(1) random access。

HashMap 把每个 value 映射到它的 index。

删除时, 我把要删除的元素和最后一个元素交换, 然后 pop 最后一个, 并更新被移动元素的 index。


1️⃣3️⃣ Time-Based Key-Value Store


Problem

支持:

set(key, value, timestamp)
get(key, timestamp)

返回 timestamp 小于等于 query timestamp 的最大 timestamp 对应的 value。


Required Structures

使用:

HashMap key -> list of (timestamp, value)

因为 timestamps 是递增插入, 每个 list 是 sorted。

Get 时使用 binary search。


Code

from collections import defaultdict
import bisect

class TimeMap:
    def __init__(self):
        self.store = defaultdict(list)

    def set(self, key, value, timestamp):
        self.store[key].append((timestamp, value))

    def get(self, key, timestamp):
        arr = self.store[key]

        index = bisect.bisect_right(arr, (timestamp, chr(255)))

        if index == 0:
            return ""

        return arr[index - 1][1]

👉 面试回答

TimeMap 使用 HashMap, 每个 key 对应一个 sorted list of timestamp-value pairs。

因为 timestamps 按递增顺序插入, set 是 O(1)。

get 时, 我 binary search 找到小于等于 query timestamp 的最大 timestamp。

所以每个 key 的查询是 O(log n)。


1️⃣4️⃣ Median Finder


Problem

设计结构支持:

addNum(num)
findMedian()

Required Structures

使用两个 heaps:

max heap for smaller half
min heap for larger half

Python 中, max heap 用 negative values 模拟。


Invariant

保持:

len(small) == len(large)
or
len(small) == len(large) + 1

并且:

all values in small <= all values in large

Code

import heapq

class MedianFinder:
    def __init__(self):
        self.small = []
        self.large = []

    def addNum(self, num):
        heapq.heappush(self.small, -num)

        if self.large and -self.small[0] > self.large[0]:
            value = -heapq.heappop(self.small)
            heapq.heappush(self.large, value)

        if len(self.small) > len(self.large) + 1:
            value = -heapq.heappop(self.small)
            heapq.heappush(self.large, value)

        if len(self.large) > len(self.small):
            value = heapq.heappop(self.large)
            heapq.heappush(self.small, -value)

    def findMedian(self):
        if len(self.small) > len(self.large):
            return -self.small[0]

        return (-self.small[0] + self.large[0]) / 2

👉 面试回答

Median Finder 使用 two heaps。

Max heap 存 smaller half, min heap 存 larger half。

我会 rebalance, 保证两个 heaps 的 size 最多差 1。

如果总数是 odd, median 是较大一边的 heap top。

如果是 even, median 是两个 heap tops 的平均值。


1️⃣5️⃣ Design Trie


Problem

支持:

insert(word)
search(word)
startsWith(prefix)

Required Structure

使用 Trie nodes。

每个 node 存:

children map
is_word flag

Code

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root

        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()

            node = node.children[ch]

        node.is_word = True

    def search(self, word):
        node = self._find_node(word)

        return node is not None and node.is_word

    def startsWith(self, prefix):
        return self._find_node(prefix) is not None

    def _find_node(self, s):
        node = self.root

        for ch in s:
            if ch not in node.children:
                return None

            node = node.children[ch]

        return node

👉 面试回答

Trie 适合 prefix-based string operations。

每个 node 代表一个 prefix。

Insert 会沿着 characters 往下走, 并创建不存在的 nodes。

Search 只有在最后 node 标记为 complete word 时才返回 true。

startsWith 只需要 prefix path 存在。


1️⃣6️⃣ Word Dictionary


Problem

支持:

addWord(word)
search(word)

Search 可以包含 wildcard:

.

其中 . 可以匹配任意 character。


Required Structure

使用 Trie + DFS。

当 character 是 . 时:

try all children

Code

class WordDictionary:
    def __init__(self):
        self.root = TrieNode()

    def addWord(self, word):
        node = self.root

        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()

            node = node.children[ch]

        node.is_word = True

    def search(self, word):
        def dfs(index, node):
            if index == len(word):
                return node.is_word

            ch = word[index]

            if ch == ".":
                for child in node.children.values():
                    if dfs(index + 1, child):
                        return True

                return False

            if ch not in node.children:
                return False

            return dfs(index + 1, node.children[ch])

        return dfs(0, self.root)

👉 面试回答

WordDictionary 是 Trie 加 wildcard DFS。

普通 character 只沿着一个 child path 走。

当 character 是 . 时, 我会尝试当前 node 的所有 children。

只有在最后位置落在 complete word 上, 才返回 true。


1️⃣7️⃣ Hit Counter


Problem

统计过去 5 分钟的 hits。

支持:

hit(timestamp)
getHits(timestamp)

假设 5 分钟是:

300 seconds

Queue Approach

用 queue 存 timestamps。

移除早于:

timestamp - 300

的 timestamps。


Code

from collections import deque

class HitCounter:
    def __init__(self):
        self.queue = deque()

    def hit(self, timestamp):
        self.queue.append(timestamp)

    def getHits(self, timestamp):
        while self.queue and self.queue[0] <= timestamp - 300:
            self.queue.popleft()

        return len(self.queue)

👉 面试回答

Hit Counter 可以用 queue 实现。

每个 hit timestamp append 到 queue。

查询时, 移除不在最近 300 秒窗口内的 timestamps。

剩下的 queue size 就是有效 hits 数量。


1️⃣8️⃣ Moving Average From Data Stream


Problem

维护最近 size 个 values 的 average。


Required Structures

使用:

queue
running sum

Code

from collections import deque

class MovingAverage:
    def __init__(self, size):
        self.size = size
        self.queue = deque()
        self.total = 0

    def next(self, val):
        self.queue.append(val)
        self.total += val

        if len(self.queue) > self.size:
            self.total -= self.queue.popleft()

        return self.total / len(self.queue)

👉 面试回答

Moving Average 使用 queue 和 running sum。

Queue 存当前 window。

当 window 超过目标 size 时, 移除最旧的 value, 并从 sum 中减掉。

这样每次 update 是 O(1)。


1️⃣9️⃣ Snapshot Array


Problem

支持:

set(index, val)
snap()
get(index, snap_id)

Required Structure

对每个 index, 保存 history:

list of (snap_id, value)

Get 时使用 binary search。


Code

import bisect

class SnapshotArray:
    def __init__(self, length):
        self.snap_id = 0
        self.history = [[(0, 0)] for _ in range(length)]

    def set(self, index, val):
        arr = self.history[index]

        if arr[-1][0] == self.snap_id:
            arr[-1] = (self.snap_id, val)
        else:
            arr.append((self.snap_id, val))

    def snap(self):
        self.snap_id += 1
        return self.snap_id - 1

    def get(self, index, snap_id):
        arr = self.history[index]

        i = bisect.bisect_right(arr, (snap_id, float("inf")))

        return arr[i - 1][1]

👉 面试回答

SnapshotArray 对每个 index 保存 version history。

我不会每次 snapshot 都复制整个 array, 而是只记录 changed values。

get 时, binary search 找到 snap_id 小于等于 requested snap_id 的最新 value。

这样 snapshot 很便宜, lookup 是 logarithmic。


2️⃣0️⃣ LFU Cache


Problem

设计 cache 支持:

get(key)
put(key, value)

当超过 capacity 时, evict least frequently used key。

如果 frequency 相同, evict 其中 least recently used 的 key。


Required Structures

使用:

key -> node
frequency -> ordered dictionary of keys
min_frequency

Python 中, OrderedDict 可以维护同 frequency 内的 recency。


Code

from collections import defaultdict, OrderedDict

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.key_to_val_freq = {}
        self.freq_to_keys = defaultdict(OrderedDict)
        self.min_freq = 0

    def _update_freq(self, key):
        value, freq = self.key_to_val_freq[key]

        del self.freq_to_keys[freq][key]

        if not self.freq_to_keys[freq]:
            del self.freq_to_keys[freq]

            if self.min_freq == freq:
                self.min_freq += 1

        new_freq = freq + 1
        self.freq_to_keys[new_freq][key] = None
        self.key_to_val_freq[key] = (value, new_freq)

    def get(self, key):
        if key not in self.key_to_val_freq:
            return -1

        self._update_freq(key)
        return self.key_to_val_freq[key][0]

    def put(self, key, value):
        if self.capacity == 0:
            return

        if key in self.key_to_val_freq:
            self.key_to_val_freq[key] = (value, self.key_to_val_freq[key][1])
            self._update_freq(key)
            return

        if len(self.key_to_val_freq) >= self.capacity:
            evict_key, _ = self.freq_to_keys[self.min_freq].popitem(last=False)
            del self.key_to_val_freq[evict_key]

        self.key_to_val_freq[key] = (value, 1)
        self.freq_to_keys[1][key] = None
        self.min_freq = 1

👉 面试回答

LFU Cache 需要同时追踪 frequency 和 recency。

我用一个 map 保存 key 到 value 和 frequency。

再用一个 map 保存 frequency 到 ordered set of keys。

min_freq 告诉我应该从哪个 frequency bucket evict。

在这个 bucket 内, evict least recently used key。


2️⃣1️⃣ All O(1) Data Structure


Problem

支持:

inc(key)
dec(key)
getMaxKey()
getMinKey()

全部 O(1)。


High-Level Idea

使用按 count 排序的 buckets。

每个 bucket 存同样 count 的 keys。

需要:

key -> bucket
doubly linked list of count buckets

Conceptual Structure

head <-> count 1 <-> count 2 <-> count 5 <-> tail

每个 bucket 有:

count
set of keys
prev
next

👉 面试回答

All O(1) Data Structure 使用 count buckets 的 doubly linked list。

每个 bucket 保存相同 count 的所有 keys。

HashMap 把 key 映射到当前 bucket。

Increment 和 decrement 会把 key 移动到相邻 count bucket。

Head 侧可以得到 minimum key, tail 侧可以得到 maximum key。


2️⃣2️⃣ Design Patterns by Requirement


Requirement Mapping

Requirement Common Structure
O(1) lookup by key HashMap
O(1) random element Array
O(1) remove arbitrary element HashMap + array swap
O(1) reorder by recency HashMap + doubly linked list
Get min/max quickly Heap
Maintain median Two heaps
Prefix search Trie
Timestamp query HashMap + sorted list + binary search
Version history List of changes + binary search
Sliding window count Queue + running state

👉 面试回答

我会把 requirements 映射到 structures。

O(1) lookup 通常意味着 HashMap。

Random selection 通常意味着 array。

Array 中 O(1) deletion 需要 index mapping 和 swap-with-last。

Recency ordering 通常需要 doubly linked list。

Median 需要 two heaps。

Prefix search 需要 Trie。

Timestamp query 通常需要 sorted history 和 binary search。


2️⃣3️⃣ Complexity Analysis


Always State Complexity Per Operation

对 design problems, 不要只说整体复杂度。

要按 method 说:

get: O(1)
put: O(1)
remove: O(1)
findMedian: O(1)
addNum: O(log n)

Also State Space

通常:

Space: O(n)

其中 n 是 stored items 数量。

对 Trie:

Space: O(total characters)

对 SnapshotArray:

Space: O(number of set calls)

👉 面试回答

对 data structure design, 我会按 operation 分析复杂度。

每个 public method 都应该有明确的 time complexity。

Space complexity 要根据 stored elements、 total characters 或 total versions 来分析。


2️⃣4️⃣ 常见 Edge Cases


Empty Structure

Examples:

pop from empty stack
get from empty cache
getRandom from empty set

通常题目会保证 valid calls, 但如果不保证就要处理。


Existing Key

对 cache 或 map:

put existing key should update value

Capacity Zero

Cache problems 中很重要。

if capacity == 0:
    return

Duplicate Values

要明确:

set disallows duplicates
multiset allows duplicates

Timestamp Boundaries

对 TimeMap:

query before first timestamp
query exactly at timestamp
query after latest timestamp

Recency Updates

对 LRU:

get also updates recency
put existing key also updates recency

👉 面试回答

常见 edge cases 包括 empty structures、 更新 existing keys、 zero capacity、 duplicate values、 timestamp boundary queries, 以及 reads 是否更新 recency。

Design problems 中, 很多 edge cases 本质上都是在考每次操作后内部状态是否仍然一致。


2️⃣5️⃣ 常见错误


Mistake 1: 只用一个 Data Structure

很多 design problems 需要组合结构。

Example:

LRU needs both HashMap and linked list.

Mistake 2: 忘记同时更新多个 Structures

如果使用 HashMap + array, 每次 insert/delete 都要更新两个结构。


Mistake 3: 没处理 Existing Keys

对 cache:

put existing key should update value and recency/frequency.

Mistake 4: Eviction Rule 错误

LRU evicts least recently used。

LFU evicts least frequently used, tie 时 evict least recently used。


Mistake 5: 没维护 Invariants

Example:

MedianFinder heaps must stay balanced.

Mistake 6: 忽略 Amortized Complexity

Queue using stacks 是 amortized O(1), 不是每次 pop 都 worst-case O(1)。


👉 面试回答

常见错误包括: 试图只用一个 structure 解决所有问题、 忘记更新所有内部 structures、 mishandle existing keys、 使用错误 eviction rule、 破坏 invariants, 以及混淆 amortized complexity 和 worst-case complexity。


2️⃣6️⃣ 什么时候 Design Data Structures 适用?


Good Fit

当题目要求:

Design a class
Implement operations
Support get/put/add/remove
O(1) operations
O(log n) operations
Data stream operations
Cache behavior
Versioned array
Randomized structure
Prefix dictionary

Problem Signals

看到这些关键词想到 data structure design:

design
implement
class
constructor
method
support
data structure
cache
stream
random
timestamp
snapshot
prefix

👉 面试回答

当 prompt 要求实现一个包含多个 operations 的 class 时, 我会识别为 design data structure problem。

关键不是只让 operation correct, 还要满足要求的 time complexity。

我通常通过组合基础 data structures, 并维护清晰 invariants 来解决。


2️⃣7️⃣ 什么时候 Design Data Structures 不适用?


Not Good Fit

这个 pattern 不一定适合:

problem asks for one-time computation
problem is mainly graph traversal
problem is pure dynamic programming
problem asks for sorting or searching only
problem has no persistent state

Better Alternatives

Problem Type Better Pattern
One-time array scan Two Pointers / Sliding Window
Shortest path BFS / Dijkstra
Dependency ordering Topological Sort
Optimization over states Dynamic Programming
Range query with updates Segment Tree / Fenwick Tree
String matching KMP / Rolling Hash
Connectivity Union Find

👉 面试回答

Data structure design 最适合有 persistent state 和 multiple operations 的问题。

如果题目只是一次性计算, 那么 sliding window、DP、graph traversal、 sorting 之类的标准 algorithmic pattern 可能更合适。


2️⃣8️⃣ 标准模板


HashMap + Array Template

class Structure:
    def __init__(self):
        self.arr = []
        self.index = {}

    def add(self, val):
        if val in self.index:
            return False

        self.index[val] = len(self.arr)
        self.arr.append(val)
        return True

    def remove(self, val):
        if val not in self.index:
            return False

        i = self.index[val]
        last = self.arr[-1]

        self.arr[i] = last
        self.index[last] = i

        self.arr.pop()
        del self.index[val]

        return True

Doubly Linked List Node Template

class Node:
    def __init__(self, key=0, value=0):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

LRU Helper Template

def remove(node):
    node.prev.next = node.next
    node.next.prev = node.prev

def add_to_front(node):
    first = head.next

    node.prev = head
    node.next = first

    head.next = node
    first.prev = node

Two Heaps Template

import heapq

small = []  # max heap using negative values
large = []  # min heap

def add(num):
    heapq.heappush(small, -num)

    if large and -small[0] > large[0]:
        heapq.heappush(large, -heapq.heappop(small))

    if len(small) > len(large) + 1:
        heapq.heappush(large, -heapq.heappop(small))

    if len(large) > len(small):
        heapq.heappush(small, -heapq.heappop(large))

Timestamp Binary Search Template

import bisect

def get_value(history, timestamp):
    i = bisect.bisect_right(history, (timestamp, chr(255)))

    if i == 0:
        return ""

    return history[i - 1][1]

Trie Template

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

🧠 Staff-Level Answer Final


👉 面试回答完整版本

Design data structure problems 要求我们实现一个 class, 并让它高效支持一组 operations。

我的第一步是列出每个 operation, 并明确它的 required time complexity。 然后我把每个 requirement 映射到底层 data structure。

如果需要 O(1) key lookup, 通常需要 HashMap。 如果需要 O(1) random access, 通常需要 array。 如果需要在 array 中 O(1) 删除任意元素, 就需要 array 加 HashMap from value to index, 并使用 swap-with-last trick。

如果需要 recency ordering, 例如 LRU Cache, 我会组合 HashMap 和 doubly linked list。 HashMap 让 key 到 node 是 O(1), linked list 让 remove 和 move-to-front 是 O(1)。

如果需要维护 data stream median, 我会使用 two heaps: max heap 存 smaller half, min heap 存 larger half。 保持两个 heaps 平衡后, median 总能从 heap tops 得到。

如果需要 timestamp-based lookup, 我会用 HashMap, 把 key 映射到 sorted list of timestamp-value pairs, query 时使用 binary search。

如果需要 prefix search, 我会使用 Trie。 每个 Trie node 代表一个 prefix, Trie 中的一条 path 代表一个 word 或 prefix。

对 cache problems, eviction rule 非常关键。 LRU 删除 least recently used item。 LFU 删除 least frequently used item, 如果 frequency 相同, 删除该 frequency 中 least recently used item。

Data structure design 最重要的是维护 invariants。 例如 LRU 中, linked list order 必须永远等于 recency order。 MedianFinder 中, two heaps 必须保持平衡。 RandomizedSet 中, array 和 index map 必须永远一致。

高级工程师解释 design data structures 时, 我会讲清楚: required operations 是什么, 每个 operation 的 complexity 是什么, 内部使用哪些 data structures, 每个 structure 维护什么 invariant, 每个 method 如何更新所有内部状态, 以及哪些 edge cases 可能破坏设计。


⭐ Final Insight

Design Data Structures 的核心不是“会用 HashMap”。

它的核心是把 operation requirements 映射到 data structure combination。

Senior-level 的重点是:

每个 method 的复杂度是多少, 为什么单个 structure 不够, HashMap 负责什么, array / heap / linked list / Trie 负责什么, 每次操作后 invariant 是否仍然成立, existing key、capacity zero、duplicate values、timestamp boundary 怎么处理, 以及 amortized O(1) 和 worst-case O(1) 的区别。

能讲清楚这些, Design Data Structures 就是一类处理 custom class、multi-operation state、cache、stream、random access 和 prefix lookup 的核心模式。

Implement