Q&A-SystemDesign · How to Guarantee Idempotency in Distributed Systems

🎯 Problem Background

In distributed systems, network retries, message duplication, and partial failures are common.

For example:

A payment request times out, so the client retries.
A Kafka consumer restarts and reprocesses the same event.
A service retries downstream API calls after a timeout.

Without proper design, these retries may cause duplicate operations, such as:

Charging a customer twice
Deducting inventory multiple times
Sending the same notification repeatedly

Therefore, distributed systems must ensure idempotency: executing the same request multiple times must have the same effect as executing it once.

1️⃣ Idempotency Keys (Request Deduplication)

Core Idea

Each client request includes a unique idempotency key.

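The server stores each processed key together with its result; when a duplicate arrives, it returns the stored result instead of executing the operation again.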

Example Scenario — Payment API

POST /payment
Idempotency-Key: 9cfa92f1

Server workflow:

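Check whether the key exists in the database
If it exists → return the previous result
If it does not exist → process the payment, then save the key and result

The check and the save must be atomic (for example, by inserting the key under a unique constraint before processing); otherwise two concurrent retries can both pass the check.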

Data Model

idempotency_table

idempotency_key
request_hash
response_payload
status
created_at
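The workflow and data model above can be sketched as follows. This is a minimal, single-process illustration using SQLite, not a production design: the `charge` function, the `handle_payment` name, and the status values `processing` / `completed` are assumptions made for the example.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE idempotency_table (
        idempotency_key  TEXT PRIMARY KEY,
        request_hash     TEXT NOT NULL,
        response_payload TEXT,
        status           TEXT NOT NULL,
        created_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)

charges = []  # stands in for the real payment side effect


def charge(request: dict) -> dict:
    """Hypothetical payment call; records every execution."""
    charges.append(request)
    return {"charged": request["amount"], "charge_no": len(charges)}


def handle_payment(key: str, request: dict) -> dict:
    request_hash = hashlib.sha256(
        json.dumps(request, sort_keys=True).encode()
    ).hexdigest()
    with conn:  # one transaction: commit on success, roll back on error
        try:
            # Claim the key first; the primary key rejects a second claim.
            conn.execute(
                "INSERT INTO idempotency_table "
                "(idempotency_key, request_hash, status) "
                "VALUES (?, ?, 'processing')",
                (key, request_hash),
            )
        except sqlite3.IntegrityError:
            # Duplicate key: replay the stored response.
            stored_hash, payload = conn.execute(
                "SELECT request_hash, response_payload "
                "FROM idempotency_table WHERE idempotency_key = ?",
                (key,),
            ).fetchone()
            if stored_hash != request_hash:
                raise ValueError("idempotency key reused with a different request")
            if payload is None:
                raise RuntimeError("original request is still in progress")
            return json.loads(payload)
        response = charge(request)
        conn.execute(
            "UPDATE idempotency_table "
            "SET response_payload = ?, status = 'completed' "
            "WHERE idempotency_key = ?",
            (json.dumps(response), key),
        )
        return response
```

Calling `handle_payment("9cfa92f1", {"amount": 10})` twice runs `charge` once and returns the same response both times. If `charge` raises, the transaction rolls back the key claim, so a later retry can proceed.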

Benefits

Prevents duplicate operations
Safe client retries
Common in Stripe / PayPal APIs

Trade-offs

Requires additional storage
Key expiration strategy needed
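One possible expiration strategy is a periodic purge of old keys. The sketch below assumes a 24-hour retention window and a `created_at` column stored as UTC text; both are illustrative choices, and the window should be longer than the longest period over which clients may retry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency_table ("
    "idempotency_key TEXT PRIMARY KEY, "
    "created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)

RETENTION = "-24 hours"  # assumed retention window (SQLite date modifier)


def purge_expired_keys(conn: sqlite3.Connection) -> int:
    """Delete keys older than the retention window; return how many were removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM idempotency_table "
            "WHERE created_at < datetime('now', ?)",
            (RETENTION,),
        )
    return cur.rowcount
```

After a key is purged, a retry with that key is treated as a new request, which is why the retention window matters.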

For client-driven APIs like payment systems, idempotency keys are one of the most reliable ways to prevent duplicate operations caused by retries or network failures.

2️⃣ Database Constraints (Natural Idempotency)

Another way to guarantee idempotency is to enforce uniqueness at the database level.

Example — Order Creation

orders table

order_id
user_id
request_id
product_id
status

Add a unique constraint:

UNIQUE(user_id, request_id)

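Even if the request is processed twice, the second insert fails with a unique-constraint violation, which the service should treat as "already created" rather than as an error.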
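A minimal sketch of this behavior with SQLite, assuming the orders table carries a client-supplied `request_id` column. The `create_order` helper and the `'created'` status value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        request_id TEXT    NOT NULL,
        product_id INTEGER NOT NULL,
        status     TEXT    NOT NULL,
        UNIQUE (user_id, request_id)
    )
    """
)


def create_order(user_id: int, request_id: str, product_id: int) -> bool:
    """Return True if a new order was created, False if it was a duplicate."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (user_id, request_id, product_id, status) "
                "VALUES (?, ?, ?, 'created')",
                (user_id, request_id, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected a repeat of the same request.
        return False
```

A retried `create_order(1, "req-1", 42)` returns `False` and leaves exactly one row, while a different `request_id` for the same user creates a second order.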

Benefits

Simple
Strong guarantee
Little extra logic needed (only handling the constraint violation)

Limitations

Only works for create operations
Not suitable for complex workflows

Database uniqueness constraints provide a simple and strong form of idempotency when duplicate writes must be prevented.

3️⃣ Event Processing Deduplication

In event-driven systems, duplicate events are common.

Example sources:

Kafka at-least-once delivery
Consumer restarts
Message redelivery

Solution — Processed Event Table

Store processed event IDs.

processed_events

event_id
processed_at

Processing workflow:

if event_id exists in processed_events:
    skip processing
else:
    process event
    insert event_id    # in the same transaction as the processing step

Example Use Case

Ad spend aggregation system, where each event carries:

event_id
campaign_id
spend_amount
timestamp

The system records processed event IDs to avoid double counting ad spend.
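The processed-event workflow applied to this use case might look like the sketch below. It assumes the deduplication table and the aggregate live in the same database, so both writes can share one transaction. The `campaign_spend` table and the `handle_event` name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_events ("
    "event_id TEXT PRIMARY KEY, "
    "processed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)
conn.execute(
    "CREATE TABLE campaign_spend ("
    "campaign_id TEXT PRIMARY KEY, total REAL NOT NULL)"
)


def handle_event(event: dict) -> bool:
    """Apply the event once; return False if it was already processed."""
    try:
        with conn:  # the marker and the aggregate commit or roll back together
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event["event_id"],),
            )
            conn.execute(
                "INSERT OR IGNORE INTO campaign_spend (campaign_id, total) "
                "VALUES (?, 0)",
                (event["campaign_id"],),
            )
            conn.execute(
                "UPDATE campaign_spend SET total = total + ? "
                "WHERE campaign_id = ?",
                (event["spend_amount"], event["campaign_id"]),
            )
        return True
    except sqlite3.IntegrityError:
        # event_id already recorded: this is a redelivery, so skip it.
        return False
```

Because the event ID and the spend update are written atomically, a crash between the two cannot leave spend counted without the event being marked as processed.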

This pattern is widely used in streaming systems to ensure correctness when using at-least-once delivery guarantees.

4️⃣ Idempotent Operations (State-Based Updates)

Instead of preventing duplicates, another strategy is designing operations that are naturally idempotent.

Example

Instead of:

balance = balance - 10

Use:

set balance = 90

Or use versioned updates, which also increment the version so that a retry no longer matches:

update account
set balance = 90, version = 4
where version = 3
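A versioned update can be sketched as a compare-and-set, assuming the `version` column is incremented on every successful write. The `set_balance` helper and the starting row (balance 100, version 3) are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL, "
    "version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account (id, balance, version) VALUES (1, 100, 3)")
conn.commit()


def set_balance(account_id: int, new_balance: int, expected_version: int) -> bool:
    """Write the balance only if the row is still at expected_version."""
    with conn:
        cur = conn.execute(
            "UPDATE account SET balance = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_balance, account_id, expected_version),
        )
    # rowcount is 0 when the version has already moved on (retry or conflict).
    return cur.rowcount == 1
```

The first `set_balance(1, 90, 3)` succeeds and moves the row to version 4; a retry with the same arguments matches no row and changes nothing.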

Benefits

No deduplication storage needed
Safe retries

Trade-offs

Harder to implement
Not always applicable

Designing operations to be naturally idempotent is often the most robust approach, especially in distributed state machines.

🧠 Senior / Staff-Level Summary Answer

In distributed systems, retries and duplicate messages are unavoidable, so systems must be designed to guarantee idempotency. I typically use several strategies depending on the scenario:

For client APIs such as payment services, I use idempotency keys to deduplicate requests.

For database writes, I rely on unique constraints to prevent duplicate records.

For event-driven architectures, I store processed event IDs to handle at-least-once message delivery.

In some cases, I redesign operations to be naturally idempotent, such as state-based updates instead of incremental updates.

In practice, large-scale systems often combine multiple strategies to ensure correctness across retries, failures, and distributed execution.

⭐ Staff-Level Insight (Bonus)

Idempotency is not just a retry safety feature; it is a core correctness guarantee in distributed systems. Any system that relies on retries without idempotency risks data corruption, double charges, and inconsistent state.

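Chinese Version

🎯 Problem Background

In distributed systems, network retries, duplicate messages, and partial failures are common.

For example:

A payment request times out and the client retries automatically
A Kafka consumer re-consumes the same message
A service automatically retries a downstream API call

Without careful design, this can lead to:

Customers being charged twice
Inventory being deducted twice
Notifications being sent twice

The system must therefore guarantee idempotency:

Executing the same request multiple times has the same effect as executing it once.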

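1️⃣ Idempotency Keys (Request Deduplication)

The client generates a unique Idempotency Key for each request.

The server records the keys it has already processed.

Payment system example

POST /payment
Idempotency-Key: 9cfa92f1

Server workflow:

Look up the key in the database
If it does not exist → execute the payment
Save the key and the result
If it exists → return the previous result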

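2️⃣ Database Unique Constraints

Use a unique index in the database to prevent duplicate writes.

For example, in an order system:

UNIQUE(user_id, request_id)

Even if the request is executed twice, the database allows only one insert.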

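3️⃣ Event Processing Deduplication

In Kafka / streaming systems, create a

processed_events table

that stores the event_id of every processed event.

Consumption workflow:

If the event_id already exists → skip
Otherwise → process and record it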

4️⃣ Design Naturally Idempotent Operations

For example, instead of

balance = balance - 10

use

set balance = 90

or use versioned updates.

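🧠 Senior / Staff Interview Summary

In distributed systems, network retries and duplicate messages are unavoidable, so the system must guarantee idempotency. I usually choose a strategy based on the scenario:

Idempotency keys for API requests

Unique constraints for database writes

A processed event table for event stream processing

Naturally idempotent operations wherever possible

Real systems usually combine several of these strategies to ensure correctness.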
