🎯 Problem Background
In distributed systems, network retries, message duplication, and partial failures are common.
For example:
- A payment request times out, so the client retries.
- A Kafka consumer restarts and reprocesses the same event.
- A service retries downstream API calls after a timeout.
Without proper design, these retries may cause duplicate operations, such as:
- Charging a customer twice
- Deducting inventory multiple times
- Sending the same notification repeatedly
Therefore, distributed systems must ensure idempotency — meaning that multiple executions of the same request produce the same result as executing it once.
1️⃣ Idempotency Keys (Request Deduplication)
Core Idea
Each client request includes a unique idempotency key.
The server stores each processed key and, on a duplicate, returns the stored result instead of re-executing the operation.
Example Scenario — Payment API
POST /payment
Idempotency-Key: 9cfa92f1
Server workflow:
- Check if the key exists in the database
- If not → process payment
- Save the key and result
- If exists → return previous result
Data Model
idempotency_table:
- idempotency_key
- request_hash
- response_payload
- status
- created_at
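The workflow and data model above can be sketched as follows. This is a minimal illustration, not a production implementation: an in-memory SQLite database stands in for the idempotency_table, and `handle_payment` is a hypothetical handler with the payment logic stubbed out.

```python
import sqlite3

# In-memory SQLite database standing in for the idempotency_table above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE idempotency_table (
        idempotency_key TEXT PRIMARY KEY,
        response_payload TEXT,
        status TEXT
    )
""")

def handle_payment(key: str, amount: int) -> str:
    # 1. Check whether this key has already been processed.
    row = conn.execute(
        "SELECT response_payload FROM idempotency_table WHERE idempotency_key = ?",
        (key,),
    ).fetchone()
    if row:
        # 2a. Duplicate request: return the stored result, do not charge again.
        return row[0]
    # 2b. First time: process the payment (stubbed here) ...
    result = f"charged:{amount}"
    # 3. ... and record the key together with the result.
    conn.execute(
        "INSERT INTO idempotency_table (idempotency_key, response_payload, status)"
        " VALUES (?, ?, ?)",
        (key, result, "completed"),
    )
    return result

first = handle_payment("9cfa92f1", 100)
retry = handle_payment("9cfa92f1", 100)  # client retry with the same key
print(first == retry)                    # the retry sees the stored result
```

In a real service, the check and the insert would need to run atomically (e.g. inside one transaction, or via an upsert) so two concurrent retries cannot both pass the check.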
Benefits
- Prevents duplicate operations
- Safe client retries
- Common in Stripe / PayPal APIs
Trade-offs
- Requires additional storage
- Key expiration strategy needed
For client-driven APIs like payment systems, idempotency keys are the most reliable way to prevent duplicate operations caused by retries or network failures.
2️⃣ Database Constraints (Natural Idempotency)
Another way to guarantee idempotency is to enforce uniqueness at the database level.
Example — Order Creation
orders table:
- order_id
- user_id
- product_id
- request_id
- status
Add a unique constraint:
UNIQUE(user_id, request_id)
Even if the request is processed twice, the second insert fails with a uniqueness violation, which the application can safely treat as an already-completed operation.
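A minimal sketch of this pattern, using an in-memory SQLite database for illustration (the table is simplified; `create_order` is a hypothetical helper that maps the uniqueness violation to a "duplicate" outcome):

```python
import sqlite3

# Orders table with the unique constraint from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id TEXT NOT NULL,
        request_id TEXT NOT NULL,
        status TEXT NOT NULL,
        UNIQUE (user_id, request_id)
    )
""")

def create_order(user_id: str, request_id: str) -> str:
    try:
        conn.execute(
            "INSERT INTO orders (user_id, request_id, status) VALUES (?, ?, 'created')",
            (user_id, request_id),
        )
        return "created"
    except sqlite3.IntegrityError:
        # A retry with the same (user_id, request_id) lands here:
        # the database rejected the duplicate insert.
        return "duplicate"

first = create_order("u1", "req-42")
second = create_order("u1", "req-42")  # retried request
print(first, second)
```

The application logic stays trivial because the database enforces the guarantee; the handler only needs to recognize the constraint violation as "this work was already done".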
Benefits
- Simple
- Strong guarantee
- No extra logic needed
Limitations
- Only works for create operations
- Not suitable for complex workflows
Database uniqueness constraints provide a simple and strong form of idempotency when duplicate writes must be prevented.
3️⃣ Event Processing Deduplication
In event-driven systems, duplicate events are common.
Example sources:
- Kafka at-least-once delivery
- Consumer restarts
- Message redelivery
Solution — Processed Event Table
Store processed event IDs.
processed_events:
- event_id
- processed_at
Processing workflow:
if event_id exists:
    skip processing
else:
    process event
    insert event_id
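The workflow above can be sketched in a few lines. For illustration an in-memory set stands in for the processed_events table; a real consumer would use a durable store and record the event ID atomically with its side effect, so a crash between the two cannot cause double counting.

```python
# In-memory stand-in for the processed_events table.
processed_events: set[str] = set()
total_spend = 0.0  # the aggregate the side effect updates

def handle_event(event: dict) -> bool:
    """Apply an event's side effect exactly once; return False for duplicates."""
    global total_spend
    if event["event_id"] in processed_events:
        return False                         # duplicate delivery: skip
    total_spend += event["spend_amount"]     # the actual side effect
    processed_events.add(event["event_id"])  # record the event as processed
    return True

evt = {"event_id": "e-1", "campaign_id": "c-9", "spend_amount": 2.5}
applied = handle_event(evt)
redelivered = handle_event(evt)  # broker redelivers the same event
print(total_spend)               # counted once, not twice
```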
Example Use Case
Ad spend aggregation system, with events shaped as:
- event_id
- campaign_id
- spend_amount
- timestamp
The system records processed event IDs to avoid double counting ad spend.
This pattern is widely used in streaming systems to ensure correctness when using at-least-once delivery guarantees.
4️⃣ Idempotent Operations (State-Based Updates)
Instead of preventing duplicates, another strategy is designing operations that are naturally idempotent.
Example
Instead of a relative update, which produces the wrong result if applied twice:
balance = balance - 10
use an absolute assignment, which can safely be repeated:
set balance = 90
Or use versioned updates (optimistic locking), bumping the version on success so a stale retry matches nothing:
update account
set balance = 90, version = 4
where version = 3
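A minimal sketch of a versioned (compare-and-set) update, again using an in-memory SQLite database for illustration; `set_balance` is a hypothetical helper name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO account VALUES ('a1', 100, 3)")

def set_balance(account_id: str, new_balance: int, expected_version: int) -> bool:
    # The WHERE clause only matches the expected version, and the version
    # is bumped on success, so a retried update matches zero rows instead
    # of applying a second time.
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    return cur.rowcount == 1  # False when the version no longer matches

applied = set_balance("a1", 90, 3)   # applied; version becomes 4
retried = set_balance("a1", 90, 3)   # retry is a no-op
print(applied, retried)
```

The caller can distinguish "applied" from "already applied / conflicting" by the return value, with no deduplication table at all.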
Benefits
- No deduplication storage needed
- Safe retries
Trade-offs
- Harder to implement
- Not always applicable
Designing operations to be naturally idempotent is often the most robust approach, especially in distributed state machines.
🧠 Senior / Staff-Level Summary Answer
In distributed systems, retries and duplicate messages are unavoidable, so systems must be designed to guarantee idempotency. I typically use several strategies depending on the scenario:
- For client APIs such as payment services, I use idempotency keys to deduplicate requests.
- For database writes, I rely on unique constraints to prevent duplicate records.
- For event-driven architectures, I store processed event IDs to handle at-least-once message delivery.
- In some cases, I redesign operations to be naturally idempotent, such as state-based updates instead of incremental updates.
In practice, large-scale systems often combine multiple strategies to ensure correctness across retries, failures, and distributed execution.
⭐ Staff-Level Insight (Bonus)
Idempotency is not just a retry safety feature — it is a core correctness guarantee in distributed systems. Any system that relies on retries without idempotency risks data corruption, double charges, and inconsistent state.
Chinese Section
🎯 Problem Background
In distributed systems, network retries, duplicate messages, and partial failures are common.
For example:
- A payment request times out and the client automatically retries
- A Kafka consumer re-consumes the same message
- A service automatically retries a downstream API call
Without careful design, this can lead to:
- Customers being charged twice
- Inventory being deducted multiple times
- The same notification being sent repeatedly
Therefore the system must guarantee idempotency:
executing the same request multiple times produces the same result as executing it once.
1️⃣ Idempotency Keys (Request Deduplication)
The client generates a unique Idempotency-Key for each request.
The server records the keys it has already processed.
Payment System Example
POST /payment
Idempotency-Key: 9cfa92f1
Server workflow:
- Check whether the key exists in the database
- If not → process the payment
- Save the key and the response
- If it exists → return the previous result
2️⃣ Database Unique Constraints
Prevent duplicate writes with a database unique index.
For example, in an order system:
UNIQUE(user_id, request_id)
Even if the request is executed twice, the database allows only one insert.
3️⃣ Event Processing Deduplication
In Kafka / streaming systems:
Create a processed_events table that stores processed event_ids.
Consumption workflow:
- If the event_id already exists → skip it
- Otherwise → process the event and record its ID
4️⃣ Designing Naturally Idempotent Operations
For example, instead of:
balance = balance - 10
use:
set balance = 90
or use versioned updates.
🧠 Senior / Staff Interview Summary
In distributed systems, network retries and duplicate messages are unavoidable, so the system must guarantee idempotency. I usually choose strategies based on the scenario:
- Idempotency keys for API requests
- Unique constraints for database writes
- A processed-event table for event-stream processing
- Naturally idempotent operations where possible
In practice, systems usually combine several of these strategies to guarantee correctness.