How to Guarantee Idempotency in Distributed Systems

Post by ailswan Mar. 6


🎯 Problem Background

In distributed systems, network retries, message duplication, and partial failures are common.

For example:

Without proper design, these retries may cause duplicate operations, such as:

Therefore, distributed systems must ensure idempotency: executing the same request multiple times must leave the system in the same state, and return the same result, as executing it once.


1️⃣ Idempotency Keys (Request Deduplication)

Core Idea

Each client request includes a unique idempotency key.

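The server stores processed keys together with their results. When a duplicate arrives, it returns the stored result instead of executing the request again.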


Example Scenario — Payment API

POST /payment
Idempotency-Key: 9cfa92f1

Server workflow:

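  1. Atomically reserve the key (for example, insert it under a unique constraint)
  2. If the key already exists → return the previous result
  3. If not → process payment
  4. Save the result under the key

A plain check-then-process sequence is not enough: two concurrent retries can both see the key as missing and both process the payment.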

Data Model

idempotency_table

idempotency_key
request_hash
response_payload
status
created_at
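
The workflow and data model above can be sketched as follows. This is a minimal illustration, not a production design: it assumes SQLite as the store, a hypothetical `charge` function standing in for the real payment call, and hypothetical status values `IN_PROGRESS` and `COMPLETED`. The key is reserved with an insert before the payment runs, so the primary key rejects a concurrent duplicate, and `request_hash` is used to reject a key that is reused with a different payload.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE idempotency_table (
        idempotency_key  TEXT PRIMARY KEY,  -- uniqueness makes the reservation atomic
        request_hash     TEXT NOT NULL,
        response_payload TEXT,
        status           TEXT NOT NULL,     -- assumed values: IN_PROGRESS, COMPLETED
        created_at       TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)

charges = []  # stands in for the real payment side effect


def charge(request: dict) -> dict:
    """Hypothetical payment call; records each charge so duplicates are visible."""
    charges.append(request)
    return {"status": "paid", "amount": request["amount"]}


def handle_payment(key: str, request: dict) -> dict:
    request_hash = hashlib.sha256(
        json.dumps(request, sort_keys=True).encode()
    ).hexdigest()
    try:
        # Reserve the key first; a duplicate key raises IntegrityError.
        with conn:
            conn.execute(
                "INSERT INTO idempotency_table "
                "(idempotency_key, request_hash, status) "
                "VALUES (?, ?, 'IN_PROGRESS')",
                (key, request_hash),
            )
    except sqlite3.IntegrityError:
        stored_hash, payload, status = conn.execute(
            "SELECT request_hash, response_payload, status "
            "FROM idempotency_table WHERE idempotency_key = ?",
            (key,),
        ).fetchone()
        if stored_hash != request_hash:
            raise ValueError("idempotency key reused with a different request")
        if status != "COMPLETED":
            raise RuntimeError("original request is still in progress")
        return json.loads(payload)  # replay the stored response

    response = charge(request)
    with conn:
        conn.execute(
            "UPDATE idempotency_table "
            "SET response_payload = ?, status = 'COMPLETED' "
            "WHERE idempotency_key = ?",
            (json.dumps(response), key),
        )
    return response
```

Calling `handle_payment("9cfa92f1", {"amount": 10})` twice charges once and returns the same response both times. A real service would also need a policy for keys stuck in `IN_PROGRESS` after a crash, which this sketch leaves out.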

Benefits


Trade-offs


For client-driven APIs like payment systems, idempotency keys are a reliable way to prevent duplicate operations caused by retries or network failures.


2️⃣ Database Constraints (Natural Idempotency)

Another way to guarantee idempotency is to enforce uniqueness at the database level.


Example — Order Creation

orders table

order_id
user_id
product_id
request_id
status

Add a unique constraint:

UNIQUE(user_id, request_id)

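Even if the request is processed twice, the second insert fails with a unique-constraint violation. The service should treat that violation as "already created" and return the existing order rather than surface an error.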
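
A minimal sketch of this pattern, assuming SQLite and an `orders` table that carries a client-supplied `request_id` column (the column the unique constraint refers to). The function name `create_order` and the `CREATED` status value are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id    INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        request_id TEXT NOT NULL,  -- assumed client-supplied request identifier
        status     TEXT NOT NULL,
        UNIQUE (user_id, request_id)
    )
    """
)


def create_order(user_id: int, product_id: int, request_id: str) -> int:
    """Insert the order once; a retry returns the existing order_id."""
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO orders (user_id, product_id, request_id, status) "
                "VALUES (?, ?, ?, 'CREATED')",
                (user_id, product_id, request_id),
            )
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # The unique constraint fired: this request was already processed.
        row = conn.execute(
            "SELECT order_id FROM orders WHERE user_id = ? AND request_id = ?",
            (user_id, request_id),
        ).fetchone()
        return row[0]
```

Because the database enforces uniqueness, the code stays correct even when two retries race: at most one insert can succeed, and the loser falls through to the lookup.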


Benefits


Limitations


Database uniqueness constraints provide a simple and strong form of idempotency when duplicate writes must be prevented.


3️⃣ Event Processing Deduplication

In event-driven systems, duplicate events are common.

Example sources:


Solution — Processed Event Table

Store processed event IDs.

processed_events

event_id
processed_at

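Processing workflow (the event's side effects and the insert of its ID should commit in one transaction, otherwise a crash between the two steps causes either a duplicate or a lost update):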

if event_id exists:
    skip processing
else:
    process event
    insert event_id

Example Use Case

Ad spend aggregation system:

event_id
campaign_id
spend_amount
timestamp

The system records processed event IDs to avoid double counting ad spend.
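
As a rough sketch of the ad spend case, assuming SQLite and a hypothetical `campaign_spend` aggregate table: the event ID is inserted and the total is updated inside one transaction, so either both are committed or neither is. The primary key on `event_id` is what rejects the duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processed_events (
        event_id     TEXT PRIMARY KEY,
        processed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE campaign_spend (   -- assumed aggregate table
        campaign_id TEXT PRIMARY KEY,
        total_spend REAL NOT NULL
    );
    """
)


def handle_event(event: dict) -> bool:
    """Apply the event once; return False if it was already processed."""
    try:
        with conn:  # one transaction: marker and aggregate commit together
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event["event_id"],),
            )
            # Make sure the campaign row exists, then add the spend.
            conn.execute(
                "INSERT OR IGNORE INTO campaign_spend (campaign_id, total_spend) "
                "VALUES (?, 0)",
                (event["campaign_id"],),
            )
            conn.execute(
                "UPDATE campaign_spend SET total_spend = total_spend + ? "
                "WHERE campaign_id = ?",
                (event["spend_amount"], event["campaign_id"]),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate event_id: the transaction was rolled back, nothing changed.
        return False
```

Inserting the marker first, rather than checking for it with a separate query, avoids the race where two consumers both see the event as unprocessed.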


This pattern is widely used in streaming systems to ensure correctness when using at-least-once delivery guarantees.


4️⃣ Idempotent Operations (State-Based Updates)

Instead of preventing duplicates, another strategy is designing operations that are naturally idempotent.


Example

Instead of:

balance = balance - 10

Use:

set balance = 90

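Or use versioned updates, which apply only while the row is still at the expected version and bump the version so that a retry no longer matches:

update account
set balance = 90, version = 4
where version = 3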
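
A small sketch of the versioned update, assuming SQLite, an `account` table with an `account_id` column, and a version that is incremented on every successful write (both assumptions added for illustration). The update matches at most once, so a retried request becomes a no-op.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "account_id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL, "
    "version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account VALUES (1, 100, 3)")
conn.commit()


def set_balance(account_id: int, new_balance: int, expected_version: int) -> bool:
    """Apply the update only if the row is still at expected_version."""
    with conn:
        cur = conn.execute(
            "UPDATE account SET balance = ?, version = version + 1 "
            "WHERE account_id = ? AND version = ?",
            (new_balance, account_id, expected_version),
        )
    # rowcount is 0 when the version no longer matches (retry or stale writer).
    return cur.rowcount == 1
```

The first call `set_balance(1, 90, 3)` succeeds and moves the row to version 4. Repeating the same call matches no rows, so the balance cannot be overwritten by a late retry.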

Benefits


Trade-offs


Designing operations to be naturally idempotent is often the most robust approach, especially in distributed state machines.


🧠 Senior / Staff-Level Summary Answer

In distributed systems, retries and duplicate messages are unavoidable, so systems must be designed to guarantee idempotency. I typically use several strategies depending on the scenario:

  • For client APIs such as payment services, I use idempotency keys to deduplicate requests.
  • For database writes, I rely on unique constraints to prevent duplicate records.
  • For event-driven architectures, I store processed event IDs to handle at-least-once message delivery.
  • In some cases, I redesign operations to be naturally idempotent, such as state-based updates instead of incremental updates.

In practice, large-scale systems often combine multiple strategies to ensure correctness across retries, failures, and distributed execution.


⭐ Staff-Level Insight (Bonus)

Idempotency is not just a retry safety feature — it is a core correctness guarantee in distributed systems. Any system that relies on retries without idempotency risks data corruption, double charges, and inconsistent state.


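Chinese Version (Translated)

🎯 Problem Background

In distributed systems, network retries, duplicate messages, and partial failures are common.

For example:

Without proper design, the system may end up with:

Therefore, the system must guarantee idempotency:

Executing the same request multiple times produces the same result as executing it once.


1️⃣ Idempotency Key (Request Deduplication)

The client generates a unique Idempotency Key for each request.

The server records the keys it has already processed.


Payment System Example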

POST /payment
Idempotency-Key: 9cfa92f1

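Server workflow:

  1. Check whether the key exists in the database
  2. If not → process the payment
  3. Save the key and the result
  4. If it exists → return the previous result

2️⃣ Database Unique Constraints

Use a unique index in the database to prevent duplicate writes.

For example, in an order system:

UNIQUE(user_id, request_id)

Even if the request is executed twice, the database allows only one insert.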


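3️⃣ Event Processing Deduplication

In Kafka / streaming systems, create a

processed_events table

that stores the event_id of every processed event.

Consumer workflow:

If event_id already exists → skip
Otherwise → process and record it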

4️⃣ Design Naturally Idempotent Operations

For example:

Instead of

balance = balance - 10

use

set balance = 90

Or use versioned updates.


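🧠 Senior / Staff Interview Summary

In distributed systems, network retries and duplicate messages are unavoidable, so the system must guarantee idempotency. I usually choose a strategy based on the scenario:

  • API requests: use an Idempotency Key
  • Database writes: use unique constraints
  • Event stream processing: use a processed event table
  • Where possible: design naturally idempotent operations

In practice, systems usually combine several strategies to guarantee correctness.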


Implement