System Design Deep Dive - 17 Design Payment System

Post by ailswan May. 10, 2026


🎯 Design Payment System


1️⃣ Core Framework

When discussing Payment System design, I frame it as:

  1. Payment intent and checkout flow
  2. Authorization, capture, refund, and chargeback
  3. Payment state machine
  4. Idempotency and exactly-once effect
  5. Ledger and accounting correctness
  6. External payment provider integration
  7. Webhooks, reconciliation, and retries
  8. Security, fraud, compliance, and observability

2️⃣ Core Requirements


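Functional Requirements

Create a payment intent for an order
Authorize, capture, and refund payments
Handle chargebacks raised by the cardholder or bank
Query payment status


Non-functional Requirements

Correctness: no duplicate charges, accurate ledger records
Idempotency: retries are safe
Auditability: every money movement can be traced
Security: card data is protected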


👉 Interview Answer

A payment system must move money correctly.

Unlike many user-facing systems, payment design prioritizes correctness, idempotency, auditability, and security over raw latency.

The main challenges are preventing duplicate charges, handling provider failures, maintaining accurate ledger records, and reconciling internal state with external payment providers.


3️⃣ Core Concepts


Payment Intent

A payment intent represents the user’s intent to pay.

Example:

order_id = o123
amount = $50
currency = USD
status = CREATED

Authorization

Authorization reserves money on the user’s payment method.

Can this user pay $50?

Capture

Capture actually collects the money.

Charge the authorized amount.

Refund

Refund sends money back to the user.


Chargeback

A chargeback is a dispute initiated by the cardholder or bank.


👉 Interview Answer

I would model payment as a stateful process.

A payment intent represents the user’s intent to pay. Authorization reserves the funds, capture collects them, a refund returns money to the user, and a chargeback is a dispute raised by the cardholder or bank that the system has to handle.


4️⃣ Main APIs


Create Payment Intent

POST /api/payment-intents

Request:

{
  "orderId": "o123",
  "userId": "u456",
  "amount": 5000,
  "currency": "USD",
  "paymentMethodId": "pm789"
}

Response:

{
  "paymentIntentId": "pi_123",
  "status": "CREATED"
}

Authorize Payment

POST /api/payment-intents/{paymentIntentId}/authorize

Headers:

Idempotency-Key: auth-o123-v1

Capture Payment

POST /api/payment-intents/{paymentIntentId}/capture

Headers:

Idempotency-Key: capture-o123-v1

Refund Payment

POST /api/payments/{paymentId}/refund

Headers:

Idempotency-Key: refund-o123-v1

Request:

{
  "amount": 5000,
  "reason": "customer_cancelled"
}

Get Payment Status

GET /api/payment-intents/{paymentIntentId}

👉 Interview Answer

I would expose APIs to create a payment intent, authorize payment, capture payment, refund payment, and retrieve payment status.

All mutation APIs must support idempotency keys, because clients and services may retry requests.
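
The request bodies above carry "amount": 5000 while the earlier payment intent example shows $50, which suggests that amounts travel as integers in minor units (cents for USD). A minimal sketch of that conversion, assuming this reading is right; `to_minor_units` and the `EXPONENTS` table are illustrative names, and the table covers only two currencies:

```python
from decimal import Decimal

# Number of decimal places per currency (ISO 4217 minor units); subset only.
EXPONENTS = {"USD": 2, "JPY": 0}


def to_minor_units(amount: str, currency: str) -> int:
    """Convert a decimal string such as "50.00" to integer minor units."""
    exponent = EXPONENTS[currency]
    scaled = Decimal(amount).scaleb(exponent)
    # Reject amounts with more precision than the currency supports.
    if scaled != scaled.to_integral_value():
        raise ValueError("too many decimal places for " + currency)
    return int(scaled)
```

Keeping amounts as integers avoids floating-point rounding in both the API and the BIGINT columns of the data model.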


5️⃣ Data Model


Payment Intent Table

payment_intent (
  payment_intent_id VARCHAR PRIMARY KEY,
  order_id VARCHAR UNIQUE,
  user_id VARCHAR,
  amount BIGINT,
  currency VARCHAR,
  payment_method_id VARCHAR,
  status VARCHAR,
  idempotency_key VARCHAR,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
)

Payment Transaction Table

payment_transaction (
  transaction_id VARCHAR PRIMARY KEY,
  payment_intent_id VARCHAR,
  transaction_type VARCHAR, -- authorize, capture, refund, void
  amount BIGINT,
  currency VARCHAR,
  status VARCHAR,
  provider VARCHAR,
  provider_transaction_id VARCHAR,
  idempotency_key VARCHAR,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
)

Ledger Entry Table

ledger_entry (
  ledger_entry_id VARCHAR PRIMARY KEY,
  transaction_id VARCHAR,
  account_id VARCHAR,
  entry_type VARCHAR, -- debit or credit
  amount BIGINT,
  currency VARCHAR,
  created_at TIMESTAMP
)

Provider Event Table

provider_event (
  provider_event_id VARCHAR PRIMARY KEY,
  provider VARCHAR,
  event_type VARCHAR,
  provider_transaction_id VARCHAR,
  payload JSON,
  processed BOOLEAN,
  received_at TIMESTAMP
)

👉 Interview Answer

I would separate payment intent, payment transactions, provider events, and ledger entries.

Payment intent tracks the high-level payment lifecycle.

Transactions track individual operations like authorization, capture, refund, and void.

Ledger entries provide auditable accounting records.


6️⃣ High-Level Architecture


Client / Order Service
→ Payment API
→ Payment Orchestrator
→ Risk / Fraud Service
→ Payment Provider Adapter
→ External Payment Provider

Provider Webhook
→ Webhook Handler
→ Payment State Updater
→ Ledger Service
→ Reconciliation Jobs

Main Components

Payment API


Payment Orchestrator


Provider Adapter


Ledger Service


Webhook Handler


👉 Interview Answer

I would separate payment orchestration, provider integration, ledger recording, and webhook processing.

External providers are unreliable and asynchronous, so the system must handle retries, duplicate callbacks, delayed events, and reconciliation.


7️⃣ Payment State Machine


Common States

CREATED
AUTHORIZING
AUTHORIZED
AUTHORIZATION_FAILED
CAPTURING
CAPTURED
CAPTURE_FAILED
CANCELLED
REFUNDING
REFUNDED
CHARGEBACK

Example Flow

CREATED
→ AUTHORIZING
→ AUTHORIZED
→ CAPTURING
→ CAPTURED

Failure Flow

CREATED
→ AUTHORIZING
→ AUTHORIZATION_FAILED

Refund Flow

CAPTURED
→ REFUNDING
→ REFUNDED

👉 Interview Answer

I would model payment as a state machine.

This prevents invalid transitions, makes retries safer, and helps us reason about partial failures.

For example, capture should only happen after authorization succeeds.
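
A minimal sketch of how such a transition check might look, using the states listed above. The `ALLOWED` table and the `transition` helper are illustrative names, and the exact edges (for example, which states may move to CANCELLED) are my assumptions rather than something the state list specifies:

```python
from enum import Enum


class State(Enum):
    CREATED = "CREATED"
    AUTHORIZING = "AUTHORIZING"
    AUTHORIZED = "AUTHORIZED"
    AUTHORIZATION_FAILED = "AUTHORIZATION_FAILED"
    CAPTURING = "CAPTURING"
    CAPTURED = "CAPTURED"
    CAPTURE_FAILED = "CAPTURE_FAILED"
    CANCELLED = "CANCELLED"
    REFUNDING = "REFUNDING"
    REFUNDED = "REFUNDED"
    CHARGEBACK = "CHARGEBACK"


# Assumed edges; states missing from the table have no outgoing transitions.
ALLOWED = {
    State.CREATED: {State.AUTHORIZING, State.CANCELLED},
    State.AUTHORIZING: {State.AUTHORIZED, State.AUTHORIZATION_FAILED},
    State.AUTHORIZED: {State.CAPTURING, State.CANCELLED},
    State.CAPTURING: {State.CAPTURED, State.CAPTURE_FAILED},
    State.CAPTURED: {State.REFUNDING, State.CHARGEBACK},
    State.REFUNDING: {State.REFUNDED},
}


class InvalidTransition(Exception):
    pass


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the edge is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```

With this table, a capture attempted directly from CREATED is rejected, while AUTHORIZED → CAPTURING passes.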


8️⃣ Authorization and Capture Flow


Flow

Order Service creates order
→ Payment Intent created
→ Fraud check
→ Provider authorization request
→ Provider returns authorized / failed
→ Payment state updated
→ Order confirmed
→ Capture happens later or immediately

Auth + Capture Timing

Immediate Capture

Used for:

simple purchases where the final amount is known at checkout


Delayed Capture

Used for:

ride sharing and food delivery, where the final amount is known only later


👉 Interview Answer

Depending on the business, authorization and capture may happen together or separately.

For simple purchases, immediate capture is fine.

For services like ride sharing or food delivery, I would authorize first and capture later after the final amount is known.


9️⃣ Idempotency


Why Needed?

Clients may retry due to:

timeouts
errors

Without idempotency:

same request retried → duplicate charge

Idempotency Key

Example:

capture-o123-v1

Store:

idempotency_key → (request hash, response)

Behavior

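If the same key is used again:

return the stored response instead of executing the operation a second time
reject the request if its hash does not match the stored request hash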


👉 Interview Answer

Idempotency is essential in payment systems.

Every mutation operation, such as authorize, capture, refund, and cancel, should require an idempotency key.

The system stores the key, request hash, and response, so retries do not create duplicate charges.
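
A minimal in-memory sketch of that store, assuming the key maps to a (request hash, response) pair. `IdempotencyStore`, `IdempotencyConflict`, and `execute` are hypothetical names; a real store would live in a database with a unique constraint on the key, and this version is not safe under concurrent requests:

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """The same key was reused with a different request body."""


class IdempotencyStore:
    def __init__(self):
        # key -> (request_hash, response)
        self._records = {}

    @staticmethod
    def _hash(request: dict) -> str:
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def execute(self, key: str, request: dict, handler):
        request_hash = self._hash(request)
        if key in self._records:
            stored_hash, stored_response = self._records[key]
            if stored_hash != request_hash:
                raise IdempotencyConflict(key)
            return stored_response  # replay: the handler is not called again
        response = handler(request)
        self._records[key] = (request_hash, response)
        return response
```

A retried capture with the same key and body returns the first response without charging again; the same key with a different body is treated as a client error.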


🔟 Ledger Design


Why Ledger?

Payment status alone is not enough.

We need accounting records.


Double-entry Ledger

Every money movement creates at least two entries:

Debit one account
Credit another account

Example: Capture $50

Debit: Customer receivable account $50
Credit: Merchant payable account $50

Ledger Properties

append-only
immutable
auditable
double-entry: debits equal credits for every money movement


👉 Interview Answer

I would use an append-only double-entry ledger to track money movement.

Payment state tells us where the workflow is, but ledger entries provide the accounting truth.

Ledger entries should be immutable and auditable.
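
A minimal sketch of an append-only double-entry ledger, assuming amounts are integers in minor units and reusing the column names from the ledger_entry table. The `Ledger` class and its `post` method are illustrative names, not a prescribed design:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an entry cannot be modified after creation
class LedgerEntry:
    transaction_id: str
    account_id: str
    entry_type: str  # "debit" or "credit"
    amount: int      # minor units, e.g. cents
    currency: str


class Ledger:
    def __init__(self):
        self._entries = []  # append-only: no update or delete methods

    def post(self, entries):
        """Append a group of entries only if debits equal credits per currency."""
        totals = defaultdict(int)
        for entry in entries:
            if entry.amount <= 0 or entry.entry_type not in ("debit", "credit"):
                raise ValueError("invalid entry")
            sign = 1 if entry.entry_type == "debit" else -1
            totals[entry.currency] += sign * entry.amount
        if any(total != 0 for total in totals.values()):
            raise ValueError("unbalanced entries")
        self._entries.extend(entries)

    def entries(self):
        return tuple(self._entries)
```

Corrections would be posted as new, compensating entries rather than edits to existing ones.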


1️⃣1️⃣ External Provider Integration


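Provider Adapter Responsibilities

normalize provider responses
map provider errors
handle retries
store provider transaction IDs for reconciliation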


Provider Failures


👉 Interview Answer

External payment providers are asynchronous and unreliable.

I would isolate provider-specific logic inside provider adapters.

The adapter normalizes responses, maps errors, handles retries, and stores provider transaction IDs for reconciliation.
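
A minimal sketch of the adapter idea. "FakePay" is a made-up provider, its response codes are invented for illustration, and `send` stands in for whatever HTTP client a real integration would use. Retries are left out to keep the example short:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ProviderResult:
    status: str  # normalized: "authorized", "declined", "retryable_error", "unknown"
    provider: str
    provider_transaction_id: Optional[str]


class ProviderAdapter(ABC):
    @abstractmethod
    def authorize(self, amount: int, currency: str, payment_method_id: str,
                  idempotency_key: str) -> ProviderResult:
        """Send an authorization and return a normalized result."""


class FakePayAdapter(ProviderAdapter):
    # Maps the made-up provider's codes to the internal vocabulary.
    CODE_MAP = {
        "ok": "authorized",
        "insufficient_funds": "declined",
        "rate_limited": "retryable_error",
    }

    def __init__(self, send: Callable[..., dict]):
        self._send = send  # stand-in for the provider's HTTP client

    def authorize(self, amount, currency, payment_method_id, idempotency_key):
        raw = self._send(amount=amount, currency=currency,
                         token=payment_method_id, key=idempotency_key)
        return ProviderResult(
            # Unrecognized codes become "unknown" so they can be reconciled later.
            status=self.CODE_MAP.get(raw["code"], "unknown"),
            provider="fakepay",
            provider_transaction_id=raw.get("id"),
        )
```

The orchestrator only sees `ProviderResult`, so adding a second provider means writing another adapter rather than touching the payment flow.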


1️⃣2️⃣ Webhooks


Why Webhooks?

Payment providers often send final status asynchronously.

Examples:

refund confirmed
chargeback opened


Webhook Flow

Provider sends webhook
→ Verify signature
→ Store raw event
→ Deduplicate event
→ Process event
→ Update payment state
→ Write ledger if needed

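Important Rules

verify the provider signature
store the raw event durably
deduplicate by provider event ID
process events idempotently
validate state transitions, because events can arrive late or out of order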


👉 Interview Answer

Webhooks must be processed carefully.

I would verify provider signatures, store raw events durably, deduplicate provider event IDs, and process events idempotently.

Since webhooks can arrive late or out of order, state transitions must be validated.
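
A minimal sketch of the verify, store, and deduplicate steps. It assumes the provider signs the raw body with HMAC-SHA256 and sends the hex digest, and that each event carries an "id" field; real providers differ in both respects. `WebhookHandler` and its in-memory event map are illustrative only:

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected, signature_hex)


class WebhookHandler:
    def __init__(self, secret: bytes, process):
        self._secret = secret
        self._process = process  # callback that applies the event to payment state
        self._events = {}        # provider_event_id -> raw payload

    def handle(self, payload: bytes, signature_hex: str) -> str:
        if not verify_signature(self._secret, payload, signature_hex):
            return "rejected"
        event = json.loads(payload)
        event_id = event["id"]
        if event_id in self._events:
            return "duplicate"   # already stored, do not process again
        self._events[event_id] = payload  # keep the raw event before processing
        self._process(event)
        return "processed"
```

In a durable version, the event map would be the provider_event table, with the primary key on provider_event_id doing the deduplication.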


1️⃣3️⃣ Reconciliation


Why Needed?

Internal state and provider state may diverge.

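Examples:

a provider call timed out but the payment succeeded
a webhook was lost
a partial failure left a transaction half-updated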


Reconciliation Flow

Scheduled job
→ Query provider transactions
→ Compare with internal records
→ Find mismatches
→ Correct state
→ Generate audit report

Reconciliation Types


👉 Interview Answer

Reconciliation is critical in payment systems.

Even with retries and webhooks, internal state can diverge from provider state.

Periodic reconciliation compares internal records with provider reports and corrects mismatches.
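
A minimal sketch of the comparison step, assuming both sides can be reduced to a map from provider_transaction_id to a (status, amount) pair. The `reconcile` function and the mismatch labels are illustrative names:

```python
def reconcile(internal: dict, provider: dict) -> list:
    """Compare {provider_transaction_id: (status, amount)} maps.

    Returns a list of (transaction_id, kind, ours, theirs) tuples.
    """
    mismatches = []
    for tx_id in sorted(internal.keys() | provider.keys()):
        ours = internal.get(tx_id)
        theirs = provider.get(tx_id)
        if ours is None:
            mismatches.append((tx_id, "missing_internally", ours, theirs))
        elif theirs is None:
            mismatches.append((tx_id, "missing_at_provider", ours, theirs))
        elif ours != theirs:
            mismatches.append((tx_id, "mismatch", ours, theirs))
    return mismatches
```

The output is only a report; whether a mismatch is repaired automatically or sent for review is a separate decision.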


1️⃣4️⃣ Refunds and Chargebacks


Refund Flow

User requests refund
→ Validate refund eligibility
→ Create refund transaction
→ Call provider refund API
→ Provider confirms or sends webhook
→ Update payment state
→ Write ledger entries

Partial Refund

Support:

refund amount < captured amount

Chargeback

Chargeback is initiated externally.

Bank / cardholder disputes transaction
→ Provider sends chargeback webhook
→ System marks payment disputed
→ Notify merchant/support

👉 Interview Answer

Refunds should be modeled as separate transactions, not just status changes.

Partial refunds must be supported.

Chargebacks are usually initiated externally, so the system must handle provider webhook events and update internal state accordingly.
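
For partial refunds, the eligibility check has to look at everything already refunded, not just the captured amount. A minimal sketch, assuming amounts in minor units; `validate_refund` is a hypothetical helper:

```python
def validate_refund(captured_amount: int, prior_refunds: list, requested: int) -> int:
    """Return the amount still refundable after this refund, or raise."""
    if requested <= 0:
        raise ValueError("refund amount must be positive")
    remaining = captured_amount - sum(prior_refunds)
    if requested > remaining:
        raise ValueError("refund exceeds remaining captured amount")
    return remaining - requested
```

Because each refund is its own transaction, `prior_refunds` can be read straight from the payment_transaction rows of type refund.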


1️⃣5️⃣ Fraud and Risk


Fraud Signals

device
IP
user history
amount
velocity
behavioral signals


Risk Flow

Payment request
→ Risk scoring
→ Allow / challenge / block
→ Payment authorization

Actions

allow
challenge
block
send to manual review


👉 Interview Answer

Fraud detection should run before payment authorization.

The risk system can score the transaction based on device, IP, user history, amount, velocity, and behavioral signals.

High-risk payments can be blocked, challenged, or sent to manual review.
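
A toy sketch of the score-then-decide step. The signal names, weights, and thresholds are invented for illustration; a real risk system would more likely use a trained model and tuned cutoffs:

```python
# Hypothetical weights, in points; higher means riskier.
WEIGHTS = {
    "new_device": 20,
    "ip_country_mismatch": 30,
    "high_velocity": 30,
    "large_amount": 20,
}


def risk_score(signals: set) -> int:
    """Sum the weights of the signals that fired; unknown signals are ignored."""
    return sum(WEIGHTS[s] for s in signals if s in WEIGHTS)


def decide(score: int, challenge_at: int = 40, block_at: int = 70) -> str:
    if score >= block_at:
        return "block"
    if score >= challenge_at:
        return "challenge"
    return "allow"
```

Only an "allow" (or a passed challenge) would proceed to the authorization call.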


1️⃣6️⃣ Security and Compliance


PCI Compliance

Do not store raw card numbers unless necessary.

Prefer:

tokenized payment method

Security Requirements

encrypt sensitive data
tightly control access
audit all payment operations


Sensitive Data

Avoid storing:

raw card data


👉 Interview Answer

Payment systems require strong security and compliance.

I would avoid storing raw card data and use tokenized payment methods from payment providers.

Sensitive data must be encrypted, access must be tightly controlled, and all payment operations should be audited.


1️⃣7️⃣ Scaling Patterns


Pattern 1: Separate Payment Orchestration and Ledger


Pattern 2: Provider Adapter Layer

Supports multiple providers.


Pattern 3: Event-driven Processing

payment state change
→ event bus
→ order update / notification / analytics

Pattern 4: Idempotency Store

Central store for mutation request deduplication.


Pattern 5: Reconciliation Jobs

Scheduled jobs compare internal and external states.


👉 Interview Answer

To scale and maintain payment correctness, I would separate orchestration, provider integration, ledger, and reconciliation.

Event-driven processing helps notify other systems, while idempotency and reconciliation ensure correctness.


1️⃣8️⃣ Failure Handling


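Common Failures

a provider call times out even though the payment succeeded
a webhook is lost, duplicated, delayed, or out of order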


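Strategies

make every operation idempotent
keep state transitions explicit
process webhooks safely
run reconciliation jobs to correct mismatches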


👉 Interview Answer

Payment systems must assume partial failures.

A provider call may time out even though the payment succeeded.

Therefore, every operation should be idempotent, state transitions should be explicit, webhooks should be processed safely, and reconciliation jobs should correct mismatches.
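
A minimal sketch of handling the ambiguous timeout case. It assumes the provider honors idempotency keys and offers some way to look up the outcome recorded for a key; `charge`, `lookup`, and `ProviderTimeout` are hypothetical stand-ins for the adapter's calls:

```python
class ProviderTimeout(Exception):
    pass


def capture_with_recovery(charge, lookup, idempotency_key: str, max_attempts: int = 3):
    """charge(key) performs the capture; lookup(key) returns the provider's
    recorded result for that key, or None if the provider has no record of it."""
    for _ in range(max_attempts):
        try:
            return charge(idempotency_key)
        except ProviderTimeout:
            # Outcome unknown: the charge may have succeeded on the provider side.
            result = lookup(idempotency_key)
            if result is not None:
                return result
            # No record yet: retrying with the same key is safe under the
            # assumption above.
    return "UNKNOWN"  # leave for reconciliation rather than guessing
```

Returning "UNKNOWN" instead of "FAILED" matters: marking the payment failed could lead the user to pay twice for a charge that actually went through.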


1️⃣9️⃣ Consistency Model


Stronger Consistency Needed For

payment state
ledger writes
idempotency records


Eventual Consistency Acceptable For

receipts
notifications
analytics
dashboards


👉 Interview Answer

Payment state, ledger writes, and idempotency records require strong correctness.

But downstream updates such as receipts, notifications, analytics, and dashboards can be eventually consistent.


2️⃣0️⃣ Observability


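Key Metrics

authorization, capture, and refund success rates
provider errors
webhook lag
reconciliation mismatches
duplicate requests
fraud declines
chargebacks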


Logs and Tracing

Track:

payment_intent_id
order_id
transaction_id
provider_transaction_id
idempotency_key
state_transition

👉 Interview Answer

Observability is critical for payments.

I would monitor authorization success, capture success, refund success, provider errors, webhook lag, reconciliation mismatches, duplicate requests, fraud declines, and chargebacks.

Every transaction should be traceable end to end.
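
One way to make that traceability concrete is to emit a structured log record with the same identifiers on every state change. A minimal sketch; `trace_record` is a hypothetical helper, and the field list is taken from the tracking list above:

```python
import json

TRACE_FIELDS = (
    "payment_intent_id",
    "order_id",
    "transaction_id",
    "provider_transaction_id",
    "idempotency_key",
    "state_transition",
)


def trace_record(**fields) -> str:
    """Build a JSON log line, refusing to log if a trace field is missing."""
    missing = [name for name in TRACE_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing trace fields: {missing}")
    return json.dumps({name: fields[name] for name in TRACE_FIELDS}, sort_keys=True)
```

Requiring every field up front keeps log lines joinable across the payment API, the webhook handler, and reconciliation jobs.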


2️⃣1️⃣ End-to-End Flow


Payment Flow

User checks out
→ Create payment intent
→ Risk check
→ Authorize payment
→ Update payment state
→ Confirm order
→ Capture payment
→ Write ledger entries
→ Send receipt

Webhook Flow

Provider sends event
→ Verify signature
→ Store raw event
→ Deduplicate
→ Process event
→ Update transaction state
→ Write ledger if needed

Reconciliation Flow

Scheduled job runs
→ Fetch provider reports
→ Compare with internal transactions
→ Detect mismatches
→ Repair state
→ Generate audit report

Key Insight

A payment system is not just a call to a payment API; it is a correctness-first financial workflow system.


🧠 Staff-Level Answer (Final)


👉 Interview Answer (Full Version)

When designing a payment system, I think of it as a correctness-first financial workflow system.

The system should model payment as a state machine with states such as created, authorizing, authorized, capturing, captured, cancelled, refunding, refunded, and chargeback, plus explicit failure states for authorization and capture.

I would start with a payment intent, which represents the user’s intent to pay for an order.

Depending on the business, the system may authorize and capture immediately, or authorize first and capture later.

Every mutation API, such as authorize, capture, refund, and cancel, must be idempotent to prevent duplicate charges or refunds.

I would separate payment intent, payment transactions, provider events, and ledger entries.

Payment state tracks workflow progress, while the append-only double-entry ledger provides the accounting truth.

External payment providers should be isolated behind provider adapters, because each provider has different APIs, errors, retry behavior, and webhook formats.

Webhooks must be verified, stored durably, deduplicated, and processed idempotently, because they may arrive late, duplicated, or out of order.

Reconciliation jobs are essential. They compare internal records with provider reports and repair mismatches caused by timeouts, lost webhooks, or partial failures.

Security and compliance are critical. I would avoid storing raw card data, use tokenized payment methods, encrypt sensitive data, enforce strict access control, and audit all payment operations.

The main trade-offs are correctness, availability, latency, provider dependency, operational complexity, and user experience.

Ultimately, the goal is to move money correctly, prevent duplicate charges, maintain auditable records, and recover safely from partial failures.


⭐ Final Insight

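The core of a payment system is not calling a payment API; it is a financial state machine built around correctness, idempotency, the ledger, and reconciliation.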



