🎯 Design Payment System
1️⃣ Core Framework
When discussing Payment System design, I frame it as:
- Payment intent and checkout flow
- Authorization, capture, refund, and chargeback
- Payment state machine
- Idempotency and exactly-once effect
- Ledger and accounting correctness
- External payment provider integration
- Webhooks, reconciliation, and retries
- Security, fraud, compliance, and observability
2️⃣ Core Requirements
Functional Requirements
- User can pay for an order
- Support multiple payment methods:
  - Credit card
  - Debit card
  - Wallet
  - Bank transfer
  - Gift card / credits
- Support authorization and capture
- Support refund
- Support cancellation
- Support payment status tracking
- Support receipts
- Support provider webhooks
- Support reconciliation
Non-functional Requirements
- Strong correctness
- High availability
- Idempotent APIs
- Secure payment data handling
- Accurate ledger records
- Auditable transaction history
- Reliable retry and recovery
- Compliance with payment standards
👉 Interview Answer
A payment system must move money correctly.
Unlike many user-facing systems, payment design prioritizes correctness, idempotency, auditability, and security over raw latency.
The main challenges are preventing duplicate charges, handling provider failures, maintaining accurate ledger records, and reconciling internal state with external payment providers.
3️⃣ Core Concepts
Payment Intent
A payment intent represents the user’s intent to pay.
Example:
order_id = o123
amount = $50
currency = USD
status = CREATED
Authorization
Authorization reserves money on the user’s payment method.
Can this user pay $50?
Capture
Capture actually collects the money.
Charge the authorized amount.
Refund
Refund sends money back to the user.
Chargeback
A chargeback is a dispute initiated by the cardholder or bank.
👉 Interview Answer
I would model payment as a stateful process.
Payment intent represents the user’s intent to pay. Authorization reserves funds, capture collects them, a refund returns money to the user, and a chargeback is a dispute raised by the cardholder or bank that the system must track and respond to.
4️⃣ Main APIs
Create Payment Intent
POST /api/payment-intents
Request:
{
"orderId": "o123",
"userId": "u456",
"amount": 5000,
"currency": "USD",
"paymentMethodId": "pm789"
}
Response:
{
"paymentIntentId": "pi_123",
"status": "CREATED"
}
Authorize Payment
POST /api/payment-intents/{paymentIntentId}/authorize
Headers:
Idempotency-Key: auth-o123-v1
Capture Payment
POST /api/payment-intents/{paymentIntentId}/capture
Headers:
Idempotency-Key: capture-o123-v1
Refund Payment
POST /api/payments/{paymentId}/refund
Headers:
Idempotency-Key: refund-o123-v1
Request:
{
"amount": 5000,
"reason": "customer_cancelled"
}
Get Payment Status
GET /api/payment-intents/{paymentIntentId}
👉 Interview Answer
I would expose APIs to create a payment intent, authorize payment, capture payment, refund payment, and retrieve payment status.
All mutation APIs must support idempotency keys, because clients and services may retry requests.
5️⃣ Data Model
Payment Intent Table
payment_intent (
payment_intent_id VARCHAR PRIMARY KEY,
order_id VARCHAR UNIQUE,
user_id VARCHAR,
amount BIGINT,
currency VARCHAR,
payment_method_id VARCHAR,
status VARCHAR,
idempotency_key VARCHAR,
created_at TIMESTAMP,
updated_at TIMESTAMP
)
Payment Transaction Table
payment_transaction (
transaction_id VARCHAR PRIMARY KEY,
payment_intent_id VARCHAR,
transaction_type VARCHAR, -- authorize, capture, refund, void
amount BIGINT,
currency VARCHAR,
status VARCHAR,
provider VARCHAR,
provider_transaction_id VARCHAR,
idempotency_key VARCHAR,
created_at TIMESTAMP,
updated_at TIMESTAMP
)
Ledger Entry Table
ledger_entry (
ledger_entry_id VARCHAR PRIMARY KEY,
transaction_id VARCHAR,
account_id VARCHAR,
entry_type VARCHAR, -- debit or credit
amount BIGINT,
currency VARCHAR,
created_at TIMESTAMP
)
Provider Event Table
provider_event (
provider_event_id VARCHAR PRIMARY KEY,
provider VARCHAR,
event_type VARCHAR,
provider_transaction_id VARCHAR,
payload JSON,
processed BOOLEAN,
received_at TIMESTAMP
)
👉 Interview Answer
I would separate payment intent, payment transactions, provider events, and ledger entries.
Payment intent tracks the high-level payment lifecycle.
Transactions track individual operations like authorization, capture, refund, and void.
Ledger entries provide auditable accounting records.
6️⃣ High-Level Architecture
Client / Order Service
→ Payment API
→ Payment Orchestrator
→ Risk / Fraud Service
→ Payment Provider Adapter
→ External Payment Provider
Provider Webhook
→ Webhook Handler
→ Payment State Updater
→ Ledger Service
→ Reconciliation Jobs
Main Components
Payment API
- Accepts payment requests
- Validates request
- Enforces idempotency
Payment Orchestrator
- Manages payment workflow
- Drives state transitions
- Calls provider adapters
Provider Adapter
- Integrates with Stripe / Adyen / Braintree / bank
- Normalizes provider-specific responses
Ledger Service
- Writes immutable debit/credit entries
- Supports accounting and audit
Webhook Handler
- Processes asynchronous provider events
👉 Interview Answer
I would separate payment orchestration, provider integration, ledger recording, and webhook processing.
External providers are unreliable and asynchronous, so the system must handle retries, duplicate callbacks, delayed events, and reconciliation.
7️⃣ Payment State Machine
Common States
CREATED
AUTHORIZING
AUTHORIZED
AUTHORIZATION_FAILED
CAPTURING
CAPTURED
CAPTURE_FAILED
CANCELLED
REFUNDING
REFUNDED
CHARGEBACK
Example Flow
CREATED
→ AUTHORIZING
→ AUTHORIZED
→ CAPTURING
→ CAPTURED
Failure Flow
CREATED
→ AUTHORIZING
→ AUTHORIZATION_FAILED
Refund Flow
CAPTURED
→ REFUNDING
→ REFUNDED
👉 Interview Answer
I would model payment as a state machine.
This prevents invalid transitions, makes retries safer, and helps us reason about partial failures.
For example, capture should only happen after authorization succeeds.
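A minimal sketch of how these transition rules could be enforced in code. The state names come from the list above; the exact set of allowed transitions (for example, retrying after CAPTURE_FAILED, or a failed refund falling back to CAPTURED) is my assumption, and `ALLOWED_TRANSITIONS`, `InvalidTransition`, and `transition` are hypothetical names.

```python
from enum import Enum


class PaymentState(Enum):
    CREATED = "CREATED"
    AUTHORIZING = "AUTHORIZING"
    AUTHORIZED = "AUTHORIZED"
    AUTHORIZATION_FAILED = "AUTHORIZATION_FAILED"
    CAPTURING = "CAPTURING"
    CAPTURED = "CAPTURED"
    CAPTURE_FAILED = "CAPTURE_FAILED"
    CANCELLED = "CANCELLED"
    REFUNDING = "REFUNDING"
    REFUNDED = "REFUNDED"
    CHARGEBACK = "CHARGEBACK"


S = PaymentState

# Assumed transition table; a real system would tune this to its business rules.
ALLOWED_TRANSITIONS = {
    S.CREATED: {S.AUTHORIZING, S.CANCELLED},
    S.AUTHORIZING: {S.AUTHORIZED, S.AUTHORIZATION_FAILED},
    S.AUTHORIZED: {S.CAPTURING, S.CANCELLED},
    S.CAPTURING: {S.CAPTURED, S.CAPTURE_FAILED},
    S.CAPTURE_FAILED: {S.CAPTURING, S.CANCELLED},
    S.CAPTURED: {S.REFUNDING, S.CHARGEBACK},
    S.REFUNDING: {S.REFUNDED, S.CAPTURED},
    # Terminal states in this sketch: no outgoing transitions.
    S.AUTHORIZATION_FAILED: set(),
    S.CANCELLED: set(),
    S.REFUNDED: set(),
    S.CHARGEBACK: set(),
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the table."""


def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

With this table, `transition(PaymentState.CREATED, PaymentState.CAPTURING)` raises, which is the "capture only after authorization" rule in executable form. In a database-backed system the same check would typically be paired with a conditional update on the current status.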
8️⃣ Authorization and Capture Flow
Flow
Order Service creates order
→ Payment Intent created
→ Fraud check
→ Provider authorization request
→ Provider returns authorized / failed
→ Payment state updated
→ Order confirmed
→ Capture happens later or immediately
Auth + Capture Timing
Immediate Capture
Used for:
- Digital goods
- Instant checkout
- Simple e-commerce
Delayed Capture
Used for:
- Ride sharing
- Food delivery
- Hotel booking
- Marketplace orders
👉 Interview Answer
Depending on the business, authorization and capture may happen together or separately.
For simple purchases, immediate capture is fine.
For services like ride sharing or food delivery, I would authorize first and capture later after the final amount is known.
9️⃣ Idempotency
Why Needed?
Clients may retry due to:
- Timeout
- Network failure
- Server error
- Provider delay
Without idempotency:
same request retried → duplicate charge
Idempotency Key
Example:
capture-o123-v1
Store:
idempotency_key → (request hash, response)
Behavior
If same key is used again:
- If request body is same: return previous response
- If request body is different: reject request
👉 Interview Answer
Idempotency is essential in payment systems.
Every mutation operation, such as authorize, capture, refund, and cancel, should require an idempotency key.
The system stores the key, request hash, and response, so retries do not create duplicate charges.
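The behavior described above can be sketched as a small in-memory store. This is an illustration under assumptions: `IdempotencyStore`, `IdempotencyConflict`, and `execute` are hypothetical names, the request hash is SHA-256 over canonical JSON, and the dictionary stands in for a durable table with a unique constraint on the key. It is not safe for concurrent requests as written.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """The same key was reused with a different request body."""


class IdempotencyStore:
    def __init__(self):
        # idempotency_key -> (request_hash, response)
        self._records = {}

    @staticmethod
    def _hash(body):
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def execute(self, key, body, handler):
        """Run handler(body) at most once per key; replay the stored response on retry."""
        request_hash = self._hash(body)
        if key in self._records:
            stored_hash, stored_response = self._records[key]
            if stored_hash != request_hash:
                raise IdempotencyConflict(key)
            return stored_response
        response = handler(body)
        self._records[key] = (request_hash, response)
        return response
```

A retry with the same key and body returns the first response without calling the handler again; the same key with a different body is rejected.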
🔟 Ledger Design
Why Ledger?
Payment status alone is not enough.
We need accounting records.
Double-entry Ledger
Every money movement creates at least two entries:
Debit one account
Credit another account
Example: Capture $50
Debit: Customer receivable account $50
Credit: Merchant payable account $50
Ledger Properties
- Immutable
- Append-only
- Auditable
- Balanced
- Reconcilable
👉 Interview Answer
I would use an append-only double-entry ledger to track money movement.
Payment state tells us where the workflow is, but ledger entries provide the accounting truth.
Ledger entries should be immutable and auditable.
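As a sketch of the "balanced" and "append-only" properties, the snippet below posts entries in groups and rejects any group whose debits and credits differ. The field names mirror the ledger_entry table above; the `Ledger` class, its methods, and the account IDs are hypothetical, and amounts are assumed to be integers in minor units (cents).

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an entry cannot be modified after creation
class LedgerEntry:
    transaction_id: str
    account_id: str
    entry_type: str  # "debit" or "credit"
    amount: int      # minor units, e.g. cents
    currency: str


class Ledger:
    """Append-only, in-memory double-entry ledger (illustrative only)."""

    def __init__(self):
        self._entries = []

    def post(self, entries):
        """Append a group of entries; reject the group unless it balances."""
        entries = list(entries)
        if not entries:
            raise ValueError("nothing to post")
        if any(e.entry_type not in ("debit", "credit") or e.amount <= 0 for e in entries):
            raise ValueError("entries need a valid type and a positive amount")
        if len({e.currency for e in entries}) != 1:
            raise ValueError("all entries in one posting must share a currency")
        debits = sum(e.amount for e in entries if e.entry_type == "debit")
        credits = sum(e.amount for e in entries if e.entry_type == "credit")
        if debits != credits:
            raise ValueError(f"unbalanced posting: debits={debits}, credits={credits}")
        self._entries.extend(entries)

    def entries(self):
        """Read-only view of everything posted so far."""
        return tuple(self._entries)

    def balance(self, account_id):
        """Debits minus credits for one account."""
        total = 0
        for e in self._entries:
            if e.account_id == account_id:
                total += e.amount if e.entry_type == "debit" else -e.amount
        return total
```

Corrections are made by posting a new reversing group rather than editing an existing entry, which keeps the history auditable.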
1️⃣1️⃣ External Provider Integration
Provider Adapter Responsibilities
- Convert internal request to provider format
- Handle provider response
- Normalize errors
- Retry transient failures
- Map provider status to internal status
- Store provider transaction ID
Provider Failures
- Timeout
- Rate limit
- Duplicate request
- Provider unavailable
- Delayed response
- Webhook arrives before API response
👉 Interview Answer
External payment providers are asynchronous and unreliable.
I would isolate provider-specific logic inside provider adapters.
The adapter normalizes responses, maps errors, handles retries, and stores provider transaction IDs for reconciliation.
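A rough sketch of the adapter boundary. Everything provider-specific here is invented for illustration: `ExampleProviderAdapter`, the request field names, and the `approved` / `declined` vocabulary do not correspond to any real provider's API. `send_request` is a hypothetical callable standing in for a provider SDK or HTTP client.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorizationResult:
    status: str                   # internal payment state
    provider: str
    provider_transaction_id: str  # kept for reconciliation


class ProviderAdapter(ABC):
    @abstractmethod
    def authorize(self, amount, currency, payment_method_id, idempotency_key):
        """Return an AuthorizationResult expressed in internal terms."""


class ExampleProviderAdapter(ProviderAdapter):
    # Hypothetical provider vocabulary mapped to the internal states.
    STATUS_MAP = {"approved": "AUTHORIZED", "declined": "AUTHORIZATION_FAILED"}

    def __init__(self, send_request):
        self._send_request = send_request

    def authorize(self, amount, currency, payment_method_id, idempotency_key):
        # Convert the internal request into the (made-up) provider format.
        raw = self._send_request({
            "value": amount,
            "currency": currency,
            "token": payment_method_id,
            "idempotency_key": idempotency_key,
        })
        # Unknown provider statuses stay in-flight so that a webhook or
        # reconciliation can resolve them later.
        status = self.STATUS_MAP.get(raw.get("result"), "AUTHORIZING")
        return AuthorizationResult(status, "example_provider", raw["txn_id"])
```

The orchestrator only sees `AuthorizationResult`, so adding a second provider means adding another adapter rather than touching the workflow.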
1️⃣2️⃣ Webhooks
Why Webhooks?
Payment providers often send final status asynchronously.
Examples:
- Payment succeeded
- Payment failed
- Refund succeeded
- Chargeback created
Webhook Flow
Provider sends webhook
→ Verify signature
→ Store raw event
→ Deduplicate event
→ Process event
→ Update payment state
→ Write ledger if needed
Important Rules
- Always verify provider signature
- Store raw webhook event
- Process idempotently
- Handle out-of-order events
- Return success only after durable storage
👉 Interview Answer
Webhooks must be processed carefully.
I would verify provider signatures, store raw events durably, deduplicate provider event IDs, and process events idempotently.
Since webhooks can arrive late or out of order, state transitions must be validated.
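The verify, store, deduplicate, process steps can be sketched as follows. The signature scheme is an assumption (HMAC-SHA256 over the raw body, hex-encoded); real providers each define their own header format and signing rules. `WebhookHandler`, the event `id` field, and the returned status codes are hypothetical, and the dictionary stands in for the provider_event table.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    """Constant-time check of an assumed HMAC-SHA256 hex signature."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))


class WebhookHandler:
    def __init__(self, secret, process_event):
        self._secret = secret
        self._process_event = process_event
        # provider_event_id -> {"payload": raw bytes, "processed": bool}
        self._events = {}

    def handle(self, payload: bytes, signature_hex: str) -> int:
        """Return an HTTP-style status code for the provider."""
        if not verify_signature(self._secret, payload, signature_hex):
            return 400
        event = json.loads(payload)
        # Store the raw event first; setdefault makes a redelivery a no-op.
        record = self._events.setdefault(
            event["id"], {"payload": payload, "processed": False}
        )
        if not record["processed"]:
            self._process_event(event)
            record["processed"] = True
        return 200
```

A duplicate delivery of the same event is acknowledged but not processed a second time. In production the processing step would usually run asynchronously after the raw event is stored, and the state update inside `process_event` would go through the state machine so that out-of-order events are rejected or deferred.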
1️⃣3️⃣ Reconciliation
Why Needed?
Internal state and provider state may diverge.
Examples:
- API timeout but provider charged user
- Webhook lost
- Internal update failed
- Refund status delayed
Reconciliation Flow
Scheduled job
→ Query provider transactions
→ Compare with internal records
→ Find mismatches
→ Correct state
→ Generate audit report
Reconciliation Types
- Payment reconciliation
- Refund reconciliation
- Settlement reconciliation
- Ledger reconciliation
👉 Interview Answer
Reconciliation is critical in payment systems.
Even with retries and webhooks, internal state can diverge from provider state.
Periodic reconciliation compares internal records with provider reports and corrects mismatches.
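The comparison step of the reconciliation flow can be sketched as a pure function. I am assuming both sides can be reduced to a mapping from provider_transaction_id to a `(status, amount)` pair; the `reconcile` function and the mismatch labels are hypothetical.

```python
def reconcile(internal, provider):
    """Compare two {provider_transaction_id: (status, amount)} mappings.

    Returns a list of (transaction_id, kind, internal_value, provider_value)
    tuples, one per mismatch, ordered by transaction ID.
    """
    mismatches = []
    for txn_id in sorted(set(internal) | set(provider)):
        ours = internal.get(txn_id)
        theirs = provider.get(txn_id)
        if ours is None:
            # e.g. API timeout, but the provider actually charged the user
            mismatches.append((txn_id, "missing_internally", ours, theirs))
        elif theirs is None:
            mismatches.append((txn_id, "missing_at_provider", ours, theirs))
        elif ours != theirs:
            # e.g. lost webhook left the internal status stale
            mismatches.append((txn_id, "state_mismatch", ours, theirs))
    return mismatches
```

Keeping detection separate from repair makes it possible to auto-correct the safe cases and route the rest to manual review.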
1️⃣4️⃣ Refunds and Chargebacks
Refund Flow
User requests refund
→ Validate refund eligibility
→ Create refund transaction
→ Call provider refund API
→ Provider confirms or sends webhook
→ Update payment state
→ Write ledger entries
Partial Refund
Support:
refund amount < captured amount
Chargeback
Chargeback is initiated externally.
Bank / cardholder disputes transaction
→ Provider sends chargeback webhook
→ System marks payment disputed
→ Notify merchant/support
👉 Interview Answer
Refunds should be modeled as separate transactions, not just status changes.
Partial refunds must be supported.
Chargebacks are usually initiated externally, so the system must handle provider webhook events and update internal state accordingly.
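A small sketch of the refund eligibility check when partial refunds are allowed. It assumes amounts are integers in minor units and that the sum of earlier refund transactions is available; `validate_refund` is a hypothetical helper.

```python
def validate_refund(captured_amount, previous_refunds, requested_amount):
    """Return the amount still refundable after this refund, or raise ValueError.

    captured_amount:  total captured for the payment (minor units)
    previous_refunds: amounts of refund transactions already created
    requested_amount: amount of the new refund request
    """
    if requested_amount <= 0:
        raise ValueError("refund amount must be positive")
    remaining = captured_amount - sum(previous_refunds)
    if requested_amount > remaining:
        raise ValueError(
            f"requested {requested_amount} exceeds refundable balance {remaining}"
        )
    return remaining - requested_amount
```

Because each refund is its own transaction, the refundable balance is derived from the transaction history rather than stored as a mutable field.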
1️⃣5️⃣ Fraud and Risk
Fraud Signals
- Unusual purchase amount
- New device
- Suspicious IP
- High refund rate
- Velocity checks
- Stolen card indicators
- Mismatch between billing and shipping location
Risk Flow
Payment request
→ Risk scoring
→ Allow / challenge / block
→ Payment authorization
Actions
- Allow
- Require 3DS / step-up verification
- Manual review
- Block
👉 Interview Answer
Fraud detection should run before payment authorization.
The risk system can score the transaction based on device, IP, user history, amount, velocity, and behavioral signals.
High-risk payments can be blocked, challenged, or sent to manual review.
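Of the signals listed, a velocity check is the easiest to show in a few lines. The sketch below counts attempts per key (for example, per card token or user) in a sliding time window. `VelocityChecker` is a hypothetical name, the limits are parameters rather than recommended values, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import defaultdict, deque


class VelocityChecker:
    """Sliding-window attempt counter (in-memory, illustrative only)."""

    def __init__(self, max_attempts, window_seconds):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # key -> timestamps, oldest first

    def record_and_check(self, key, now):
        """Record an attempt at time `now`; return True if still within the limit."""
        window = self._attempts[key]
        # Drop attempts that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        window.append(now)
        return len(window) <= self.max_attempts
```

A `False` result would feed into the risk decision as one signal among several, typically leading to a challenge or manual review rather than an outright block.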
1️⃣6️⃣ Security and Compliance
PCI Compliance
Do not store raw card numbers unless necessary.
Prefer:
tokenized payment method
Security Requirements
- Encrypt data in transit
- Encrypt sensitive data at rest
- Tokenize payment methods
- Strict access control
- Audit logs
- Secrets management
- Webhook signature verification
Sensitive Data
Avoid storing:
- Full card number
- CVV (PCI DSS prohibits storing it after authorization)
- Raw bank credentials
👉 Interview Answer
Payment systems require strong security and compliance.
I would avoid storing raw card data and use tokenized payment methods from payment providers.
Sensitive data must be encrypted, access must be tightly controlled, and all payment operations should be audited.
1️⃣7️⃣ Scaling Patterns
Pattern 1: Separate Payment Orchestration and Ledger
- Orchestrator manages workflow
- Ledger records money movement
Pattern 2: Provider Adapter Layer
Supports multiple providers.
Pattern 3: Event-driven Processing
payment state change
→ event bus
→ order update / notification / analytics
Pattern 4: Idempotency Store
Central store for mutation request deduplication.
Pattern 5: Reconciliation Jobs
Scheduled jobs compare internal and external states.
👉 Interview Answer
To scale and maintain payment correctness, I would separate orchestration, provider integration, ledger, and reconciliation.
Event-driven processing helps notify other systems, while idempotency and reconciliation ensure correctness.
1️⃣8️⃣ Failure Handling
Common Failures
- Client timeout
- Provider timeout
- Provider returns unknown status
- Webhook delayed
- Webhook duplicated
- Internal DB update fails
- Capture succeeds but order update fails
- Refund provider call fails
Strategies
- Idempotency keys
- Retry with backoff
- Store provider transaction IDs
- Process webhooks idempotently
- Use payment state machine
- Reconcile periodically
- Use outbox pattern for events
- Manual review for unresolved states
👉 Interview Answer
Payment systems must assume partial failures.
A provider call may time out even though the payment succeeded.
Therefore, every operation should be idempotent, state transitions should be explicit, webhooks should be processed safely, and reconciliation jobs should correct mismatches.
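"Retry with backoff" can be sketched as below, using exponential backoff with full jitter. `TransientError` and `call_with_backoff` are hypothetical names and the default delays are placeholders. The important assumption is that `operation` reuses the same idempotency key on every attempt, so a retry after a timeout cannot create a second charge.

```python
import random
import time


class TransientError(Exception):
    """A failure worth retrying, such as a timeout or a rate limit."""


def call_with_backoff(operation, max_attempts=4, base_delay=0.1,
                      max_delay=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up; the caller marks the state for reconciliation
            # Exponential cap, then a random delay in [0, cap] (full jitter).
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

Non-transient errors, such as a card decline, propagate immediately because only `TransientError` is caught. The `sleep` parameter exists so tests can run without waiting.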
1️⃣9️⃣ Consistency Model
Stronger Consistency Needed For
- Payment state transitions
- Ledger writes
- Idempotency records
- Capture and refund operations
- Billing and settlement
Eventual Consistency Acceptable For
- Receipts
- Notifications
- Analytics
- Reporting dashboards
- Order status propagation
👉 Interview Answer
Payment state, ledger writes, and idempotency records require strong consistency.
But downstream updates such as receipts, notifications, analytics, and dashboards can be eventually consistent.
2️⃣0️⃣ Observability
Key Metrics
- Authorization success rate
- Capture success rate
- Refund success rate
- Provider timeout rate
- Payment latency
- Webhook processing lag
- Reconciliation mismatch count
- Duplicate request count
- Fraud decline rate
- Chargeback rate
Logs and Tracing
Track:
payment_intent_id
order_id
transaction_id
provider_transaction_id
idempotency_key
state_transition
👉 Interview Answer
Observability is critical for payments.
I would monitor authorization success, capture success, refund success, provider errors, webhook lag, reconciliation mismatches, duplicate requests, fraud declines, and chargebacks.
Every transaction should be traceable end to end.
2️⃣1️⃣ End-to-End Flow
Payment Flow
User checks out
→ Create payment intent
→ Risk check
→ Authorize payment
→ Update payment state
→ Confirm order
→ Capture payment
→ Write ledger entries
→ Send receipt
Webhook Flow
Provider sends event
→ Verify signature
→ Store raw event
→ Deduplicate
→ Process event
→ Update transaction state
→ Write ledger if needed
Reconciliation Flow
Scheduled job runs
→ Fetch provider reports
→ Compare with internal transactions
→ Detect mismatches
→ Repair state
→ Generate audit report
Key Insight
Payment System is not just calling a payment API — it is a correctness-first financial workflow system.
🧠 Staff-Level Answer (Final)
👉 Interview Answer (Full Version)
When designing a payment system, I think of it as a correctness-first financial workflow system.
The system should model payment as a state machine with states such as created, authorizing, authorized, authorization failed, capturing, captured, capture failed, cancelled, refunding, refunded, and chargeback.
I would start with a payment intent, which represents the user’s intent to pay for an order.
Depending on the business, the system may authorize and capture immediately, or authorize first and capture later.
Every mutation API, such as authorize, capture, refund, and cancel, must be idempotent to prevent duplicate charges or refunds.
I would separate payment intent, payment transactions, provider events, and ledger entries.
Payment state tracks workflow progress, while the append-only double-entry ledger provides the accounting truth.
External payment providers should be isolated behind provider adapters, because each provider has different APIs, errors, retry behavior, and webhook formats.
Webhooks must be verified, stored durably, deduplicated, and processed idempotently, because they may arrive late, duplicated, or out of order.
Reconciliation jobs are essential. They compare internal records with provider reports and repair mismatches caused by timeouts, lost webhooks, or partial failures.
Security and compliance are critical. I would avoid storing raw card data, use tokenized payment methods, encrypt sensitive data, enforce strict access control, and audit all payment operations.
The main trade-offs are among correctness, availability, latency, provider dependency, operational complexity, and user experience.
Ultimately, the goal is to move money correctly, prevent duplicate charges, maintain auditable records, and recover safely from partial failures.
⭐ Final Insight
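The core of a payment system is not calling a payment API; it is a financial state machine built around correctness, idempotency, the ledger, and reconciliation.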
Implement