🎯 Core CAP Decision Framework
When discussing the CAP theorem, I frame it as a decision framework under failure, not just a theory:
- What CAP actually means in real systems
- Choosing between Consistency vs Availability under partition
- Mapping CAP to user-facing requirements
- Real-world system design patterns
1️⃣ What CAP Actually Means
Definition (Reframed)
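- C = Consistency (every read sees the most recent acknowledged write, as if there were a single copy of the data)
- A = Availability (every request to a non-failed node gets a non-error response)
- P = Partition tolerance (the system keeps operating when the network drops or delays messages between nodes)
👉 Key reality: Partition tolerance is NOT optional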
Core Insight
You don’t choose between C, A, and P — you choose between C and A when P happens
👉 Interview Answer
The CAP theorem states that during a network partition, a system must choose between consistency and availability. In practice, partition tolerance is not optional in distributed systems, so the real decision is whether to prioritize consistency or availability under failure.
2️⃣ Consistency vs Availability (Real Trade-off)
CP Systems (Consistency-first)
Behavior:
- Reject requests if consistency cannot be guaranteed
Examples:
- Strongly consistent databases
- Leader-based systems
👉 Interview Answer
In a CP system, we prioritize consistency over availability. If the system cannot guarantee consistent data during a partition, it may reject or delay requests. This is typically used in systems like payments where correctness is critical.
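One way to picture the CP behavior is a write path that refuses to proceed without a majority of replicas. The following is a minimal sketch, not a production protocol: `Replica`, `Unavailable`, and `cp_write` are hypothetical names introduced for illustration, and the `reachable` flag simply simulates which nodes are on this side of a partition.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a write cannot reach a majority of replicas."""


@dataclass
class Replica:
    name: str
    reachable: bool = True  # False simulates a node on the far side of a partition
    data: dict = field(default_factory=dict)


def cp_write(replicas, key, value):
    """Apply the write only if a strict majority of replicas is reachable."""
    reachable = [r for r in replicas if r.reachable]
    quorum = len(replicas) // 2 + 1
    if len(reachable) < quorum:
        # CP choice: refuse the request rather than risk divergent copies.
        raise Unavailable(
            f"only {len(reachable)} of {len(replicas)} replicas reachable, need {quorum}"
        )
    for r in reachable:
        r.data[key] = value
    return len(reachable)
```

With three replicas, a write succeeds while at least two are reachable. Once two are cut off, `cp_write` raises `Unavailable` and the stored value stays untouched, which is the "reject or delay" behavior described above.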
AP Systems (Availability-first)
Behavior:
- Always respond, even if data is stale
Examples:
- Eventually consistent databases
- Distributed caches
👉 Interview Answer
In an AP system, we prioritize availability and continue serving requests even during partitions. This means the system may return stale or inconsistent data, but remains responsive. This is suitable for systems like social feeds where availability is more important than strict correctness.
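The AP behavior can be sketched as a node that always answers from its local state and marks the answer as possibly stale while it is cut off from its peers. This is an illustrative sketch only: `LocalReplica`, `ap_read`, `ap_write`, and the `pending` queue are assumed names, not part of any real database API.

```python
from dataclasses import dataclass, field


@dataclass
class LocalReplica:
    data: dict = field(default_factory=dict)
    partitioned: bool = False  # True means this node cannot reach its peers


def ap_read(replica, key, default=None):
    """Always answer from local state and flag the answer as possibly stale."""
    value = replica.data.get(key, default)
    return {"value": value, "possibly_stale": replica.partitioned}


def ap_write(replica, key, value, pending):
    """Accept the write locally and queue it for later replication."""
    replica.data[key] = value
    if replica.partitioned:
        # Replayed to the other replicas once the partition heals.
        pending.append((key, value))
    return True
```

The request never fails here. The cost is that another replica may hold a different value until the queued writes are replayed.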
CP vs AP Summary
| Type | Behavior | Risk |
|---|---|---|
| CP | May reject requests | Lower availability |
| AP | May return stale data | Inconsistency |
👉 Interview Answer (one-sentence summary)
CP systems fail by becoming unavailable, while AP systems fail by returning inconsistent data.
3️⃣ Mapping CAP to User Requirements
Key Insight
CAP is not a system-level choice — it’s a per-use-case decision
Example Decisions
Case 1: Payments
- Must be CP
👉 Interview Answer
For payments, I always choose consistency over availability. It’s better to reject a request than to process an incorrect transaction.
Case 2: Social Feed
- AP is acceptable
👉 Interview Answer
For feeds or timelines, I prefer availability. Users can tolerate slightly stale data, but not downtime.
Case 3: User Profile
- Hybrid (session consistency)
👉 Interview Answer
For user profile updates, I usually ensure read-after-write consistency for the user who made the change, often by routing their reads to the leader or using session stickiness, while allowing eventual consistency for other users.
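The "route reads to the leader" idea can be sketched as a small router that pins a user to the leader for a short window after their own write. This is a sketch under stated assumptions: `ReadRouter` and its methods are hypothetical, and `pin_seconds` is assumed to be an upper bound on replication lag, which a real system would have to measure.

```python
import time


class ReadRouter:
    """Route a user's reads to the leader for a short window after they write."""

    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # assumed upper bound on replication lag
        self.clock = clock              # injectable so the behavior can be tested
        self.last_write = {}            # user_id -> time of that user's latest write

    def record_write(self, user_id):
        self.last_write[user_id] = self.clock()

    def target_for_read(self, user_id):
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            return "leader"    # the writer must see their own update
        return "follower"      # other users tolerate eventual consistency
```

Only the user who just wrote pays the cost of a leader read. Everyone else keeps reading from followers.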
Key Takeaway
Different parts of the system can make different CAP choices.
4️⃣ Failure Scenarios (Staff-level depth)
Scenario 1: Network partition
👉 Interview Answer
During a network partition, a CP system may reject requests to maintain consistency, while an AP system continues serving requests but may diverge. This is the core situation where CAP applies.
Scenario 2: Split brain
- Two nodes accept writes independently
👉 Interview Answer
In split-brain scenarios, AP systems may allow both sides to accept writes, which requires conflict resolution later. CP systems prevent this by letting only the side that holds a majority quorum elect a leader and accept writes.
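The usual guard against split brain is a strict-majority rule: because two disjoint groups cannot both hold more than half of the cluster, at most one side can keep accepting writes. A minimal sketch of that rule follows; the function names are illustrative.

```python
def may_accept_writes(visible_nodes: int, cluster_size: int) -> bool:
    """A side of a partition may lead only if it sees a strict majority."""
    return visible_nodes > cluster_size // 2


def writable_sides(side_sizes):
    """Given the node count on each side of a partition, say which sides may write."""
    cluster_size = sum(side_sizes)
    return [may_accept_writes(n, cluster_size) for n in side_sizes]
```

For a five-node cluster split 3/2, only the three-node side stays writable. For a four-node cluster split 2/2, neither side does, which is one reason odd cluster sizes are common.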
Scenario 3: Data divergence
👉 Interview Answer
In AP systems, replicas may diverge during partitions. I typically handle this using reconciliation strategies like last-write-wins, versioning, or CRDTs depending on the use case.
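Two of the reconciliation strategies named above can be sketched in a few lines. The first is a last-write-wins register, where each entry is assumed to be a `(timestamp, node_id, value)` tuple and the node id breaks timestamp ties. The second is a grow-only counter, one of the simplest CRDTs, where each replica keeps a per-node count. Both function shapes are assumptions made for illustration.

```python
def lww_merge(a, b):
    """Last-write-wins: keep the entry with the later (timestamp, node_id)."""
    # Entries are (timestamp, node_id, value); node_id makes ties deterministic.
    return max(a, b, key=lambda entry: (entry[0], entry[1]))


def gcounter_merge(a, b):
    """Grow-only counter CRDT: take the per-node maximum of each count."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def gcounter_value(counter):
    """The counter's value is the sum of every node's contribution."""
    return sum(counter.values())
```

Last-write-wins is simple but silently discards the losing write, so it fits data where losing an update is acceptable. The counter merge loses nothing, because merging in any order, any number of times, converges to the same result.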
5️⃣ Real-world Design Patterns
Pattern 1: Hybrid CAP (Most systems)
- CP for critical paths
- AP for non-critical paths
👉 Interview Answer
In practice, most systems are hybrid. For example, we use strong consistency for critical operations, and eventual consistency for scalable, non-critical paths.
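A hybrid design can be made explicit as a per-operation policy table that decides what happens when a partition is detected. The sketch below is illustrative: the operation names are hypothetical, and defaulting unknown operations to CP is an assumed design choice, not a rule from the notes.

```python
from enum import Enum


class Mode(Enum):
    CP = "reject when consistency cannot be guaranteed"
    AP = "serve possibly stale data"


# Hypothetical operation names; a real system would list its own.
POLICY = {
    "payment.charge": Mode.CP,
    "feed.read": Mode.AP,
    "profile.read_other_user": Mode.AP,
}


def mode_for(operation):
    """Look up partition-time behavior; unknown operations default to CP."""
    return POLICY.get(operation, Mode.CP)


def handle(operation, partitioned, cached_value=None):
    """Return (status, payload) for a request, given the partition state."""
    if not partitioned:
        return ("ok", "fresh")
    if mode_for(operation) is Mode.CP:
        return ("error", "unavailable during partition")
    return ("ok", cached_value)
```

Writing the table down forces the trade-off to be decided per use case, rather than being inherited from whichever datastore the operation happens to touch.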
Pattern 2: Leader-based systems (CP)
- Single leader ensures consistency
👉 Interview Answer
Leader-based systems enforce consistency by routing writes through a single node. This avoids conflicts, but writes become unavailable while the leader is unreachable or a new leader is being elected.
Pattern 3: Eventually consistent systems (AP)
- Accept divergence, reconcile later
👉 Interview Answer
AP systems accept temporary inconsistency and rely on reconciliation mechanisms to converge over time. This allows the system to remain highly available.
Pattern 4: Multi-region strategy
- CP within region
- AP across regions
👉 Interview Answer
In multi-region systems, we often use strong consistency within a region and eventual consistency across regions, balancing latency, availability, and correctness.
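The multi-region split can be sketched as synchronous updates inside a region and an asynchronous outbox between regions. This is a deliberately simplified model: `Region` and `ship` are hypothetical names, and "update every local replica" stands in for whatever in-region consensus or quorum protocol a real system would use.

```python
from collections import deque


class Region:
    def __init__(self, name, replica_count=3):
        self.name = name
        self.replicas = [dict() for _ in range(replica_count)]
        self.outbox = deque()  # changes waiting to be shipped to other regions

    def write(self, key, value):
        """Update every local replica synchronously, then queue cross-region replication."""
        for replica in self.replicas:
            replica[key] = value
        self.outbox.append((key, value))

    def read(self, key):
        return self.replicas[0].get(key)

    def apply_remote(self, key, value):
        for replica in self.replicas:
            replica[key] = value


def ship(source, target):
    """Drain the source outbox into the target; run when the cross-region link is up."""
    shipped = 0
    while source.outbox:
        target.apply_remote(*source.outbox.popleft())
        shipped += 1
    return shipped
```

A write is visible in its home region immediately and in other regions only after `ship` runs, so a cross-region partition delays convergence without blocking local requests. Concurrent writes to the same key in two regions would still need a reconciliation rule, which this sketch leaves out.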
🧠 Staff-Level Answer (Final Polished)
👉 Interview Answer (full version to memorize)
The CAP theorem is best understood as a decision framework under failure. Since partition tolerance is unavoidable, the real question is whether to prioritize consistency or availability during a partition.
In CP systems, we prioritize correctness and may reject requests when consistency cannot be guaranteed. In AP systems, we prioritize availability and continue serving requests, even if the data is temporarily inconsistent.
In practice, this decision is driven by user requirements. For example, payments require strong consistency, while social feeds can tolerate eventual consistency.
Most real-world systems adopt a hybrid approach, applying strong consistency to critical paths and eventual consistency to scalable, non-critical components.
⭐ Staff-Level Insight
👉 Interview Answer
CAP is not about choosing a system type — it’s about deciding how your system behaves under failure.
The best systems don’t eliminate trade-offs — they make them explicit and align them with user expectations.
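Quick-Recall Version (Staff-level)
The essence of CAP
P is always present → the real choice is C vs A
CP
Guarantees consistency → may reject requests
AP
Guarantees availability → may return stale data
Core idea
CP: never wrong. AP: never down.
Real systems
Hybrid: CP on critical paths, AP on non-critical paths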