🎯 Core Replication Framework
👉 Interview Answer
When discussing replication, I usually break it down into consistency vs availability trade-offs, replication models like sync and async, failure scenarios, and finally how real systems combine these approaches in practice.
1️⃣ Sync vs Async Replication
Synchronous Replication
👉 Interview Answer
Synchronous replication means a write is only acknowledged after replicas confirm persistence. I use this when strong consistency and durability are critical, such as financial transactions. The trade-off is higher latency and reduced availability, because writes depend on replica responsiveness.
Asynchronous Replication
👉 Interview Answer
Asynchronous replication allows the primary to acknowledge writes before replicas catch up. This improves latency and availability, but introduces replication lag and potential data loss during failover. I typically use it for large-scale systems where performance is more important than strict consistency.
Sync vs Async Summary
👉 Interview Answer
In short, synchronous replication prioritizes correctness and durability, while asynchronous replication prioritizes latency and availability. The choice depends on whether the system can tolerate temporary inconsistency or data loss.
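The difference in acknowledgement rules can be sketched with a toy in-memory model. The `Primary` and `Replica` classes and the `flush` method are hypothetical names for illustration, not the API of any real database, and `flush` stands in for the background replication stream.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    log: list = field(default_factory=list)

    def apply(self, entry):
        self.log.append(entry)


@dataclass
class Primary:
    replicas: list
    synchronous: bool
    log: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # acked but not yet shipped (async only)

    def write(self, entry):
        """Return once the write counts as acknowledged under the chosen mode."""
        self.log.append(entry)
        if self.synchronous:
            # Sync: every replica persists the entry before the client gets an ack.
            for replica in self.replicas:
                replica.apply(entry)
        else:
            # Async: ack immediately; replicas catch up later via flush().
            self.pending.append(entry)
        return "ack"

    def flush(self):
        """Ship queued entries to all replicas."""
        for entry in self.pending:
            for replica in self.replicas:
                replica.apply(entry)
        self.pending.clear()


sync_primary = Primary(replicas=[Replica(), Replica()], synchronous=True)
sync_primary.write("debit $10")

async_primary = Primary(replicas=[Replica()], synchronous=False)
async_primary.write("like post 42")
# If async_primary crashed at this point, this acked write would exist on no replica.
lost_on_crash = [e for e in async_primary.log if e not in async_primary.replicas[0].log]
```

In the sync case the replicas already hold the entry when `write` returns. In the async case `lost_on_crash` is non-empty until `flush` runs, which is the replication-lag window.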
2️⃣ Consistency vs Availability (The Real Trade-off)
CAP Trade-off
👉 Interview Answer
Choosing between sync and async replication is fundamentally a CAP trade-off. Synchronous systems lean toward consistency and may sacrifice availability during partitions, while asynchronous systems remain available but may serve stale or inconsistent data. I usually decide based on business requirements around correctness vs uptime.
Read Patterns Impact
👉 Interview Answer
In asynchronous systems, reads may be stale due to replica lag. To handle this, I typically route critical reads to the primary, or use session stickiness to ensure read-after-write consistency. For less critical reads, replicas are used to improve scalability.
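One way to picture this routing is a small router that pins a session to the primary for a short window after it writes. This is a sketch only: `ReadRouter`, `STICKY_WINDOW_SECONDS`, and the 5-second value are assumptions made for illustration, and a real system would derive the window from observed replica lag.

```python
import time

STICKY_WINDOW_SECONDS = 5.0  # assumed upper bound on replica lag


class ReadRouter:
    def __init__(self, now=time.monotonic):
        self._now = now               # injectable clock, useful for tests
        self._last_write_at = {}      # session_id -> time of the session's last write

    def record_write(self, session_id):
        self._last_write_at[session_id] = self._now()

    def choose_target(self, session_id, critical=False):
        """Return 'primary' or 'replica' for a read issued by this session."""
        if critical:
            return "primary"
        last = self._last_write_at.get(session_id)
        if last is not None and self._now() - last < STICKY_WINDOW_SECONDS:
            # The session wrote recently; a replica may not have caught up yet.
            return "primary"
        return "replica"


clock = [100.0]                       # fake clock so the example is deterministic
router = ReadRouter(now=lambda: clock[0])
router.record_write("alice")
right_after_write = router.choose_target("alice")
clock[0] += 10.0
ten_seconds_later = router.choose_target("alice")
```

Reads right after a write go to the primary. Once the window passes, the same session is served from replicas again.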
3️⃣ Failure Scenarios (Staff-Level Focus)
Scenario 1: Primary crashes before replication
👉 Interview Answer
If the primary crashes before replication completes, a synchronous system loses no acknowledged writes: the write was either already persisted on the replicas or never acknowledged to the client. In an asynchronous system, writes that were acknowledged but not yet replicated are lost on failover. This is a key trade-off where async systems accept durability risk for better performance.
Scenario 2: Network partition
👉 Interview Answer
During a network partition, synchronous systems may block writes because they cannot reach replicas, while asynchronous systems continue accepting writes but may diverge. This highlights the consistency vs availability trade-off under failure.
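The two behaviors under a partition can be sketched with one primary log and one replica log. This is a simplification: the `write` function and `ReplicaUnreachable` exception are hypothetical, and the sync path rejects immediately, whereas a real system would typically block until a timeout.

```python
class ReplicaUnreachable(Exception):
    """Raised when a synchronous write cannot be confirmed by the replica."""


def write(entry, primary_log, replica_log, replica_reachable, synchronous):
    if synchronous:
        if not replica_reachable:
            # Sync: refuse rather than acknowledge a write the replica never saw.
            raise ReplicaUnreachable(entry)
        primary_log.append(entry)
        replica_log.append(entry)
    else:
        # Async: accept locally; the replica only gets the entry if it is reachable.
        primary_log.append(entry)
        if replica_reachable:
            replica_log.append(entry)


# Partition: the replica is unreachable in both cases.
sync_logs = ([], [])
try:
    write("w1", *sync_logs, replica_reachable=False, synchronous=True)
    sync_outcome = "accepted"
except ReplicaUnreachable:
    sync_outcome = "rejected"

async_logs = ([], [])
write("w1", *async_logs, replica_reachable=False, synchronous=False)
async_diverged = async_logs[0] != async_logs[1]
```

The sync side stays consistent but rejects the write. The async side accepts it and the two logs diverge until the partition heals.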
Scenario 3: Replica lag
👉 Interview Answer
Replica lag is common in async replication, where followers fall behind the leader. I usually mitigate this by routing critical reads to the leader, or using quorum reads or lag-aware routing to avoid stale data.
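Lag-aware routing can be sketched as a function that compares each follower's log position with the leader's. Here `pick_read_target`, the position numbers, and the `max_lag` threshold are illustrative assumptions. Real systems expose lag in their own units, such as bytes, transactions, or seconds.

```python
def pick_read_target(leader_position, replica_positions, max_lag):
    """Return the most caught-up replica within max_lag entries of the leader.

    Falls back to 'leader' when every replica is too far behind.
    """
    fresh = {
        name: position
        for name, position in replica_positions.items()
        if leader_position - position <= max_lag
    }
    if not fresh:
        return "leader"
    return max(fresh, key=fresh.get)


healthy = pick_read_target(100, {"r1": 90, "r2": 99}, max_lag=5)
all_lagging = pick_read_target(100, {"r1": 50, "r2": 60}, max_lag=5)
```

With one follower close to the leader the read stays off the leader. When all followers are too stale, the read falls back to the leader.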
Failure Summary
👉 Interview Answer
The key difference is: synchronous systems fail by becoming unavailable, while asynchronous systems fail by returning inconsistent data or losing recent writes.
4️⃣ Real-world Hybrid Strategies
Quorum-based Replication
👉 Interview Answer
Quorum replication allows us to tune consistency by controlling how many replicas participate in reads and writes. With N replicas, requiring R + W > N guarantees that every read quorum overlaps the most recent write quorum, so a read sees the latest acknowledged write. Choosing a smaller R or W gives up that guarantee in exchange for lower latency and higher availability.
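The overlap argument can be checked exhaustively for a small cluster. This sketch assumes each replica stores a `(version, value)` pair and that a reader keeps the highest version it sees. The helper names are made up for illustration.

```python
from itertools import combinations


def quorums_overlap(n, w, r):
    """True when any read quorum must share at least one replica with any write quorum."""
    return r + w > n


def read_latest(replicas, read_set):
    """Return the (version, value) pair with the highest version among the replicas read."""
    return max(replicas[i] for i in read_set)


N, W, R = 3, 2, 2
replicas = [(1, "old")] * N
# A write of version 2 is acknowledged after landing on W replicas (here the first two).
for i in range(W):
    replicas[i] = (2, "new")

# Every possible read quorum of size R sees the new value.
quorum_reads = {read_latest(replicas, rs) for rs in combinations(range(N), R)}

# With R = 1 (so R + W = N), some reads can still return the old value.
single_reads = {read_latest(replicas, rs) for rs in combinations(range(N), 1)}
```

With N = 3, W = 2, R = 2, `quorum_reads` contains only the new version. Dropping to R = 1 lets `single_reads` include the stale one.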
Semi-Synchronous Replication
👉 Interview Answer
Semi-synchronous replication is a practical compromise: the primary waits for at least one replica to confirm the write before acknowledging it. This improves durability compared to async, while avoiding the latency cost of waiting for every replica.
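The latency difference can be illustrated with simple arithmetic over per-replica round-trip times. This is a rough model under stated assumptions: the `ack_latency_ms` function and the sample round-trip values are hypothetical, local disk time on the primary is ignored, and "sync" is taken to mean waiting for all replicas.

```python
def ack_latency_ms(replica_rtts_ms, mode):
    """Estimate how long the primary waits on replicas before acknowledging a write."""
    if mode == "async":
        return 0                      # no replica is on the acknowledgement path
    if mode == "semi-sync":
        return min(replica_rtts_ms)   # the fastest single confirmation is enough
    if mode == "sync":
        return max(replica_rtts_ms)   # the slowest replica sets the pace
    raise ValueError(f"unknown mode: {mode}")


rtts = [2, 5, 80]  # assumed: two nearby replicas and one distant replica
latencies = {mode: ack_latency_ms(rtts, mode) for mode in ("async", "semi-sync", "sync")}
```

Under these assumptions semi-sync pays only for the nearest replica, while full sync is held back by the slowest one.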
Leader-Follower (Async)
👉 Interview Answer
In leader-follower setups, all writes go to the leader and reads can be served from followers. This is a common pattern to scale reads, but because followers replicate asynchronously, we need to handle replica lag carefully to avoid stale reads.
Multi-region Strategy
👉 Interview Answer
In multi-region systems, we often use synchronous replication within a region for consistency, and asynchronous replication across regions to reduce latency. This balances performance with durability and user experience.
Hybrid Summary
👉 Interview Answer
In practice, we rarely use pure sync or async — we combine them to balance consistency, latency, and availability based on system requirements.
🧠 Staff-Level Answer (Final)
👉 Interview Answer (Closing)
When discussing replication, I frame it as a consistency versus availability trade-off. Synchronous replication guarantees strong consistency and durability, but increases latency and reduces availability under failure. Asynchronous replication improves performance and availability, but introduces replication lag and potential data loss.
I also consider failure scenarios like primary crashes and network partitions, which highlight the fundamental trade-offs.
In practice, most systems use hybrid approaches, such as quorum-based or semi-synchronous replication, to balance correctness, performance, and scalability.
⭐ Staff-Level Insight
👉 Interview Answer (Bonus Line)
Replication is not just about copying data — it’s about defining when a write is considered durable. The real design decision is choosing the right durability guarantee for each use case.
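Quick Recall Version (Reinforcement)
Sync
A write succeeds only after replication completes → strong consistency → slower but safe
Async
Acknowledge first, replicate later → fast, but may lose data
CAP
Sync = consistency; Async = availability
Failure
Sync: may block. Async: may become inconsistent / lose data
Real-World Systems
Use hybrid (quorum / semi-synchronous / cross-region async)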