q&a-p Data Layer Decisions

🎯 Core Replication Framework

👉 Interview Answer

When discussing replication, I usually break it down into four parts: replication models (sync vs async), the consistency vs availability trade-off, failure scenarios, and finally how real systems combine these approaches in practice.

1️⃣ Sync vs Async Replication

Synchronous Replication

👉 Interview Answer

Synchronous replication means the primary acknowledges a write only after the required replicas confirm they have persisted it. I use this when strong consistency and durability are critical, such as financial transactions. The trade-off is higher write latency, because every commit waits on the slowest required replica, and reduced availability, because writes stall when a replica is slow or unreachable.
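A minimal sketch of that acknowledgment rule, assuming an in-memory primary and replicas. The names `Replica`, `SyncPrimary`, and the `up` flag are hypothetical and only illustrate the idea; they are not taken from any real database.

```python
class Replica:
    """Hypothetical in-memory replica; `up` simulates reachability."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.log = []

    def persist(self, record):
        if not self.up:
            raise ConnectionError(f"{self.name} unreachable")
        self.log.append(record)


class SyncPrimary:
    """Acknowledges a write only after every replica has persisted it."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []

    def write(self, record):
        # Block on each replica; any failure means the client gets no ack.
        # A real system would also reconcile replicas that persisted the
        # record before the failure was detected.
        for replica in self.replicas:
            replica.persist(record)
        self.log.append(record)
        return "ack"
```

With one replica marked down, `write` raises instead of acknowledging, which is the availability cost described above.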

Asynchronous Replication

👉 Interview Answer

Asynchronous replication lets the primary acknowledge a write before replicas have received it. This lowers write latency and keeps writes available when replicas are slow, but it introduces replication lag, and any write that was acknowledged but not yet replicated is lost if the primary fails and a replica is promoted. I typically use it for large-scale systems where performance is more important than strict consistency.
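To make replication lag concrete, here is a small sketch assuming a single replica and a queue of unshipped records. `AsyncPrimary` and `replicate_one` are hypothetical names; `replicate_one` stands in for a background sender thread.

```python
from collections import deque


class AsyncPrimary:
    """Acknowledges immediately; replication happens later."""

    def __init__(self):
        self.log = []
        self.pending = deque()  # acknowledged but not yet shipped
        self.replica_log = []

    def write(self, record):
        self.log.append(record)
        self.pending.append(record)
        return "ack"  # returned before the replica has the record

    def replicate_one(self):
        # Ship the oldest pending record to the replica, if any.
        if self.pending:
            self.replica_log.append(self.pending.popleft())

    def lag(self):
        # Lag measured as the number of unshipped records.
        return len(self.pending)
```

After three writes and one call to `replicate_one`, `lag()` is 2: two acknowledged writes exist only on the primary.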

Sync vs Async Summary

👉 Interview Answer

In short, synchronous replication prioritizes correctness and durability, while asynchronous replication prioritizes latency and availability. The choice depends on whether the system can tolerate temporary inconsistency or data loss.

2️⃣ Consistency vs Availability (The Real Trade-off)

CAP Trade-off

👉 Interview Answer

Choosing between sync and async replication is closely tied to the CAP trade-off. During a partition, synchronous systems preserve consistency by blocking or rejecting writes, while asynchronous systems remain available but may serve stale or inconsistent data. Even without a partition, the same choice shows up as latency versus consistency. I usually decide based on business requirements around correctness vs uptime.

Read Patterns Impact

👉 Interview Answer

In asynchronous systems, reads may be stale due to replica lag. To handle this, I typically route critical reads to the primary, or pin a session to a node that is known to have that session's latest write, which gives read-after-write consistency. For less critical reads, replicas are used to improve scalability.
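One way to sketch the routing decision, assuming each session remembers the log position of its last write and each replica reports the position it has applied. The function name and the position scheme are hypothetical.

```python
def choose_read_target(session_last_write_pos, replica_applied_pos, critical=False):
    """Return "primary" or "replica" for a single read.

    Positions are assumed to be monotonically increasing log offsets.
    """
    if critical:
        return "primary"  # never risk a stale answer
    if replica_applied_pos >= session_last_write_pos:
        return "replica"  # replica has already applied this session's write
    return "primary"      # replica is behind this session; read the source


# A session that wrote at position 120 cannot yet read from a replica at 100.
print(choose_read_target(120, 100))
print(choose_read_target(120, 150))
```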

3️⃣ Failure Scenarios (Staff-Level Focus)

Scenario 1: Primary crashes before replication

👉 Interview Answer

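If the primary crashes before replication completes, synchronous systems are safe for every acknowledged write, because the client only received an acknowledgment after a replica had the data. A write that was still in flight may be lost, but the client was never told it succeeded. In asynchronous systems, acknowledged writes that had not yet reached a replica are lost. This is a key trade-off where async systems accept durability risk for better performance.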
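The crash scenario reduces to one question: which acknowledged writes are missing from the replica that gets promoted? A tiny sketch, with hypothetical write identifiers:

```python
def lost_acked_writes(acked, promoted_replica_log):
    """Acknowledged writes that the promoted replica never received."""
    have = set(promoted_replica_log)
    return [w for w in acked if w not in have]


# Sync: nothing is acknowledged until the replica has it, so nothing is lost.
print(lost_acked_writes(["w1", "w2"], ["w1", "w2"]))  # []

# Async: w2 was acknowledged but never shipped before the crash.
print(lost_acked_writes(["w1", "w2"], ["w1"]))        # ['w2']
```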

Scenario 2: Network partition

👉 Interview Answer

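During a network partition, synchronous systems may block writes because they cannot reach the replicas they must wait for, while asynchronous systems continue accepting writes but may diverge: replicas fall further behind, and if both sides of the partition accept writes, the copies conflict. This highlights the consistency vs availability trade-off under failure.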
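A sketch of both behaviors under a partition. It assumes two nodes and, for the async case, that both sides accept writes (for example after a failover promotes the isolated replica). `Node`, `write_sync`, and `write_async` are hypothetical names.

```python
class Node:
    def __init__(self):
        self.data = {}


def write_sync(local, remote, key, value, partitioned):
    # Sync: refuse the write if the peer cannot confirm it.
    if partitioned:
        raise TimeoutError("replica unreachable; write rejected")
    local.data[key] = value
    remote.data[key] = value


def write_async(local, key, value):
    # Async: accept locally and replicate later.
    local.data[key] = value


a, b = Node(), Node()
write_async(a, "balance", 100)  # accepted on one side of the partition
write_async(b, "balance", 80)   # accepted on the other side
print(a.data != b.data)         # True: the copies have diverged
```

The sync path stays consistent by failing; the async path stays available and leaves a conflict to resolve once the partition heals.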

Scenario 3: Replica lag

👉 Interview Answer

Replica lag is common in async replication, where followers fall behind the leader. I usually mitigate this by routing critical reads to the leader, or using quorum reads or lag-aware routing to avoid stale data.
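Lag-aware routing can be sketched as a filter over measured lag, assuming the router has a per-replica lag estimate in seconds. The function name, replica names, and threshold are hypothetical.

```python
def pick_replica(replica_lag_seconds, max_lag_seconds):
    """Return the least-lagged replica within the bound.

    Returns None when no replica is fresh enough, meaning the caller
    should fall back to the leader.
    """
    fresh = {
        name: lag
        for name, lag in replica_lag_seconds.items()
        if lag <= max_lag_seconds
    }
    if not fresh:
        return None
    return min(fresh, key=fresh.get)


print(pick_replica({"r1": 0.2, "r2": 5.0}, max_lag_seconds=1.0))  # r1
print(pick_replica({"r1": 3.0}, max_lag_seconds=1.0))             # None
```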

Failure Summary

👉 Interview Answer

The key difference is: synchronous systems fail by becoming unavailable, while asynchronous systems fail by returning inconsistent data or losing recent writes.

4️⃣ Real-world Hybrid Strategies

Quorum-based Replication

👉 Interview Answer

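Quorum replication allows us to tune consistency by controlling how many replicas participate in reads and writes. With N replicas, a write quorum of W, and a read quorum of R, ensuring R + W > N guarantees that every read quorum overlaps every write quorum, so a read contacts at least one replica holding the latest acknowledged write. That lets us achieve strong consistency when needed, while still allowing flexibility in latency and availability.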
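The overlap claim can be checked by brute force. The sketch below, with hypothetical function names, enumerates every possible write set and read set and confirms they intersect exactly when R + W > N. It also shows a read picking the newest version among its responses, assuming each value carries a version number.

```python
from itertools import combinations


def quorums_overlap(n, w, r):
    """The arithmetic rule: R + W > N."""
    return r + w > n


def every_read_meets_every_write(n, w, r):
    """Brute force: does each W-sized set intersect each R-sized set?"""
    nodes = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(nodes, w)
        for rs in combinations(nodes, r)
    )


def quorum_read(responses):
    """responses: (version, value) pairs from R replicas; newest wins."""
    return max(responses, key=lambda pair: pair[0])


print(every_read_meets_every_write(3, 2, 2))  # True
print(every_read_meets_every_write(3, 1, 1))  # False
print(quorum_read([(1, "old"), (2, "new")]))  # (2, 'new')
```

With N = 3, choosing W = 2 and R = 2 tolerates one unavailable replica for both reads and writes while keeping the overlap.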

Semi-Synchronous Replication

👉 Interview Answer

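Semi-synchronous replication is a practical compromise, where the primary waits for at least one replica to confirm receipt before acknowledging a write, rather than waiting for all of them. This improves durability compared to async, because every acknowledged write exists on at least two nodes, while avoiding the full latency cost of waiting on every replica.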
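A sketch of the wait-for-first-ack rule using simulated acknowledgment latencies. The timeout fallback is an assumption of this sketch; some implementations, such as MySQL's semisynchronous replication plugin, do degrade to asynchronous replication after a timeout, but the function and its return values here are hypothetical.

```python
def semi_sync_write(replica_ack_ms, timeout_ms):
    """Decide how a commit completes under semi-sync.

    replica_ack_ms: simulated ack latency per replica (None = no reply).
    Returns (mode, waited_ms).
    """
    replies = [
        ms for ms in replica_ack_ms
        if ms is not None and ms <= timeout_ms
    ]
    if replies:
        # The fastest acknowledgment unblocks the commit.
        return ("semi-sync", min(replies))
    # Nobody answered in time: degrade rather than block forever.
    return ("async-fallback", timeout_ms)


print(semi_sync_write([40, 5, None], timeout_ms=100))  # ('semi-sync', 5)
print(semi_sync_write([None, 500], timeout_ms=100))    # ('async-fallback', 100)
```

The commit waits on the fastest replica instead of the slowest, which is where the latency saving over full synchronous replication comes from.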

Leader-Follower (Async)

👉 Interview Answer

In leader-follower setups, all writes go to the leader and reads can be served from followers. This is a common pattern to scale reads, but we need to handle replica lag carefully to avoid stale reads.

Multi-region Strategy

👉 Interview Answer

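In multi-region systems, we often use synchronous replication within a region for consistency, and asynchronous replication across regions to keep cross-region round trips off the write path. This balances performance with durability and user experience, at the cost that a full regional outage can lose the most recent writes that had not yet shipped.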
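The latency argument is simple arithmetic: a commit waits for the slowest replica in its synchronous set. The round-trip times below are illustrative assumptions, not measurements, and `commit_latency_ms` is a hypothetical helper.

```python
def commit_latency_ms(sync_rtts_ms):
    """A commit waits for the slowest synchronous replica (0 if none)."""
    return max(sync_rtts_ms, default=0)


in_region = [1, 2]   # assumed round trips to same-region replicas, in ms
cross_region = [70]  # assumed round trip to a remote region, in ms

print(commit_latency_ms(in_region + cross_region))  # 70: sync everywhere
print(commit_latency_ms(in_region))                 # 2: sync in-region only
```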

Hybrid Summary

👉 Interview Answer

In practice, we rarely use pure sync or pure async. We combine them to balance consistency, latency, and availability based on system requirements.

🧠 Staff-Level Answer (Final)

👉 Interview Answer (Closing)

When discussing replication, I frame it as a consistency versus availability trade-off. Synchronous replication guarantees strong consistency and durability, but increases latency and reduces availability under failure. Asynchronous replication improves performance and availability, but introduces replication lag and potential data loss.

I also consider failure scenarios like primary crashes and network partitions, which highlight the fundamental trade-offs.

In practice, most systems use hybrid approaches, such as quorum-based or semi-synchronous replication, to balance correctness, performance, and scalability.

⭐ Staff-Level Insight

👉 Interview Answer (Bonus Line)

Replication is not just about copying data; it is about defining when a write is considered durable. The real design decision is choosing the right durability guarantee for each use case.

