🎯 Placement Strategy in Distributed Systems

1️⃣ Core Framework

When discussing Placement Strategy in Distributed Systems, I frame it as:

What is being placed
Where it can be placed
Locality requirements
Fault isolation requirements
Capacity and cost constraints
Compliance and data residency
Rebalancing and migration
Trade-offs: latency vs resilience vs utilization

2️⃣ What Placement Strategy Means

Placement strategy means deciding where workloads, data, replicas, partitions, and jobs should run.

Examples

Which region should serve a user?
Which availability zone should hold a database replica?
Which machine should run a container?
Which worker should own a queue partition?
Which shard should store a tenant?
Which cache cluster should hold hot data?

Basic Architecture

Workload / Data / Replica

↓

Placement Policy

↓

Region / Zone / Rack / Host / Partition

👉 Interview Memorization

Placement strategy is the discipline of deciding where workloads and state should live in order to balance latency, resilience, capacity, cost, and compliance.

3️⃣ What Gets Placed

Common Placement Targets

Application services
Database replicas
Cache nodes
Queue partitions
Stream processing tasks
Search index shards
Tenant data
Batch jobs
ML inference workloads
CDN objects

Important Distinction

Placing stateless compute is comparatively easy.

Placing stateful data is hard, because the state has to move with it.

Stateless workloads can often move freely.

Stateful workloads need data movement, ownership transfer, and recovery planning.

👉 Interview Memorization

Stateless workload placement is mostly a scheduling problem, while stateful placement is also a data ownership and migration problem.

4️⃣ Placement Levels

Distributed systems make placement decisions at multiple levels.

Region Level

US Region
EU Region
Asia Region

Used for latency, compliance, and disaster recovery.

Availability Zone Level

Zone A
Zone B
Zone C

Used for fault isolation inside a region.

Rack / Host Level

Rack 1 → Host A
Rack 2 → Host B
Rack 3 → Host C

Used to avoid correlated hardware failure.

Partition Level

Shard 1 → Node A
Shard 2 → Node B
Shard 3 → Node C

Used for scaling and data ownership.

👉 Interview Memorization

Placement decisions happen at many layers: region, zone, rack, host, partition, and tenant.

Good designs consider failure domains at each layer.

5️⃣ Placement Goal 1: Latency Locality

Place work close to users or close to data.

User Locality

EU User → EU Service

US User → US Service

Asia User → Asia Service

This reduces user-facing latency.

Data Locality

Compute

↓

Placed near data shard

This reduces backend network calls and cross-region traffic.

👉 Interview Memorization

Locality means placing compute near users or near data to reduce latency and network cost.

6️⃣ Placement Goal 2: Fault Isolation

Do not place all critical replicas in the same failure domain.

Bad Placement

Replica 1 → Zone A
Replica 2 → Zone A
Replica 3 → Zone A

One zone failure can remove all replicas.

Better Placement

Replica 1 → Zone A
Replica 2 → Zone B
Replica 3 → Zone C

One zone failure still leaves replicas alive.

Fault Domains

Region
Availability zone
Datacenter
Rack
Host
Power domain
Network segment

👉 Interview Memorization

Fault isolation means spreading replicas across independent failure domains so one failure does not take down the whole service.
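
The bad and better placements above can be sketched in a few lines. This is a minimal illustration, not a real scheduler: the function names `spread_replicas` and `survives_single_zone_failure` and the zone labels are hypothetical, and the only rule modelled is "assign replicas round-robin across zones".

```python
from itertools import cycle


def spread_replicas(replica_count, zones):
    """Assign replicas to zones round-robin so they are spread as evenly as possible."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return {f"replica-{i + 1}": next(zone_cycle) for i in range(replica_count)}


def survives_single_zone_failure(assignment):
    """True if losing any single zone still leaves at least one replica alive."""
    return len(set(assignment.values())) >= 2


# "Better Placement": one replica per zone.
good = spread_replicas(3, ["zone-a", "zone-b", "zone-c"])

# "Bad Placement": every replica lands in the only zone offered.
bad = spread_replicas(3, ["zone-a"])

print(good, survives_single_zone_failure(good))
print(bad, survives_single_zone_failure(bad))
```

The same idea applies at any level of the fault-domain list: swap zones for racks, hosts, or power domains.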

7️⃣ Placement Goal 3: Capacity Balance

Placement must avoid overloading one node, zone, or region.

Problem

Node A: 90% CPU
Node B: 20% CPU
Node C: 15% CPU

The cluster has capacity, but placement is imbalanced.

Capacity Signals

CPU
Memory
Disk
Network bandwidth
IOPS
Connection count
Request rate
Hot key frequency

👉 Interview Memorization

Capacity-aware placement avoids hotspots by placing workloads based on real resource usage, not just instance count.
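
As a rough sketch of capacity-aware placement, the snippet below picks the least-utilized node that still has headroom. The function name `pick_node`, the 80% utilization ceiling, and the use of CPU as the only signal are assumptions for illustration; a real system would combine several of the capacity signals listed above.

```python
def pick_node(cpu_utilization, required_cpu, max_utilization=0.8):
    """Return the least-utilized node that can absorb required_cpu
    (a fraction of one node) without exceeding max_utilization, or None."""
    candidates = {
        node: used
        for node, used in cpu_utilization.items()
        if used + required_cpu <= max_utilization
    }
    if not candidates:
        return None
    # Tie-break on node name so the choice is deterministic.
    return min(candidates, key=lambda node: (candidates[node], node))


# The imbalanced cluster from the example above.
nodes = {"node-a": 0.90, "node-b": 0.20, "node-c": 0.15}

print(pick_node(nodes, required_cpu=0.10))  # node-c
print(pick_node(nodes, required_cpu=0.70))  # None: nothing fits under the ceiling
```

Returning `None` instead of forcing a placement makes "failed scheduling attempts" an explicit, observable event.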

8️⃣ Placement Goal 4: Cost Efficiency

Placement also affects cost.

Cost Factors

Cross-region traffic
Overprovisioned standby capacity
Expensive hardware requirements
Storage replication
Zone or region price differences
Reserved capacity utilization

Trade-off

Spread widely

↓

Better resilience

↓

More duplicated capacity and traffic cost

👉 Interview Memorization

Placement affects cost because spreading and replication improve resilience but often increase storage, compute, and network usage.

9️⃣ Placement Goal 5: Compliance

Some data must stay in specific legal or geographic boundaries.

Examples

EU user data in EU
Financial data in approved regions
Healthcare data in compliant environments
Enterprise tenant data in a contracted region

Architecture

Tenant A → US Region

Tenant B → EU Region

Tenant C → Canada Region

👉 Interview Memorization

Compliance-aware placement ensures data and workloads run only in regions or environments that satisfy legal, regulatory, or customer requirements.
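
One way to make compliance rules checkable is to treat them as an allow-list per tenant and audit actual placements against it. The sketch below assumes a hypothetical policy table and region names; unknown tenants are denied by default, which is a design choice of this example rather than a universal rule.

```python
def find_violations(placements, allowed_regions):
    """Return (tenant, region) pairs where a tenant runs outside its allowed regions.

    Tenants with no policy entry are treated as violations (deny by default).
    """
    violations = []
    for tenant, region in sorted(placements.items()):
        if region not in allowed_regions.get(tenant, set()):
            violations.append((tenant, region))
    return violations


# Hypothetical residency policy matching the architecture above.
policy = {
    "tenant-a": {"us"},
    "tenant-b": {"eu"},
    "tenant-c": {"canada"},
}

# Hypothetical current placements: tenant-b is misplaced, tenant-d has no policy.
current = {"tenant-a": "us", "tenant-b": "us", "tenant-c": "canada", "tenant-d": "eu"}

print(find_violations(current, policy))
```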

🔟 Affinity

Affinity means placing things together.

Example

API Worker

↓

Same region as database shard

Use Cases

Place compute near data
Place cache near service
Place dependent services in the same region
Place jobs near required hardware

Benefit

Lower latency
Lower network cost
Better cache locality

Risk

Too much affinity can create correlated failures.

👉 Interview Memorization

Affinity places related workloads together to improve locality, but too much affinity can increase correlated failure risk.

1️⃣1️⃣ Anti-affinity

Anti-affinity means placing things apart.

Example

Replica 1 → Host A

Replica 2 → Host B

Replica 3 → Host C

Use Cases

Spread replicas across zones
Avoid placing primary and backup on the same host
Separate tenants for noisy-neighbor control
Avoid putting all shards on one rack

Benefit

Better availability
Lower correlated failure risk
Better blast-radius control

Cost

Lower packing efficiency
More scheduling constraints
More capacity fragmentation

👉 Interview Memorization

Anti-affinity spreads related replicas across failure domains to reduce correlated failure risk.
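
A minimal sketch of a hard anti-affinity rule: each replica must land in a failure domain that no sibling replica already uses. The host labels, the `rack` key, and the function name are hypothetical. The greedy loop is enough here because all replicas are interchangeable.

```python
def place_with_anti_affinity(replicas, hosts, domain_key="rack"):
    """Assign each replica to a host in a failure domain not used by any other replica.

    hosts maps host name -> labels, e.g. {"rack": "rack-1"}.
    Raises RuntimeError when there are fewer distinct domains than replicas.
    """
    used_domains = set()
    assignment = {}
    for replica in replicas:
        for host, labels in hosts.items():
            domain = labels[domain_key]
            if domain not in used_domains:
                assignment[replica] = host
                used_domains.add(domain)
                break
        else:
            raise RuntimeError(f"no free {domain_key} left for {replica}")
    return assignment


hosts = {
    "host-a": {"rack": "rack-1"},
    "host-b": {"rack": "rack-1"},  # same rack as host-a, so never used together
    "host-c": {"rack": "rack-2"},
    "host-d": {"rack": "rack-3"},
}

print(place_with_anti_affinity(["r1", "r2", "r3"], hosts))
```

The `RuntimeError` path shows the cost side of the trade-off: with only three racks, a fourth replica cannot be scheduled even though `host-b` is idle.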

1️⃣2️⃣ Resource-aware Scheduling

Resource-aware placement uses current and predicted resource usage.

Scheduler Inputs

Workload requirements

Node capacity

Current utilization

Failure-domain rules

Placement constraints

Example

High-memory job → Memory-optimized node

GPU job → GPU node

High-IOPS shard → SSD node

👉 Interview Memorization

Resource-aware scheduling places workloads based on actual resource needs such as CPU, memory, disk, network, IOPS, and specialized hardware.
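
The scheduler inputs above map naturally onto a filter-then-score loop. The following is a simplified sketch under stated assumptions: the `Node` and `Workload` types, the feature tags (`gpu`, `ssd`), and the scoring rule (avoid wasting specialized nodes, then prefer the tightest memory fit) are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    free_cpu: float
    free_memory_gb: float
    features: frozenset = frozenset()


@dataclass(frozen=True)
class Workload:
    name: str
    cpu: float
    memory_gb: float
    required_features: frozenset = frozenset()


def fits(node, workload):
    """Filter step: the node must have enough free resources and every required feature."""
    return (
        node.free_cpu >= workload.cpu
        and node.free_memory_gb >= workload.memory_gb
        and workload.required_features <= node.features
    )


def schedule(workload, nodes):
    """Score step: among feasible nodes, prefer the one with the fewest unused
    special features, then the tightest memory fit. Returns a node name or None."""
    feasible = [n for n in nodes if fits(n, workload)]
    if not feasible:
        return None
    best = min(
        feasible,
        key=lambda n: (
            len(n.features - workload.required_features),
            n.free_memory_gb - workload.memory_gb,
            n.name,
        ),
    )
    return best.name


nodes = [
    Node("general-1", free_cpu=8, free_memory_gb=32),
    Node("highmem-1", free_cpu=8, free_memory_gb=256),
    Node("gpu-1", free_cpu=16, free_memory_gb=64, features=frozenset({"gpu"})),
    Node("ssd-1", free_cpu=8, free_memory_gb=32, features=frozenset({"ssd"})),
]

print(schedule(Workload("training", 4, 16, frozenset({"gpu"})), nodes))  # gpu-1
print(schedule(Workload("analytics", 4, 128), nodes))                    # highmem-1
print(schedule(Workload("web", 2, 4), nodes))                            # general-1
```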

1️⃣3️⃣ Data-local Scheduling

Data-local scheduling moves compute near data instead of moving data to compute.

Architecture

Data Shard

↓

Run job near shard

Best For

Batch processing
Stream processing
Search indexing
Large analytics jobs
ML feature generation

Why It Matters

Large data movement is expensive.

Moving compute is often cheaper than moving terabytes of data.

👉 Interview Memorization

Data-local scheduling reduces network cost and latency by placing compute near the data it needs to process.

1️⃣4️⃣ Tenant-aware Placement

Tenant-aware placement decides where each customer or tenant lives.

Architecture

Tenant A → Cluster 1

Tenant B → Cluster 2

Tenant C → Cluster 3

Goals

Isolation
Compliance
Predictable performance
Blast-radius control
Enterprise region requirements

Risk

Large tenants may create hotspots.

Small tenants are often packed together for efficiency, which exposes them to noisy neighbors.

👉 Interview Memorization

Tenant-aware placement balances isolation, compliance, and utilization by assigning tenants to regions, clusters, shards, or cells deliberately.

1️⃣5️⃣ Replica Placement

Replica placement decides where copies of data or services should run.

Bad Replica Placement

Primary → Host A

Replica → Host A

Host failure removes both.

Better Replica Placement

Primary → Zone A

Replica 1 → Zone B

Replica 2 → Zone C

Key Rule

Replicas should not share the same failure domain.

👉 Interview Memorization

Replica placement should spread copies across independent failure domains so one outage does not destroy availability or durability.

1️⃣6️⃣ Leader Placement

Leader placement affects write latency and availability.

Example

Most writes come from US users

↓

Place leader in US

Multi-region Question

Where should the primary leader live?

Consider:

Write traffic location
Quorum latency
Failover target
Compliance
Regional outage risk

👉 Interview Memorization

Leader placement should follow the dominant write path while still preserving failover and fault-isolation requirements.
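
"Follow the dominant write path" can be made concrete by picking the region that minimizes write-share-weighted round-trip time. The sketch below is deliberately simplified: the latency numbers and write shares are made up, the function name is hypothetical, and quorum latency and outage risk from the list above are not modelled. The optional `allowed_regions` argument stands in for compliance or failover constraints.

```python
def best_leader_region(write_share, rtt_ms, allowed_regions=None):
    """Pick the leader region with the lowest expected client-to-leader write latency.

    write_share: client region -> fraction of all writes.
    rtt_ms: rtt_ms[client_region][leader_region] -> round-trip time in ms.
    """
    candidates = allowed_regions or sorted(rtt_ms)

    def expected_latency(leader):
        return sum(share * rtt_ms[client][leader] for client, share in write_share.items())

    return min(candidates, key=lambda region: (expected_latency(region), region))


# Hypothetical inter-region round-trip times in milliseconds.
rtt_ms = {
    "us": {"us": 5, "eu": 90, "asia": 150},
    "eu": {"us": 90, "eu": 5, "asia": 200},
    "asia": {"us": 150, "eu": 200, "asia": 5},
}
write_share = {"us": 0.7, "eu": 0.2, "asia": 0.1}

print(best_leader_region(write_share, rtt_ms))                                   # us
print(best_leader_region(write_share, rtt_ms, allowed_regions=["eu", "asia"]))   # eu
```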

1️⃣7️⃣ Hotspot-aware Placement

Hotspots happen when one placement target receives too much traffic.

Example

Shard 1: 10,000 QPS

Shard 2: 500 QPS

Shard 3: 400 QPS

Solutions

Split hot shards
Move hot tenants
Add replicas for hot reads
Use consistent hashing with virtual nodes
Separate large tenants
Add request routing limits

👉 Interview Memorization

Hotspot-aware placement detects overloaded shards, tenants, or nodes and redistributes work before one placement decision becomes a bottleneck.
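
Of the solutions listed, consistent hashing with virtual nodes is the easiest to show in code. This is a minimal sketch, assuming SHA-256 as the hash and 100 virtual nodes per physical node; the class name `HashRing` and the key format are illustrative only.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each physical node owns many points on the ring, which smooths out load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, key):
        """Return the node owning the first ring point clockwise from the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user-{i}" for i in range(2000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

# Keys that changed owner after adding node-d.
moved = [k for k in keys if before.owner(k) != after.owner(k)]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d:",
      all(after.owner(k) == "node-d" for k in moved))
```

Adding a node only takes keys away from existing nodes. It never shuffles keys between the old nodes, which keeps rebalancing traffic bounded.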

1️⃣8️⃣ Rebalancing

Placement is not a one-time decision.

Systems change over time.

Why Rebalancing Happens

New nodes are added
Nodes fail
Traffic shifts
Tenants grow
Data grows
Hot partitions appear
Regions are added

Rebalancing Flow

Detect imbalance

↓

Choose movement plan

↓

Copy state

↓

Shift ownership

↓

Verify health

👉 Interview Memorization

Rebalancing updates placement as load, capacity, and failures change.

It must be rate-limited and observable because moving state can hurt production traffic.
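
The first two steps of the flow (detect imbalance, choose a movement plan) plus the rate limit can be sketched as a small planner. Everything here is an assumption for illustration: the function name, the load numbers, and the heuristic of moving the shard closest to half the gap between the hottest and coldest node. Copying state, shifting ownership, and verifying health are left to the caller.

```python
def plan_rebalance(nodes, shard_load, assignment, max_moves=2):
    """Plan at most max_moves shard moves from the hottest node to the coldest.

    shard_load: shard -> load (e.g. QPS); assignment: shard -> node.
    Returns a list of (shard, source_node, destination_node).
    """
    current = dict(assignment)
    moves = []
    for _ in range(max_moves):  # rate limit: bounded work per planning round
        load = {node: 0 for node in nodes}
        for shard, node in current.items():
            load[node] += shard_load[shard]
        hottest = max(nodes, key=lambda n: (load[n], n))
        coldest = min(nodes, key=lambda n: (load[n], n))
        gap = load[hottest] - load[coldest]
        # A move only helps if the shard is smaller than the gap it crosses.
        movable = [s for s, n in current.items() if n == hottest and shard_load[s] < gap]
        if not movable:
            break
        shard = min(movable, key=lambda s: (abs(shard_load[s] - gap / 2), s))
        current[shard] = coldest
        moves.append((shard, hottest, coldest))
    return moves


nodes = ["node-a", "node-b", "node-c"]
shard_load = {"s1": 40, "s2": 30, "s3": 30, "s4": 10, "s5": 10}
assignment = {"s1": "node-a", "s2": "node-a", "s3": "node-a", "s4": "node-b", "s5": "node-c"}

for move in plan_rebalance(nodes, shard_load, assignment):
    print(move)
```

Keeping `max_moves` small means each round copies a bounded amount of state, and the plan can be re-evaluated against fresh load data before the next round.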

1️⃣9️⃣ Placement Trade-off Table

Strategy	Benefit	Cost
User-local placement	Lower user latency	More regional complexity
Data-local placement	Lower backend traffic	Less scheduling flexibility
Strong anti-affinity	Better resilience	Lower packing efficiency
Aggressive packing	Better utilization	Higher correlated failure risk
Tenant isolation	Smaller blast radius	More capacity fragmentation
Leader near writes	Lower write latency	Higher write latency for users in other regions
Geo placement	Compliance and locality	Harder migration

👉 Interview Memorization

Placement is a multi-objective optimization problem.

Improving latency, resilience, utilization, compliance, and cost at the same time requires explicit trade-offs.

2️⃣0️⃣ Observability

Monitor

Resource utilization by node
Resource utilization by zone
Resource utilization by region
Hot shards
Hot tenants
Replica distribution
Leader distribution
Cross-region traffic
Rebalance progress
Placement rule violations
Failed scheduling attempts
Capacity fragmentation

👉 Interview Memorization

Placement observability must show where workloads live, whether placement rules are satisfied, and whether any node, zone, region, tenant, or shard is becoming hot.

2️⃣1️⃣ Best Practices

Practical Rules

Define failure domains explicitly
Spread replicas across independent zones or regions
Place compute near users when latency matters
Place compute near data when data movement is expensive
Use anti-affinity for critical replicas
Use affinity only when locality clearly matters
Use tenant isolation for large or regulated customers
Monitor hotspots continuously
Make rebalancing gradual and reversible
Keep placement decisions explainable

Design Principle

Locality improves latency.

Separation improves resilience.

Packing improves utilization.

You rarely maximize all three at once.

👉 Interview Memorization

Good placement strategy balances locality, fault isolation, and utilization instead of optimizing only one dimension.

🧠 Final Staff-Level Answer

👉 Full Interview Answer

Placement strategy is about deciding where services, data, replicas, partitions, tenants, and jobs should run in a distributed system.

The main goals are low latency, fault isolation, efficient capacity usage, cost control, and compliance.

If the workload is user-facing, I would place compute close to users.

If the workload processes large data, I would place compute close to the data.

For replicas, I would use anti-affinity so copies are spread across independent failure domains such as zones, racks, or regions.

For stateful systems, placement is harder because moving work also means moving data ownership, warming caches, transferring leadership, or rebalancing partitions.

The key trade-off is locality versus fault isolation versus utilization.

Too much locality can create correlated failures.

Too much spreading can waste capacity and increase latency.

Too much packing can lower cost but increases blast radius.

A good design defines placement policies explicitly, monitors hotspots and rule violations, and supports gradual rebalancing as traffic and capacity change.

⭐ Final Insight

The core of placement strategy is not:

"Put services on whatever machine is available"

but rather:

Locality

Fault Domain

Capacity

Cost

Compliance

Rebalancing

The single most important takeaway:

Placement is the art of balancing locality, resilience, and utilization.

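Chinese Section (Translated)

🎯 Placement Strategy in Distributed Systems

Core Understanding

Placement strategy is about:

Where services, data, replicas, jobs, and tenants should be placed

It is not just a scheduling problem. It is a combined trade-off among: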

Latency
Availability
Fault isolation
Resource utilization
Cost
Compliance
Data migration

What Gets Placed

Common objects include:

API service
Database replica
Cache node
Kafka partition
Search shard
Tenant data
Batch job
Stream processor

Where It Gets Placed

Common levels:

Region

↓

Availability Zone

↓

Rack

↓

Host

↓

Partition / Shard

Each level has its own failure domain.

Goal 1: Reduce Latency

User-facing services should usually be close to users:

US User → US Region

EU User → EU Region

Asia User → Asia Region

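Data processing should usually be close to the data:

Compute near Data

This reduces network latency and cross-region traffic.

Goal 2: Fault Isolation

Do not put all replicas in the same failure domain.

Bad approach: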

Replica 1 → Zone A
Replica 2 → Zone A
Replica 3 → Zone A

Better approach:

Replica 1 → Zone A
Replica 2 → Zone B
Replica 3 → Zone C

Goal 3: Capacity Balance

Placement must also avoid hotspots:

Node A: 90% CPU
Node B: 20% CPU
Node C: 15% CPU

In this case the cluster has capacity overall, but placement is imbalanced.

Affinity

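Affinity means placing related objects together.

Example:

Place the API worker near the database shard

Benefits:

Lower latency
Lower network cost
Better cache locality

Risks:

Related objects fail together
Larger blast radius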

Anti-affinity

Anti-affinity means placing related objects apart.

Example:

Replica 1 → Host A

Replica 2 → Host B

Replica 3 → Host C

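Benefits:

Higher availability
Avoids correlated failures
Lower single-point-of-failure risk

Costs:

Lower resource packing efficiency
More complex scheduling constraints

Stateful Placement Is Harder

A stateless service is relatively easy to migrate: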

Stop instance

Start new instance

Migrating a stateful service is more complex:

Copy data

Transfer ownership

Warm cache

Redirect traffic

Verify correctness

Rebalancing

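Placement is not a one-time decision.

When the system changes, rebalancing is needed:

New machines join
Machines fail
Traffic shifts
Tenants grow
Shards become hot
New regions come online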

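Comparison Table

Strategy	Benefit	Cost
Close to users	Lower user latency	More multi-region complexity
Close to data	Less backend traffic	Less scheduling flexibility
Strong anti-affinity	Higher availability	Lower resource utilization
Aggressive packing	Lower cost	Higher correlated failure risk
Tenant isolation	Smaller blast radius	Capacity fragmentation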

Interview Answer Template

Placement strategy means deciding where workloads, replicas, shards, tenants, and jobs should run.

The goal is to balance latency, fault isolation, capacity utilization, cost, and compliance.

I would place user-facing services close to users, place data-processing jobs close to data, and spread replicas across independent failure domains.

For stateful systems, placement is more complex because moving a workload may require moving data, transferring ownership, warming cache, or rebalancing partitions.

The main trade-off is locality versus resilience versus utilization.

Good systems define placement rules clearly, monitor hotspots, and rebalance gradually as traffic and capacity change.

Final Summary

Locality improves latency.

Separation improves resilience.

Packing improves utilization.

The most important principle:

Placement = latency + fault domain + capacity + cost + compliance