🎯 Placement Strategy in Distributed Systems
1️⃣ Core Framework
When discussing Placement Strategy in Distributed Systems, I frame it as:
- What is being placed
- Where it can be placed
- Locality requirements
- Fault isolation requirements
- Capacity and cost constraints
- Compliance and data residency
- Rebalancing and migration
- Trade-offs: latency vs resilience vs utilization
2️⃣ What Placement Strategy Means
Placement strategy means deciding where workloads, data, replicas, partitions, and jobs should run.
Examples
- Which region should serve a user?
- Which availability zone should hold a database replica?
- Which machine should run a container?
- Which worker should own a queue partition?
- Which shard should store a tenant?
- Which cache cluster should hold hot data?
Basic Architecture
Workload / Data / Replica
↓
Placement Policy
↓
Region / Zone / Rack / Host / Partition
👉 Interview Memorization
Placement strategy is the discipline of deciding where workloads and state should live in order to balance latency, resilience, capacity, cost, and compliance.
3️⃣ What Gets Placed
Common Placement Targets
- Application services
- Database replicas
- Cache nodes
- Queue partitions
- Stream processing tasks
- Search index shards
- Tenant data
- Batch jobs
- ML inference workloads
- CDN objects
Important Distinction
Placing stateless compute is easy.
Placing stateful data is hard.
Stateless workloads can often move freely.
Stateful workloads need data movement, ownership transfer, and recovery planning.
👉 Interview Memorization
Stateless workload placement is mostly a scheduling problem, while stateful placement is also a data ownership and migration problem.
4️⃣ Placement Levels
Distributed systems make placement decisions at multiple levels.
Region Level
US Region
EU Region
Asia Region
Used for latency, compliance, and disaster recovery.
Availability Zone Level
Zone A
Zone B
Zone C
Used for fault isolation inside a region.
Rack / Host Level
Rack 1 → Host A
Rack 2 → Host B
Rack 3 → Host C
Used to avoid correlated hardware failure.
Partition Level
Shard 1 → Node A
Shard 2 → Node B
Shard 3 → Node C
Used for scaling and data ownership.
👉 Interview Memorization
Placement decisions happen at many layers: region, zone, rack, host, partition, and tenant.
Good designs consider failure domains at each layer.
5️⃣ Placement Goal 1: Latency Locality
Place work close to users or close to data.
User Locality
EU User → EU Service
US User → US Service
Asia User → Asia Service
This reduces user-facing latency.
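A minimal sketch of user-locality routing follows. It assumes the router already has per-region round-trip measurements. The region names, the RTT numbers, and `nearest_region` are hypothetical illustrations, not a real routing API.

```python
# Hypothetical round-trip times (ms) from each user geography to each serving region.
RTT_MS = {
    "eu": {"eu-west": 15, "us-east": 85, "ap-south": 140},
    "us": {"eu-west": 85, "us-east": 12, "ap-south": 210},
    "asia": {"eu-west": 140, "us-east": 210, "ap-south": 20},
}


def nearest_region(user_geo: str, healthy: set) -> str:
    """Route a user to the healthy region with the lowest measured RTT."""
    candidates = {r: ms for r, ms in RTT_MS[user_geo].items() if r in healthy}
    if not candidates:
        raise RuntimeError(f"no healthy region for {user_geo}")
    return min(candidates, key=candidates.get)


ALL_REGIONS = {"eu-west", "us-east", "ap-south"}
print(nearest_region("eu", ALL_REGIONS))                # eu-west
print(nearest_region("eu", ALL_REGIONS - {"eu-west"}))  # us-east, the next-closest
```

Because only healthy regions are considered, the same rule also covers failover: when the nearest region is down, users go to the next-closest one.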
Data Locality
Compute
↓
Placed near data shard
This reduces backend network calls and cross-region traffic.
👉 Interview Memorization
Locality means placing compute near users or near data to reduce latency and network cost.
6️⃣ Placement Goal 2: Fault Isolation
Do not place all critical replicas in the same failure domain.
Bad Placement
Replica 1 → Zone A
Replica 2 → Zone A
Replica 3 → Zone A
A single zone failure removes all three replicas.
Better Placement
Replica 1 → Zone A
Replica 2 → Zone B
Replica 3 → Zone C
A single zone failure still leaves two of the three replicas alive.
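The sketch below makes the zone-failure argument concrete. It counts how many replicas survive the worst single-zone outage and checks whether a majority quorum survives. It assumes a majority-quorum replication scheme; the zone names and function names are hypothetical.

```python
from collections import Counter


def worst_case_survivors(replica_zones):
    """Replicas left after losing the single zone that holds the most replicas."""
    per_zone = Counter(replica_zones)
    return len(replica_zones) - max(per_zone.values())


def quorum_survives_zone_loss(replica_zones):
    """True if a majority of replicas survives any single-zone outage."""
    return worst_case_survivors(replica_zones) > len(replica_zones) // 2


bad = ["zone-a", "zone-a", "zone-a"]
better = ["zone-a", "zone-b", "zone-c"]
print(worst_case_survivors(bad), quorum_survives_zone_loss(bad))        # 0 False
print(worst_case_survivors(better), quorum_survives_zone_loss(better))  # 2 True
```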
Fault Domains
- Region
- Availability zone
- Datacenter
- Rack
- Host
- Power domain
- Network segment
👉 Interview Memorization
Fault isolation means spreading replicas across independent failure domains so one failure does not take down the whole service.
7️⃣ Placement Goal 3: Capacity Balance
Placement must avoid overloading one node, zone, or region.
Problem
Node A: 90% CPU
Node B: 20% CPU
Node C: 15% CPU
The cluster has capacity, but placement is imbalanced.
Capacity Signals
- CPU
- Memory
- Disk
- Network bandwidth
- IOPS
- Connection count
- Request rate
- Hot key frequency
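A simple way to quantify imbalance across these signals is the peak-to-average ratio per signal. This sketch assumes utilization is already collected per node as a fraction of capacity. The node names, the numbers, and the 1.5 alert threshold are hypothetical.

```python
from statistics import mean


def imbalance_by_signal(usage):
    """usage maps node -> {signal: utilization in [0, 1]}.
    Returns the peak-to-average ratio per signal; 1.0 means perfectly balanced."""
    signals = next(iter(usage.values())).keys()
    return {
        s: max(n[s] for n in usage.values()) / mean(n[s] for n in usage.values())
        for s in signals
    }


usage = {
    "node-a": {"cpu": 0.90, "memory": 0.50},
    "node-b": {"cpu": 0.20, "memory": 0.45},
    "node-c": {"cpu": 0.15, "memory": 0.55},
}
ratios = imbalance_by_signal(usage)
hot = [s for s, r in ratios.items() if r > 1.5]  # hypothetical alert threshold
print({s: round(r, 2) for s, r in ratios.items()}, hot)  # {'cpu': 2.16, 'memory': 1.1} ['cpu']
```

CPU is badly skewed here while memory is not. That is why placement should look at each signal, not just instance count.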
👉 Interview Memorization
Capacity-aware placement avoids hotspots by placing workloads based on real resource usage, not just instance count.
8️⃣ Placement Goal 4: Cost Efficiency
Placement also affects cost.
Cost Factors
- Cross-region traffic
- Overprovisioned standby capacity
- Expensive hardware requirements
- Storage replication
- Zone or region price differences
- Reserved capacity utilization
Trade-off
Spread widely
↓
Better resilience
↓
More duplicated capacity and traffic cost
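A back-of-envelope cost model makes this trade-off concrete. The unit prices below are hypothetical placeholders, not real cloud pricing. The model also assumes every write lands in one region and is shipped to every other region.

```python
# Hypothetical unit prices for illustration only; real prices vary by provider and region.
STORAGE_USD_PER_GB_MONTH = 0.10
CROSS_REGION_USD_PER_GB = 0.02


def monthly_cost(data_gb, replicas, regions, writes_gb_per_month):
    """Storage for every replica plus shipping each write to the other regions.
    Assumes every write lands in one region and is replicated to all others."""
    storage = data_gb * replicas * STORAGE_USD_PER_GB_MONTH
    replication = writes_gb_per_month * (regions - 1) * CROSS_REGION_USD_PER_GB
    return storage + replication


one_region = monthly_cost(data_gb=1000, replicas=3, regions=1, writes_gb_per_month=500)
three_regions = monthly_cost(data_gb=1000, replicas=3, regions=3, writes_gb_per_month=500)
five_regions = monthly_cost(data_gb=1000, replicas=5, regions=5, writes_gb_per_month=500)
print(round(one_region), round(three_regions), round(five_regions))  # 300 320 540
```

Even in this toy model, going from one region to five raises the monthly cost by 80% (300 to 540). Most of the increase comes from the extra stored replicas.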
👉 Interview Memorization
Placement affects cost because spreading and replication improve resilience but often increase storage, compute, and network usage.
9️⃣ Placement Goal 5: Compliance
Some data must stay in specific legal or geographic boundaries.
Examples
- EU user data in EU
- Financial data in approved regions
- Healthcare data in compliant environments
- Enterprise tenant data in a contracted region
Architecture
Tenant A → US Region
Tenant B → EU Region
Tenant C → Canada Region
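Compliance is usually enforced as a hard filter, applied before any latency or cost preference. The sketch below assumes a per-tenant allow-list of regions. The tenant names, region names, and `ResidencyViolation` are hypothetical.

```python
# Hypothetical residency policy: tenant -> regions its data may live in.
ALLOWED_REGIONS = {
    "tenant-a": {"us-east", "us-west"},
    "tenant-b": {"eu-west", "eu-central"},
    "tenant-c": {"ca-central"},
}


class ResidencyViolation(Exception):
    """Raised when no compliant region is available for a tenant."""


def choose_region(tenant, preferred, healthy):
    """Pick the first preferred region that is both allowed and healthy."""
    allowed = ALLOWED_REGIONS[tenant] & healthy
    if not allowed:
        # Fail closed: never fall back to a non-compliant region, even during an outage.
        raise ResidencyViolation(f"no compliant healthy region for {tenant}")
    for region in preferred:
        if region in allowed:
            return region
    return sorted(allowed)[0]


healthy = {"us-east", "us-west", "eu-west", "eu-central"}
print(choose_region("tenant-b", ["us-east", "eu-west"], healthy))  # eu-west
```

The key design choice is failing closed. If every compliant region is unhealthy, the request is rejected rather than silently placed somewhere that breaks residency rules.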
👉 Interview Memorization
Compliance-aware placement ensures data and workloads run only in regions or environments that satisfy legal, regulatory, or customer requirements.
🔟 Affinity
Affinity means placing things together.
Example
API Worker
↓
Same region as database shard
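Affinity is often expressed as a soft preference rather than a hard rule. The sketch below scores hosts by how close they are to a data shard: same zone first, then same region. The `Host` type and the score values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    region: str
    zone: str


def affinity_score(host, shard_region, shard_zone):
    """2 = same zone as the shard, 1 = same region, 0 = elsewhere."""
    if host.region != shard_region:
        return 0
    return 2 if host.zone == shard_zone else 1


def place_near_shard(hosts, shard_region, shard_zone):
    """Soft affinity: prefer co-location, but still return some host if none match."""
    return max(hosts, key=lambda h: affinity_score(h, shard_region, shard_zone))


hosts = [
    Host("h1", "eu-west", "zone-b"),
    Host("h2", "us-east", "zone-a"),
    Host("h3", "eu-west", "zone-a"),
]
print(place_near_shard(hosts, "eu-west", "zone-a").name)      # h3
print(place_near_shard(hosts[:2], "eu-west", "zone-a").name)  # h1
```

Because the preference is soft, losing the co-located host degrades latency instead of blocking placement.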
Use Cases
- Place compute near data
- Place cache near service
- Place dependent services in the same region
- Place jobs near required hardware
Benefit
- Lower latency
- Lower network cost
- Better cache locality
Risk
Too much affinity can create correlated failures.
👉 Interview Memorization
Affinity places related workloads together to improve locality, but too much affinity can increase correlated failure risk.
1️⃣1️⃣ Anti-affinity
Anti-affinity means placing things apart.
Example
Replica 1 → Host A
Replica 2 → Host B
Replica 3 → Host C
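Below is a greedy sketch of anti-affinity. Every replica must land on a distinct host, which is a hard rule. Among free hosts, the scheduler prefers the zone holding the fewest replicas, which is a soft spread. The host and zone names and the greedy strategy are illustrative assumptions, not a specific scheduler's algorithm.

```python
from collections import Counter


def place_with_anti_affinity(replicas, host_zone):
    """host_zone maps host -> zone. Put each replica on a distinct host (hard rule)
    and prefer the zone that currently holds the fewest replicas (soft spread)."""
    chosen = []
    zone_load = Counter()
    for _ in range(replicas):
        free = [h for h in sorted(host_zone) if h not in chosen]
        if not free:
            raise RuntimeError("not enough hosts to satisfy anti-affinity")
        best = min(free, key=lambda h: zone_load[host_zone[h]])
        chosen.append(best)
        zone_load[host_zone[best]] += 1
    return chosen


host_zone = {"host-a": "zone-a", "host-b": "zone-a", "host-c": "zone-b", "host-d": "zone-c"}
chosen = place_with_anti_affinity(3, host_zone)
print(chosen)  # ['host-a', 'host-c', 'host-d']
```

If you ask for more replicas than there are hosts, the function raises an error instead of co-locating copies. This is the extra scheduling constraint that anti-affinity costs.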
Use Cases
- Spread replicas across zones
- Avoid placing primary and backup on the same host
- Separate tenants for noisy-neighbor control
- Avoid putting all shards on one rack
Benefit
- Better availability
- Lower correlated failure risk
- Better blast-radius control
Cost
- Lower packing efficiency
- More scheduling constraints
- More capacity fragmentation
👉 Interview Memorization
Anti-affinity spreads related replicas across failure domains to reduce correlated failure risk.
1️⃣2️⃣ Resource-aware Scheduling
Resource-aware placement uses current and predicted resource usage.
Scheduler Inputs
- Workload requirements
- Node capacity
- Current utilization
- Failure-domain rules
- Placement constraints
Example
High-memory job → Memory-optimized node
GPU job → GPU node
High-IOPS shard → SSD node
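These examples follow a filter-then-score pattern. It is similar in spirit to how the Kubernetes scheduler first filters feasible nodes and then scores them. This sketch is a simplified assumption, not that scheduler's logic. It filters on free CPU, memory, and hardware labels, then picks the tightest memory fit. `Node`, `Job`, and the label names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: float
    free_mem_gb: float
    labels: frozenset = frozenset()


@dataclass
class Job:
    name: str
    cpu: float
    mem_gb: float
    needs: frozenset = frozenset()


def schedule(job, nodes):
    """Filter nodes that fit the job, then pick the tightest fit and reserve resources."""
    feasible = [
        n for n in nodes
        if n.free_cpu >= job.cpu
        and n.free_mem_gb >= job.mem_gb
        and job.needs <= n.labels  # required hardware, e.g. {"gpu"} or {"ssd"}
    ]
    if not feasible:
        return None  # unschedulable: surface as a failed scheduling attempt
    best = min(feasible, key=lambda n: (n.free_mem_gb - job.mem_gb, n.free_cpu - job.cpu))
    best.free_cpu -= job.cpu
    best.free_mem_gb -= job.mem_gb
    return best.name


nodes = [
    Node("mem-opt-1", free_cpu=8, free_mem_gb=256),
    Node("gpu-1", free_cpu=16, free_mem_gb=64, labels=frozenset({"gpu"})),
    Node("ssd-1", free_cpu=8, free_mem_gb=32, labels=frozenset({"ssd"})),
]
print(schedule(Job("high-mem", cpu=4, mem_gb=128), nodes))                       # mem-opt-1
print(schedule(Job("train", cpu=8, mem_gb=32, needs=frozenset({"gpu"})), nodes))  # gpu-1
print(schedule(Job("shard", cpu=2, mem_gb=16, needs=frozenset({"ssd"})), nodes))  # ssd-1
```

A tightest-fit choice keeps large nodes free for large jobs. However, a pure best-fit score can also place general jobs on scarce specialized nodes, so production schedulers typically add extra scoring rules for that case.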
👉 Interview Memorization
Resource-aware scheduling places workloads based on actual resource needs such as CPU, memory, disk, network, IOPS, and specialized hardware.
1️⃣3️⃣ Data-local Scheduling
Data-local scheduling moves compute near data instead of moving data to compute.
Architecture
Data Shard
↓
Run job near shard
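Here is a minimal sketch of data-local scheduling with three preference levels: node-local, then rack-local, then remote. It mirrors the node-local and rack-local preference used by Hadoop MapReduce and Spark. The topology, the shard holders, and `pick_node` are hypothetical.

```python
# Hypothetical topology: node -> rack, and which nodes hold a copy of each shard.
NODE_RACK = {"n1": "rack-1", "n2": "rack-1", "n3": "rack-2", "n4": "rack-3"}
SHARD_HOLDERS = {"shard-7": {"n1", "n3"}}


def pick_node(shard, free_nodes):
    """Prefer a node holding the shard, then a node on the same rack, then any node."""
    holders = SHARD_HOLDERS[shard]
    holder_racks = {NODE_RACK[n] for n in holders}
    candidates = sorted(free_nodes)
    for n in candidates:
        if n in holders:
            return n, "node-local"
    for n in candidates:
        if NODE_RACK[n] in holder_racks:
            return n, "rack-local"
    if candidates:
        return candidates[0], "remote"  # data must cross racks
    return None, "no free node"


print(pick_node("shard-7", {"n2", "n3", "n4"}))  # ('n3', 'node-local')
print(pick_node("shard-7", {"n2", "n4"}))        # ('n2', 'rack-local')
```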
Best For
- Batch processing
- Stream processing
- Search indexing
- Large analytics jobs
- ML feature generation
Why It Matters
Large data movement is expensive.
Moving compute is often cheaper than moving terabytes of data.
👉 Interview Memorization
Data-local scheduling reduces network cost and latency by placing compute near the data it needs to process.
1️⃣4️⃣ Tenant-aware Placement
Tenant-aware placement decides where each customer or tenant lives.
Architecture
Tenant A → Cluster 1
Tenant B → Cluster 2
Tenant C → Cluster 3
Goals
- Isolation
- Compliance
- Predictable performance
- Blast-radius control
- Enterprise region requirements
Risk
Large tenants can create hotspots on shared infrastructure.
Packing small tenants together improves efficiency but raises noisy-neighbor risk.
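One way to combine isolation with efficiency is a two-tier rule. Tenants above a load threshold get their own cluster. All other tenants are packed first-fit decreasing into shared clusters. The threshold, capacity, tenant names, and load unit are hypothetical. Compliance constraints would be applied as an extra filter before packing.

```python
def assign_tenants(tenant_load, shared_clusters, cluster_capacity, dedicated_threshold):
    """Return tenant -> cluster. Loads and capacity use the same unit (e.g. QPS).
    Tenants at or above `dedicated_threshold` get their own cluster; the rest are
    packed first-fit decreasing into shared clusters."""
    placement = {}
    used = {c: 0 for c in shared_clusters}
    # Largest tenants first, so big ones are isolated before small ones are packed.
    for tenant, load in sorted(tenant_load.items(), key=lambda kv: -kv[1]):
        if load >= dedicated_threshold:
            placement[tenant] = f"dedicated-{tenant}"  # isolation for large tenants
            continue
        for cluster in shared_clusters:
            if used[cluster] + load <= cluster_capacity:
                used[cluster] += load
                placement[tenant] = cluster
                break
        else:
            raise RuntimeError(f"no shared capacity left for {tenant}")
    return placement


load = {"acme": 5_000, "t1": 300, "t2": 400, "t3": 500}
placement = assign_tenants(load, ["shared-1", "shared-2"], cluster_capacity=1_000,
                           dedicated_threshold=2_000)
print(placement)
```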
👉 Interview Memorization
Tenant-aware placement balances isolation, compliance, and utilization by assigning tenants to regions, clusters, shards, or cells deliberately.
1️⃣5️⃣ Replica Placement
Replica placement decides where copies of data or services should run.
Bad Replica Placement
Primary → Host A
Replica → Host A
Host failure removes both.
Better Replica Placement
Primary → Zone A
Replica 1 → Zone B
Replica 2 → Zone C
Key Rule
Replicas should not share the same failure domain.
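This rule can be checked mechanically. The sketch below reports every failure domain that holds more than one replica at the enforced levels. Which levels to check is a policy assumption: for example, zone and host, but not region for a single-region cluster. The field names are hypothetical.

```python
from collections import Counter


def shared_failure_domains(replicas, levels=("zone", "host")):
    """Return (level, value) pairs that hold more than one replica.
    An empty list means no two replicas share a failure domain at the checked levels."""
    violations = []
    for level in levels:
        counts = Counter(r[level] for r in replicas)
        violations += [(level, v) for v, n in counts.items() if n > 1]
    return violations


bad = [
    {"role": "primary", "zone": "zone-a", "host": "host-a"},
    {"role": "replica", "zone": "zone-a", "host": "host-a"},
]
better = [
    {"role": "primary", "zone": "zone-a", "host": "host-1"},
    {"role": "replica-1", "zone": "zone-b", "host": "host-2"},
    {"role": "replica-2", "zone": "zone-c", "host": "host-3"},
]
print(shared_failure_domains(bad))     # [('zone', 'zone-a'), ('host', 'host-a')]
print(shared_failure_domains(better))  # []
```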
👉 Interview Memorization
Replica placement should spread copies across independent failure domains so one outage does not destroy availability or durability.
1️⃣6️⃣ Leader Placement
Leader placement affects write latency and availability.
Example
Most writes come from US users
↓
Place leader in US
Multi-region Question
Where should the primary leader live?
Consider:
- Write traffic location
- Quorum latency
- Failover target
- Compliance
- Regional outage risk
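Expected write latency is one rough way to compare leader candidates. It adds two terms: the client-to-leader round trip, weighted by where writes originate, and the time for the leader to reach a majority quorum. This sketch assumes one replica per region, a majority quorum, and made-up RTTs. It models latency only, so failover targets, compliance, and outage risk must still be checked separately.

```python
# Hypothetical region-to-region RTTs in ms; same-region RTT is assumed to be 5 ms.
RTT = {("us", "eu"): 80, ("us", "ap"): 200, ("eu", "ap"): 130}


def rtt(a, b):
    return 5 if a == b else RTT.get((a, b), RTT.get((b, a)))


def expected_write_latency(leader, write_share, replica_regions):
    """Client-to-leader RTT weighted by write share, plus the leader's quorum RTT.
    Assumes one replica per region and majority quorum (leader + n // 2 followers)."""
    client = sum(share * rtt(region, leader) for region, share in write_share.items())
    follower_rtts = sorted(rtt(leader, r) for r in replica_regions if r != leader)
    quorum = follower_rtts[len(replica_regions) // 2 - 1]
    return client + quorum


def best_leader(write_share, replica_regions):
    return min(replica_regions,
               key=lambda r: expected_write_latency(r, write_share, replica_regions))


share = {"us": 0.7, "eu": 0.2, "ap": 0.1}
regions = ["us", "eu", "ap"]
print(best_leader(share, regions))  # us
```

In this example, the US and EU leaders pay the same 80 ms quorum hop. The deciding factor is therefore where the writes originate.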
👉 Interview Memorization
Leader placement should follow the dominant write path while still preserving failover and fault-isolation requirements.
1️⃣7️⃣ Hotspot-aware Placement
Hotspots happen when one placement target receives too much traffic.
Example
Shard 1: 10,000 QPS
Shard 2: 500 QPS
Shard 3: 400 QPS
Solutions
- Split hot shards
- Move hot tenants
- Add replicas for hot reads
- Use consistent hashing with virtual nodes
- Separate large tenants
- Add request routing limits
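Below is a minimal sketch of hot-shard detection with a split estimate. It assumes per-shard QPS is already measured. The 5x-median threshold and the 2,500 QPS per-shard target are hypothetical tuning values.

```python
import math
from statistics import median


def hot_shards(qps_by_shard, factor=5.0):
    """Flag shards above `factor` x the median QPS. The median is used because one
    hot shard inflates the mean but barely moves the median."""
    baseline = median(qps_by_shard.values())
    return sorted(s for s, q in qps_by_shard.items() if q > factor * baseline)


def split_count(qps, target_qps_per_shard):
    """How many pieces a hot shard needs so each piece stays under the target."""
    return math.ceil(qps / target_qps_per_shard)


qps = {"shard-1": 10_000, "shard-2": 500, "shard-3": 400}
hot = hot_shards(qps)
print(hot, [split_count(qps[s], 2_500) for s in hot])  # ['shard-1'] [4]
```

Splitting only helps when the shard's load is spread across many keys. A single hot key needs read replicas, caching, or key-level rate limiting instead.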
👉 Interview Memorization
Hotspot-aware placement detects overloaded shards, tenants, or nodes and redistributes work before one placement decision becomes a bottleneck.
1️⃣8️⃣ Rebalancing
Placement is not a one-time decision.
Systems change over time.
Why Rebalancing Happens
- New nodes are added
- Nodes fail
- Traffic shifts
- Tenants grow
- Data grows
- Hot partitions appear
- Regions are added
Rebalancing Flow
Detect imbalance
↓
Choose movement plan
↓
Copy state
↓
Shift ownership
↓
Verify health
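The flow above can be sketched as a planner plus an executor. The planner greedily moves one shard at a time from the fullest node to the emptiest. It stops when a per-round move budget runs out, and that budget acts as the rate limit. The executor runs each move through copy, ownership shift, and verification, and rolls back on a failed check. Balancing by shard count (rather than size or QPS) and the callback names are simplifying assumptions.

```python
def plan_moves(layout, max_moves):
    """layout maps node -> list of shard ids. Greedily move one shard at a time from
    the fullest node to the emptiest until counts differ by at most one, or until the
    per-round move budget (the rate limit) is spent. Returns (shard, src, dst) tuples."""
    layout = {node: list(shards) for node, shards in layout.items()}  # don't mutate caller
    moves = []
    while len(moves) < max_moves:
        src = max(layout, key=lambda n: len(layout[n]))
        dst = min(layout, key=lambda n: len(layout[n]))
        if len(layout[src]) - len(layout[dst]) <= 1:
            break  # balanced enough
        shard = layout[src].pop()
        layout[dst].append(shard)
        moves.append((shard, src, dst))
    return moves


def execute(moves, copy_state, shift_ownership, verify_health):
    """Apply moves one at a time: copy state, shift ownership, verify.
    On a failed check, hand ownership back to the source and stop."""
    done = []
    for shard, src, dst in moves:
        copy_state(shard, src, dst)
        shift_ownership(shard, dst)
        if not verify_health(shard, dst):
            shift_ownership(shard, src)  # reversible: roll back this move
            break
        done.append(shard)
    return done


layout = {"n1": ["s1", "s2", "s3", "s4", "s5"], "n2": ["s6"], "n3": []}
moves = plan_moves(layout, max_moves=10)
print(moves)  # [('s5', 'n1', 'n3'), ('s4', 'n1', 'n2'), ('s3', 'n1', 'n3')]
```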
👉 Interview Memorization
Rebalancing updates placement as load, capacity, and failures change.
It must be rate-limited and observable because moving state can hurt production traffic.
1️⃣9️⃣ Placement Trade-off Table
| Strategy | Benefit | Cost |
|---|---|---|
| User-local placement | Lower user latency | More regional complexity |
| Data-local placement | Lower backend traffic | Less scheduling flexibility |
| Strong anti-affinity | Better resilience | Lower packing efficiency |
| Aggressive packing | Better utilization | Higher correlated failure risk |
| Tenant isolation | Smaller blast radius | More capacity fragmentation |
| Leader near writes | Lower write latency | Higher write latency for other regions |
| Geo placement | Compliance and locality | Harder migration |
👉 Interview Memorization
Placement is a multi-objective optimization problem.
Improving latency, resilience, utilization, compliance, and cost at the same time requires explicit trade-offs.
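A weighted score over normalized objectives is one way to make these trade-offs explicit. Compliance is treated as a hard filter rather than a weight. The candidate layouts, metric values, and weights below are hypothetical. The point is that changing the weights changes the winner.

```python
# Hypothetical candidate layouts; each metric is normalized to [0, 1], higher is better.
CANDIDATES = {
    "packed-one-zone": {"latency": 0.9, "resilience": 0.2, "utilization": 0.9,
                        "cost": 0.9, "compliant": True},
    "spread-three-zones": {"latency": 0.8, "resilience": 0.8, "utilization": 0.6,
                           "cost": 0.7, "compliant": True},
    "spread-three-regions": {"latency": 0.5, "resilience": 1.0, "utilization": 0.4,
                             "cost": 0.4, "compliant": False},
}


def best_layout(candidates, weights):
    """Compliance is a hard filter; the other objectives are traded off by weight."""
    eligible = {name: m for name, m in candidates.items() if m["compliant"]}
    if not eligible:
        raise RuntimeError("no compliant layout")

    def score(metrics):
        return sum(weight * metrics[objective] for objective, weight in weights.items())

    return max(eligible, key=lambda name: score(eligible[name]))


resilience_first = {"latency": 0.2, "resilience": 0.5, "utilization": 0.15, "cost": 0.15}
cost_first = {"latency": 0.2, "resilience": 0.1, "utilization": 0.35, "cost": 0.35}
print(best_layout(CANDIDATES, resilience_first))  # spread-three-zones
print(best_layout(CANDIDATES, cost_first))        # packed-one-zone
```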
2️⃣0️⃣ Observability
Monitor
- Resource utilization by node
- Resource utilization by zone
- Resource utilization by region
- Hot shards
- Hot tenants
- Replica distribution
- Leader distribution
- Cross-region traffic
- Rebalance progress
- Placement rule violations
- Failed scheduling attempts
- Capacity fragmentation
👉 Interview Memorization
Placement observability must show where workloads live, whether placement rules are satisfied, and whether any node, zone, region, tenant, or shard is becoming hot.
2️⃣1️⃣ Best Practices
Practical Rules
- Define failure domains explicitly
- Keep replicas across independent zones or regions
- Place compute near users when latency matters
- Place compute near data when data movement is expensive
- Use anti-affinity for critical replicas
- Use affinity only when locality clearly matters
- Use tenant isolation for large or regulated customers
- Monitor hotspots continuously
- Make rebalancing gradual and reversible
- Keep placement decisions explainable
Design Principle
Locality improves latency.
Separation improves resilience.
Packing improves utilization.
You rarely maximize all three at once.
👉 Interview Memorization
Good placement strategy balances locality, fault isolation, and utilization instead of optimizing only one dimension.
🧠 Staff-Level Answer Final
👉 Full Interview Answer
Placement strategy is about deciding where services, data, replicas, partitions, tenants, and jobs should run in a distributed system.
The main goals are low latency, fault isolation, efficient capacity usage, cost control, and compliance.
If the workload is user-facing, I would place compute close to users.
If the workload processes large data, I would place compute close to the data.
For replicas, I would use anti-affinity so copies are spread across independent failure domains such as zones, racks, or regions.
For stateful systems, placement is harder because moving work also means moving data ownership, warming caches, transferring leadership, or rebalancing partitions.
The key trade-off is locality versus fault isolation versus utilization.
Too much locality can create correlated failures.
Too much spreading can waste capacity and increase latency.
Too much packing can improve cost but increase blast radius.
A good design defines placement policies explicitly, monitors hotspots and rule violations, and supports gradual rebalancing as traffic and capacity change.
⭐ Final Insight
The core of placement strategy is not
"putting services on machines arbitrarily"
but balancing:
Locality + Fault Domain + Capacity + Cost + Compliance + Rebalancing
The single most important sentence:
Placement is the art of balancing locality, resilience, and utilization.
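Condensed Review
🎯 Placement Strategy in Distributed Systems
Core Idea
Placement strategy means deciding:
where services, data, replicas, jobs, and tenants should live
It is not a simple scheduling problem but a combined trade-off among:
- Latency
- Availability
- Fault isolation
- Resource utilization
- Cost
- Compliance
- Data migration
What Gets Placed
Common targets include: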
- API service
- Database replica
- Cache node
- Kafka partition
- Search shard
- Tenant data
- Batch job
- Stream processor
Where It Gets Placed
Common levels:
Region
↓
Availability Zone
↓
Rack
↓
Host
↓
Partition / Shard
Each level has its own failure domain.
Goal 1: Reduce Latency
User-facing services should usually be placed close to users:
US User → US Region
EU User → EU Region
Asia User → Asia Region
Data processing should usually be placed close to the data:
Compute near Data
This reduces network latency and cross-region traffic.
Goal 2: Fault Isolation
Do not put all replicas in the same failure domain.
Wrong approach:
Replica 1 → Zone A
Replica 2 → Zone A
Replica 3 → Zone A
Better approach:
Replica 1 → Zone A
Replica 2 → Zone B
Replica 3 → Zone C
Goal 3: Capacity Balance
Placement must also avoid hotspots:
Node A: 90% CPU
Node B: 20% CPU
Node C: 15% CPU
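In this case the cluster has capacity overall, but placement is imbalanced.
Affinity
Affinity means placing related objects together.
Example:
Place the API Worker near the Database Shard
Benefits:
- Lower latency
- Lower network cost
- Better cache locality
Risks:
- Related objects fail together
- Larger blast radius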
Anti-affinity
Anti-affinity means placing related objects apart.
Example:
Replica 1 → Host A
Replica 2 → Host B
Replica 3 → Host C
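Benefits:
- Higher availability
- Avoids correlated failures
- Lowers single-point-of-failure risk
Costs:
- Lower resource packing efficiency
- More complex scheduling constraints
Stateful Placement Is Harder
A stateless service is relatively easy to move: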
Stop instance
Start new instance
Moving a stateful service is more complex:
Copy data
Transfer ownership
Warm cache
Redirect traffic
Verify correctness
Rebalancing
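Placement is not a one-time decision.
When the system changes, it needs rebalancing:
- New machines join
- Machines fail
- Traffic shifts
- Tenants grow
- Shards become hot
- New regions come online
Comparison Table
| Strategy | Benefit | Cost |
|---|---|---|
| Close to users | Lower user latency | Higher multi-region complexity |
| Close to data | Less backend traffic | Less scheduling flexibility |
| Strong anti-affinity | Higher availability | Lower resource utilization |
| Aggressive packing | Lower cost | Higher correlated failure risk |
| Tenant isolation | Smaller blast radius | Capacity fragmentation |
Interview Answer Template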
Placement strategy means deciding where workloads, replicas, shards, tenants, and jobs should run.
The goal is to balance latency, fault isolation, capacity utilization, cost, and compliance.
I would place user-facing services close to users, place data-processing jobs close to data, and spread replicas across independent failure domains.
For stateful systems, placement is more complex because moving a workload may require moving data, transferring ownership, warming caches, or rebalancing partitions.
The main trade-off is locality versus resilience versus utilization.
Good systems define placement rules clearly, monitor hotspots, and rebalance gradually as traffic and capacity change.
Final Summary
Locality improves latency.
Separation improves resilience.
Packing improves utilization.
Most important principle:
Placement = latency + fault domain + capacity + cost + compliance
Implement