sd-rps Real Production Systems ·

🎯 How Uber Dispatch System Works Internally

1️⃣ Core Dispatch Framework (Staff-Level)

When discussing an Uber-like dispatch system, I frame it as:

Realtime location ingestion
Geospatial indexing
Candidate driver search
ETA and pickup-time estimation
Match scoring and offer dispatch
Acceptance, timeout, and reassignment
Marketplace balancing
Trade-offs: latency vs optimality vs fairness vs reliability

2️⃣ Core Problem

The system must match a rider with a driver quickly while conditions change every second.

Key constraints:

Drivers move continuously
Riders expect low wait time
Drivers can reject or timeout
GPS is noisy
Dense areas create hotspots
Marketplace supply and demand changes rapidly
A globally optimal match may be too slow

👉 Interview Answer

An Uber-like dispatch system is a realtime marketplace matching system. It first finds nearby available drivers using geospatial indexes, then scores them using ETA, distance, driver state, marketplace rules, and reliability signals. The system must make fast good-enough decisions because the world changes faster than a perfect optimizer can run.

3️⃣ High-Level Architecture

Driver App Location Updates
        ↓
Location Ingestion Service
        ↓
Realtime Driver State Store
        ↓
Geospatial Index
        ↓
Dispatch Service
        ↓
ETA Service + Pricing/Policy Signals
        ↓
Match Scoring
        ↓
Offer to Driver
        ↓
Accept / Timeout / Reassign

4️⃣ Location Updates

Drivers send frequent location updates:

latitude / longitude
heading
speed
timestamp
availability state
trip state

The backend writes this into a low-latency state store and updates the geospatial index.

👉 Interview Answer

Driver location is treated as fast-changing state, not as durable business history. The system keeps the latest driver state in a low-latency store and updates geospatial indexes continuously, while durable trip events are stored separately.

5️⃣ Geospatial Candidate Search

Common approaches:

Geohash cells
S2 cells
H3 hexagons
grid-based partitioning

Search pattern:

Find drivers in rider cell
        ↓
Expand to neighboring cells
        ↓
Filter unavailable drivers
        ↓
Return candidate drivers

👉 Interview Answer

The first stage should be a fast approximate geospatial search. We do not run expensive optimization across all drivers. We search nearby cells, expand if supply is low, and then score only a small candidate set.

6️⃣ ETA and Match Scoring

Candidate drivers are scored by:

pickup ETA
driver distance
traffic condition
driver acceptance probability
current driver direction
rider priority or product type
marketplace balance
cancellation risk
airport or event rules

Example scoring:

score =
  w1 * pickup_eta
+ w2 * cancellation_risk
+ w3 * marketplace_penalty
+ w4 * fairness_adjustment

👉 Interview Answer

The best driver is not always the closest driver. A production dispatch system usually scores drivers using ETA, acceptance probability, cancellation risk, product constraints, and marketplace health. This avoids locally obvious but globally poor assignments.

7️⃣ Dispatch Offer Flow

Create trip request
        ↓
Select candidate driver
        ↓
Send offer
        ↓
Wait for accept
        ↓
If timeout or reject, try next candidate
        ↓
Confirm match

Important details:

Offers need short TTLs
Driver state must be atomically reserved
Duplicate offers must be prevented
Rider should see stable progress
Reassignment must be controlled

8️⃣ Consistency and Race Conditions

Hard cases:

Same driver receives two offers
Driver goes offline after selection
GPS update is stale
Rider cancels during dispatch
Driver accepts after timeout

Common protections:

driver lease / reservation
idempotent trip state transition
versioned driver state
timeout-based offer expiry
reconciliation workers

👉 Interview Answer

Dispatch correctness depends on state transitions. I would model the trip and driver assignment as a state machine, use leases to prevent double assignment, and make accept, timeout, cancel, and reassignment idempotent.

9️⃣ Staff-Level Trade-offs

Decision	Benefit	Cost
Fast local matching	Low latency	May miss better global match
Rich scoring	Better marketplace outcome	More compute and model complexity
Short offer TTL	Faster reassignment	More churn for drivers
Strong reservation	Prevents double assignment	Adds coordination latency
Regional partitioning	Scales better	Cross-boundary matching is harder

🔟 Failure Handling

Production concerns:

reconnect storms from driver apps
stale location data
hot cells near airports or events
ETA service degradation
dispatch queue backlog
notification delivery failure

Fallbacks:

use last known location with freshness checks
degrade to simpler distance scoring
expand search radius
throttle demand with pricing or waiting
isolate hot regions

中文部分

中文速记

一句话

Uber Dispatch 本质是一个实时 marketplace matching 系统：先用地理索引快速找附近司机，再用 ETA、接单概率、取消风险和市场规则做打分。

背诵要点

地理搜索解决的是 candidate retrieval，不是最终决策
最近的司机不一定是最优司机
driver assignment 要用 lease / reservation 防止重复派单
accept、reject、timeout、cancel、reassign 都要做成幂等状态转移
核心权衡是 fast good-enough matching vs slow global optimization

中文面试回答

我会把 Uber 派单系统设计成一个实时匹配引擎。司机端持续上报位置，后端把最新司机状态写入低延迟存储，并更新 geohash、S2 或 H3 这样的地理索引。当乘客发起请求时，系统先在附近 cell 里找可用司机，再逐步扩大搜索范围。

找到候选司机后，不能只选距离最近的司机。生产系统通常会结合 pickup ETA、司机接单概率、取消风险、当前方向、车型、区域供需和业务规则做综合打分。派单时还需要给司机一个短 TTL 的 offer，并用 lease 防止一个司机被多个订单同时占用。

Staff 级重点是：这个系统追求的是低延迟下的足够好决策，而不是全局最优解。因为司机位置、乘客取消、GPS 精度和供需状态都在实时变化，慢的完美优化反而可能变成错误决策。

✅ Final Interview Answer

An Uber-like dispatch system is a realtime matching engine. Drivers continuously send location updates, which are stored in a low-latency driver state store and indexed by geospatial cells. When a rider requests a trip, the dispatch service searches nearby cells, filters available drivers, estimates pickup ETA, and scores candidates using ETA, acceptance probability, cancellation risk, product rules, and marketplace constraints.

The dispatch flow is stateful: the system sends an offer with a short TTL, reserves the driver with a lease, and handles accept, reject, timeout, cancel, and reassignment as idempotent state transitions. The hard part is not just finding the closest driver. The hard part is making low-latency decisions under noisy GPS data, fast-changing supply, driver behavior, and consistency races.

At staff level, I would emphasize that dispatch is a trade-off between latency and optimality. Real systems usually prefer fast good-enough matching with strong safeguards over slow global optimization.

System Design Deep Dive - 04 How Uber Dispatch System Works Internally

🎯 How Uber Dispatch System Works Internally

1️⃣ Core Dispatch Framework (Staff-Level)

2️⃣ Core Problem

3️⃣ High-Level Architecture

4️⃣ Location Updates

5️⃣ Geospatial Candidate Search

6️⃣ ETA and Match Scoring

7️⃣ Dispatch Offer Flow

8️⃣ Consistency and Race Conditions

9️⃣ Staff-Level Trade-offs

🔟 Failure Handling

中文部分

中文速记

一句话

背诵要点

中文面试回答

✅ Final Interview Answer

Implement