🎯 Design Booking System
1️⃣ Core Framework
When discussing Booking System design, I frame it as:
- Resource and availability model
- Search availability
- Temporary hold / reservation
- Booking confirmation
- Payment integration
- Cancellation and refund
- Double-booking prevention
- Consistency, scaling, and failure handling
2️⃣ Core Requirements
Functional Requirements
- User can search available resources
- User can view available time slots
- User can temporarily hold a slot
- User can confirm booking
- User can pay for booking
- User can cancel booking
- Support hold expiration
- Support refunds
- Prevent double booking
- Support booking history
Non-functional Requirements
- Strong correctness for booking confirmation
- Low-latency availability search
- High availability
- Scalable read traffic
- Idempotent booking APIs
- Auditable booking state changes
- Eventually consistent availability display is acceptable
- Strong consistency required for final booking
👉 Interview Answer
A booking system manages limited resources over time.
The most important challenge is preventing double booking.
I would separate availability search, temporary holds, final confirmation, payment, and cancellation into clear state transitions.
3️⃣ Core Concepts
Resource
A resource can be:
- Hotel room
- Restaurant table
- Doctor appointment slot
- Flight seat
- Event ticket
- Rental car
Time Slot
A booking usually reserves a resource for a time range.
resource_id = room_101
start_time = 2026-05-03 18:00
end_time = 2026-05-04 11:00
Availability
Availability means:
capacity - confirmed_bookings - active_holds > 0
👉 Interview Answer
I would model booking around resources and time slots.
A resource has limited capacity, and availability is calculated by subtracting confirmed bookings and active holds from total capacity.
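The availability rule can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: `SlotCounts`, `remaining`, and `is_available` are hypothetical names, and the fields simply mirror the counters used in the availability formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotCounts:
    """Hypothetical per-slot counters, used only for illustration."""
    total_capacity: int
    confirmed_count: int
    held_count: int


def remaining(slot: SlotCounts) -> int:
    """Units still bookable: capacity minus confirmed bookings and active holds."""
    return slot.total_capacity - slot.confirmed_count - slot.held_count


def is_available(slot: SlotCounts, quantity: int = 1) -> bool:
    """A request fits only if enough units remain for the whole quantity."""
    return quantity > 0 and remaining(slot) >= quantity
```

Checking against the requested quantity, rather than only `> 0`, matters as soon as one request can reserve more than one unit.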
4️⃣ Main APIs
Search Availability
GET /api/availability?resourceType=hotel&location=NYC&start=2026-06-01&end=2026-06-03
Create Hold
POST /api/holds
Request:
{
"userId": "u123",
"resourceId": "room_type_deluxe",
"startTime": "2026-06-01T15:00:00Z",
"endTime": "2026-06-03T11:00:00Z",
"quantity": 1
}
Confirm Booking
POST /api/bookings
Request:
{
"holdId": "h789",
"paymentMethodId": "pm123"
}
Cancel Booking
POST /api/bookings/{bookingId}/cancel
Get Booking
GET /api/bookings/{bookingId}
👉 Interview Answer
The core APIs are search availability, create hold, confirm booking, cancel booking, and get booking status.
Search can be eventually consistent, but hold and confirm must run as atomic, strongly consistent writes.
5️⃣ Data Model
Resource Table
resource (
resource_id VARCHAR PRIMARY KEY,
resource_type VARCHAR,
name VARCHAR,
location VARCHAR,
capacity INT,
status VARCHAR,
metadata JSON,
created_at TIMESTAMP
)
Availability Slot Table
availability_slot (
resource_id VARCHAR,
slot_start TIMESTAMP,
slot_end TIMESTAMP,
total_capacity INT,
confirmed_count INT,
held_count INT,
version BIGINT,
updated_at TIMESTAMP,
PRIMARY KEY (resource_id, slot_start)
)
Hold Table
booking_hold (
hold_id VARCHAR PRIMARY KEY,
user_id VARCHAR,
resource_id VARCHAR,
start_time TIMESTAMP,
end_time TIMESTAMP,
quantity INT,
status VARCHAR, -- active, confirmed, released, expired
expires_at TIMESTAMP,
idempotency_key VARCHAR,
created_at TIMESTAMP,
updated_at TIMESTAMP
)
Booking Table
booking (
booking_id VARCHAR PRIMARY KEY,
hold_id VARCHAR,
user_id VARCHAR,
resource_id VARCHAR,
start_time TIMESTAMP,
end_time TIMESTAMP,
quantity INT,
status VARCHAR, -- confirmed, cancelled, completed
payment_status VARCHAR,
total_price DECIMAL,
created_at TIMESTAMP,
updated_at TIMESTAMP
)
Booking Event Table
booking_event (
event_id VARCHAR PRIMARY KEY,
booking_id VARCHAR,
event_type VARCHAR,
actor_id VARCHAR,
created_at TIMESTAMP,
metadata JSON
)
👉 Interview Answer
I would keep separate tables for resources, availability slots, temporary holds, confirmed bookings, and booking events.
The event table provides an audit trail for debugging, support, reconciliation, and dispute handling.
6️⃣ Availability Search
Search Flow
User searches dates/location
→ Availability service queries read model
→ Filter resources with available capacity
→ Rank results by price, distance, rating
→ Return available options
Why Read Model?
Availability search is read-heavy.
Use:
booking write model
→ availability events
→ read-optimized availability index/cache
Important Rule
Search result is not a guarantee.
Final guarantee happens during hold or confirmation.
👉 Interview Answer
Availability search can use a read-optimized cache or index.
Slight staleness is acceptable, because many users are only browsing and a search result is not a guarantee.
The system must revalidate availability when creating a hold or confirming a booking.
7️⃣ Hold / Temporary Reservation Flow
Why Hold?
User needs time to complete checkout.
Without hold:
User sees available slot
→ spends 2 minutes entering payment
→ slot gets booked by someone else
Hold Flow
User selects slot
→ Booking service checks availability
→ Atomically increases held_count
→ Creates hold with expiration
→ Returns hold to user
Atomic Update
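UPDATE availability_slot
SET held_count = held_count + :quantity,
version = version + 1
WHERE resource_id = 'room_type_deluxe'
AND slot_start = '2026-06-01'
AND total_capacity - confirmed_count - held_count >= :quantity;
-- 0 rows updated means the slot is full: reject the hold.
-- A multi-slot range (for example, several nights) updates one row per slot in a single transaction.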
👉 Interview Answer
I would use temporary holds to protect inventory during checkout.
Creating a hold must atomically check available capacity and increase the held count.
Holds should expire automatically if the user does not complete booking.
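The hold flow can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the `availability_slot` and `booking_hold` tables already exist with the columns from the data model, `expires_at` is stored as epoch seconds for simplicity, and `create_hold` with its parameters is a hypothetical helper rather than a fixed API. It handles a single slot; a multi-slot range would repeat the conditional update for each slot inside the same transaction.

```python
import sqlite3
import time
import uuid
from typing import Optional


def create_hold(conn: sqlite3.Connection, user_id: str, resource_id: str,
                slot_start: str, quantity: int = 1, ttl_seconds: float = 600.0,
                now: Optional[float] = None) -> Optional[str]:
    """Return a new hold_id, or None when the slot cannot fit the request."""
    now = time.time() if now is None else now
    with conn:  # one transaction: commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE availability_slot "
            "SET held_count = held_count + ?, version = version + 1 "
            "WHERE resource_id = ? AND slot_start = ? "
            "AND total_capacity - confirmed_count - held_count >= ?",
            (quantity, resource_id, slot_start, quantity),
        )
        if cur.rowcount == 0:
            return None  # no such slot, or not enough remaining capacity
        hold_id = uuid.uuid4().hex
        conn.execute(
            "INSERT INTO booking_hold (hold_id, user_id, resource_id, start_time, "
            "quantity, status, expires_at) VALUES (?, ?, ?, ?, ?, 'active', ?)",
            (hold_id, user_id, resource_id, slot_start, quantity, now + ttl_seconds),
        )
        return hold_id
```

The capacity check lives in the `WHERE` clause, so the check and the increment happen as one statement; there is no window between reading the counters and writing them.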
8️⃣ Booking Confirmation Flow
Flow
User confirms booking
→ Validate hold is active and not expired
→ Authorize payment (capture once the booking is confirmed)
→ Convert hold to confirmed booking
→ Move held_count to confirmed_count
→ Mark hold confirmed
→ Emit booking confirmed event
Count Update
held_count = held_count - quantity
confirmed_count = confirmed_count + quantity
Important Rule
Booking confirmation must be idempotent.
If user retries confirmation, the system must not create duplicate bookings.
👉 Interview Answer
Booking confirmation converts an active hold into a confirmed booking.
The system must verify the hold is active, process payment, update availability, create the booking record, and mark the hold confirmed.
This operation must be idempotent to avoid duplicate bookings.
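One way to make confirmation idempotent is to key the booking on the hold. The sketch below assumes the tables from the data model, a `UNIQUE` constraint on `booking.hold_id` as a backstop, and `expires_at` stored as epoch seconds; `confirm_booking` is a hypothetical helper. Payment authorization is deliberately left out so the example stays focused on the state change.

```python
import sqlite3
import time
import uuid
from typing import Optional


def confirm_booking(conn: sqlite3.Connection, hold_id: str,
                    now: Optional[float] = None) -> str:
    """Convert an active hold into a booking; a retry returns the same booking_id."""
    now = time.time() if now is None else now
    with conn:  # all state changes commit together or not at all
        existing = conn.execute(
            "SELECT booking_id FROM booking WHERE hold_id = ?", (hold_id,)
        ).fetchone()
        if existing:
            return existing[0]  # retry of an already-confirmed hold

        # Flip the hold only if it is still active and unexpired.
        cur = conn.execute(
            "UPDATE booking_hold SET status = 'confirmed' "
            "WHERE hold_id = ? AND status = 'active' AND expires_at > ?",
            (hold_id, now),
        )
        if cur.rowcount == 0:
            raise ValueError("hold is missing, expired, or already released")

        resource_id, slot_start, quantity = conn.execute(
            "SELECT resource_id, start_time, quantity FROM booking_hold WHERE hold_id = ?",
            (hold_id,),
        ).fetchone()
        conn.execute(
            "UPDATE availability_slot SET held_count = held_count - ?, "
            "confirmed_count = confirmed_count + ?, version = version + 1 "
            "WHERE resource_id = ? AND slot_start = ?",
            (quantity, quantity, resource_id, slot_start),
        )
        booking_id = uuid.uuid4().hex
        conn.execute(
            "INSERT INTO booking (booking_id, hold_id, status) VALUES (?, ?, 'confirmed')",
            (booking_id, hold_id),
        )
        return booking_id
```

Because the hold can move from active to confirmed only once, a duplicate request either finds the existing booking or fails the conditional update; it never moves the counters twice.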
9️⃣ Double-booking Prevention
Main Risk
Multiple users try to book the same resource and time slot.
Techniques
1. Conditional Update
WHERE total_capacity - confirmed_count - held_count >= quantity
2. Optimistic Locking
Use version column.
UPDATE ... SET version = version + 1 WHERE version = :old_version; if 0 rows are updated, re-read and retry.
3. Row-level Locking
Lock resource-slot row during hold/confirm.
4. Single-writer per Resource Slot
Route writes for the same resource slot to one partition.
5. Queue for High-demand Events
Serialize booking requests for extremely hot slots.
👉 Interview Answer
Double-booking prevention depends on atomic updates.
I would use conditional writes, optimistic locking, or row-level locking to ensure confirmed plus held quantity never exceeds capacity.
For very hot events, a queue-based single-writer model can serialize bookings.
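The optimistic-locking variant can be sketched as a read, check, and versioned write with a bounded retry loop. This is an illustrative sketch that assumes the `availability_slot` table from the data model; `hold_with_version_check` and `max_retries` are hypothetical names.

```python
import sqlite3


def hold_with_version_check(conn: sqlite3.Connection, resource_id: str,
                            slot_start: str, quantity: int,
                            max_retries: int = 3) -> bool:
    """Read-check-write guarded by the version column; retry if another writer wins."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT total_capacity, confirmed_count, held_count, version "
            "FROM availability_slot WHERE resource_id = ? AND slot_start = ?",
            (resource_id, slot_start),
        ).fetchone()
        if row is None:
            return False  # unknown slot
        total, confirmed, held, version = row
        if total - confirmed - held < quantity:
            return False  # genuinely full; retrying will not help
        with conn:
            cur = conn.execute(
                "UPDATE availability_slot SET held_count = ?, version = ? "
                "WHERE resource_id = ? AND slot_start = ? AND version = ?",
                (held + quantity, version + 1, resource_id, slot_start, version),
            )
        if cur.rowcount == 1:
            return True  # nobody changed the row between our read and our write
    return False  # too much contention; the caller may back off or queue the request
```

Compared with the single conditional update, this form costs an extra read and can spin under contention, which is why hot slots tend to be routed to a single writer or a queue instead.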
🔟 Payment Integration
Payment Strategy
Common approach:
Hold first
→ Authorize payment
→ Confirm booking
→ Capture payment
Failure Cases
- Hold succeeds, payment fails → release hold
- Payment succeeds, booking update fails → retry / reconcile
- Booking cancelled → refund according to policy
- Payment timeout → check provider / reconcile
Saga Pattern
create hold
authorize payment
confirm booking
capture payment
Compensation:
release hold
void authorization
refund payment
👉 Interview Answer
Booking and payment should be coordinated using a saga.
If payment fails, the hold should be released.
If booking confirmation fails after payment, the system should retry or reconcile, because payment correctness and booking correctness must stay aligned.
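The saga idea can be illustrated with a tiny in-memory orchestrator. This is only a sketch: `run_saga` and the `Step` tuple are hypothetical, the actions are plain callables, and a production saga would persist its progress and retry failed compensations rather than assume they succeed.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); the compensation undoes a completed action.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            for undo_name, _, compensate in reversed(done):
                compensate()
                log.append(f"compensated {undo_name}")
            return False, log
        done.append(step)
        log.append(f"{name} ok")
    return True, log
```

With steps such as create hold, authorize payment, confirm booking, and capture payment, a failure at authorization runs only the release-hold compensation, which matches the first failure case listed above.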
1️⃣1️⃣ Hold Expiration
Why Needed?
Users may abandon checkout.
Without expiration, resources can remain locked forever.
Expiration Flow
Hold passes its expires_at time
→ Expiration worker finds active holds with expires_at <= now
→ Mark hold expired, only if it is still active
→ Decrease held_count by the hold quantity
→ Emit hold expired event
Implementation Options
- Background scanner
- Delay queue
- TTL-based scheduler
- Time-wheel scheduler
👉 Interview Answer
Holds must have expiration times.
If the user does not confirm in time, a background worker or delayed queue releases the hold, decreases held count, and makes the slot available again.
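A background-scanner version of the expiration worker might look like the sketch below. It assumes the `booking_hold` and `availability_slot` tables from the data model, with `expires_at` stored as epoch seconds; `release_expired_holds` is a hypothetical function that a scheduler would call periodically.

```python
import sqlite3


def release_expired_holds(conn: sqlite3.Connection, now: float) -> int:
    """Expire overdue active holds and give their capacity back; returns the count."""
    released = 0
    overdue = conn.execute(
        "SELECT hold_id, resource_id, start_time, quantity FROM booking_hold "
        "WHERE status = 'active' AND expires_at <= ?",
        (now,),
    ).fetchall()
    for hold_id, resource_id, slot_start, quantity in overdue:
        with conn:  # one transaction per hold keeps each release atomic
            cur = conn.execute(
                "UPDATE booking_hold SET status = 'expired' "
                "WHERE hold_id = ? AND status = 'active'",
                (hold_id,),
            )
            if cur.rowcount == 0:
                continue  # confirmed or released concurrently; leave the counters alone
            conn.execute(
                "UPDATE availability_slot SET held_count = held_count - ?, "
                "version = version + 1 WHERE resource_id = ? AND slot_start = ?",
                (quantity, resource_id, slot_start),
            )
            released += 1
    return released
```

The status guard on the update is what makes the worker safe to run late, twice, or alongside a confirmation: capacity is returned only by the call that actually flips the hold from active to expired.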
1️⃣2️⃣ Cancellation and Refund
Cancellation Flow
User cancels booking
→ Check cancellation policy
→ Mark booking cancelled
→ Decrease confirmed_count
→ Process refund if eligible
→ Emit booking cancelled event
Cancellation Policies
Examples:
- Free cancellation before 24 hours
- Partial refund after cutoff
- No refund after start time
- Provider-specific policy
👉 Interview Answer
Cancellation should go through the booking state machine.
The system checks cancellation policy, updates booking state, restores availability if applicable, and triggers refund based on policy.
Cancellation and refund should both be idempotent.
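The example policies can be expressed as a small pure function. This is a sketch under assumptions: the 24-hour free window comes from the list above, while the 50% partial rate and the name `refund_amount` are made up for illustration; real policies are usually provider-specific.

```python
from datetime import datetime, timedelta
from decimal import Decimal


def refund_amount(total_price: Decimal, start_time: datetime, cancelled_at: datetime,
                  free_window: timedelta = timedelta(hours=24),
                  partial_rate: Decimal = Decimal("0.50")) -> Decimal:
    """Full refund before the cutoff, partial until the start time, nothing afterwards."""
    if cancelled_at >= start_time:
        return Decimal("0.00")
    if start_time - cancelled_at >= free_window:
        return total_price
    # Round to cents; the partial rate is an assumed value.
    return (total_price * partial_rate).quantize(Decimal("0.01"))
```

Keeping the calculation pure (inputs in, amount out) makes it easy to call from an idempotent cancellation handler, since the same booking and cancellation time always produce the same refund.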
1️⃣3️⃣ Booking State Machine
Common States
HOLD_CREATED
HOLD_ACTIVE
HOLD_EXPIRED
BOOKING_CONFIRMED
PAYMENT_AUTHORIZED
PAYMENT_CAPTURED
BOOKING_CANCELLED
BOOKING_COMPLETED
REFUNDED
Why State Machine?
- Prevent invalid transitions
- Support retries
- Coordinate payment
- Support cancellation rules
- Improve auditability
👉 Interview Answer
I would model booking as a state machine.
This makes transitions explicit, prevents invalid states, and helps coordinate holds, payment, confirmation, cancellation, and refund.
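A transition table makes "prevent invalid transitions" concrete. The state names below come from the list above, but the allowed transitions are an assumption chosen to follow the hold, authorize, confirm, capture order; a real system would define its own table.

```python
from enum import Enum
from typing import Dict, Set


class State(Enum):
    HOLD_CREATED = "HOLD_CREATED"
    HOLD_ACTIVE = "HOLD_ACTIVE"
    HOLD_EXPIRED = "HOLD_EXPIRED"
    PAYMENT_AUTHORIZED = "PAYMENT_AUTHORIZED"
    BOOKING_CONFIRMED = "BOOKING_CONFIRMED"
    PAYMENT_CAPTURED = "PAYMENT_CAPTURED"
    BOOKING_CANCELLED = "BOOKING_CANCELLED"
    BOOKING_COMPLETED = "BOOKING_COMPLETED"
    REFUNDED = "REFUNDED"


# Assumed transition table; states without an entry are terminal.
ALLOWED: Dict[State, Set[State]] = {
    State.HOLD_CREATED: {State.HOLD_ACTIVE},
    State.HOLD_ACTIVE: {State.PAYMENT_AUTHORIZED, State.HOLD_EXPIRED},
    State.PAYMENT_AUTHORIZED: {State.BOOKING_CONFIRMED, State.HOLD_EXPIRED},
    State.BOOKING_CONFIRMED: {State.PAYMENT_CAPTURED, State.BOOKING_CANCELLED},
    State.PAYMENT_CAPTURED: {State.BOOKING_COMPLETED, State.BOOKING_CANCELLED},
    State.BOOKING_CANCELLED: {State.REFUNDED},
}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Every handler (confirm, cancel, expire, refund) would call `transition` before writing, so an expired hold can never be confirmed and a completed booking can never be refunded by accident.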
1️⃣4️⃣ Pricing
Price Inputs
- Resource type
- Time range
- Demand
- Seasonality
- Discounts
- Taxes / fees
- Cancellation policy
Dynamic Pricing
Examples:
- Hotel holiday pricing
- Flight seat fare classes
- Event ticket demand pricing
- Restaurant peak-hour pricing
Price Snapshot
Store price at booking time.
Why?
- Price may change later
- Auditability
- Customer support
- Refund calculation
👉 Interview Answer
Pricing should be calculated at checkout and stored as a price snapshot on the booking.
This is important because prices may change later, but the customer’s confirmed booking should preserve the original price.
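A price snapshot can be modeled as an immutable value computed once at checkout. The sketch below is illustrative only: `PriceSnapshot`, `quote`, the flat tax rate, and the choice to tax the discounted base are all assumptions, not rules taken from this document.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PriceSnapshot:
    """Immutable record of what the customer agreed to pay at checkout."""
    base: Decimal
    discount: Decimal
    taxes_and_fees: Decimal
    currency: str

    @property
    def total(self) -> Decimal:
        return self.base - self.discount + self.taxes_and_fees


def quote(unit_rate: Decimal, units: int, discount: Decimal,
          tax_rate: Decimal, currency: str = "USD") -> PriceSnapshot:
    """Compute the price once; the result is stored with the booking, never recomputed."""
    base = unit_rate * units
    taxes = ((base - discount) * tax_rate).quantize(Decimal("0.01"))
    return PriceSnapshot(base, discount, taxes, currency)
```

Storing the components, not just the total, is what later lets support staff and refund logic explain exactly how the charged amount was built.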
1️⃣5️⃣ Read Model and Caching
Read-heavy Data
- Availability search
- Resource details
- Pricing estimates
- Reviews
- Location search
Cache Strategy
- Cache resource metadata
- Cache availability snapshots with short TTL
- Cache popular search results
- Revalidate during hold creation
- Use event-driven invalidation
👉 Interview Answer
Booking search is read-heavy, so I would cache resource metadata and availability snapshots.
But cached availability is only an estimate.
The system must revalidate availability when the user creates a hold or confirms a booking.
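A short-TTL cache with explicit invalidation can be sketched as follows. `TTLCache` is a hypothetical in-process helper; a real deployment would more likely use a shared cache, and the injected `clock` exists only to make the behavior easy to test.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process cache: entries expire after ttl_seconds or on invalidate()."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return a fresh cached value, or call loader() and cache its result."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = loader()
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry, for example when a booking event arrives for that slot."""
        self._entries.pop(key, None)
```

The TTL bounds how stale a search result can get even if an invalidation event is lost, while the hold path still revalidates against the source of truth.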
1️⃣6️⃣ Scaling Patterns
Pattern 1: Separate Search and Booking Write Path
- Search = read-optimized, eventually consistent
- Booking write path = strongly controlled
Pattern 2: Shard by Resource or Region
hash(resource_id)
region/city partition
Pattern 3: Single-writer for Hot Slots
Serialize writes for popular resources.
Pattern 4: Event-driven Updates
booking confirmed
→ update availability read model
→ notify user
→ update analytics
Pattern 5: Expiration Worker
Automatically releases abandoned holds.
👉 Interview Answer
To scale booking systems, I would separate the read-heavy availability search path from the write-critical booking path.
I would shard by resource or region, use atomic updates for holds and confirmations, and use events to update read models asynchronously.
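Routing by `hash(resource_id)` can be sketched with a stable hash. `shard_for` is a hypothetical helper; it uses `hashlib` rather than Python's built-in `hash()`, because the built-in string hash is randomized per process and would route the same resource differently on different servers.

```python
import hashlib


def shard_for(resource_id: str, num_shards: int) -> int:
    """Map a resource to a stable shard index in [0, num_shards)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(resource_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Since every write for a given resource lands on the same shard, the conditional update for its slots stays a single-node operation. Plain modulo routing does remap most keys when the shard count changes, so a setup that resizes often may prefer consistent hashing.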
1️⃣7️⃣ Failure Handling
Common Failures
- Hold request timeout
- Duplicate confirmation request
- Payment succeeds but booking update fails
- Hold expires during payment
- Expiration worker delayed
- Cancellation refund fails
- Availability read model stale
- Hot slot contention
Strategies
- Idempotency keys
- Booking state machine
- Conditional updates
- Retry with backoff
- Outbox pattern for events
- Reconciliation jobs
- Release expired holds
- Manual review for unresolved states
👉 Interview Answer
Booking systems must handle partial failures carefully.
Hold, confirmation, cancellation, and refund operations should be idempotent.
The system should use conditional writes, explicit state transitions, retries, and reconciliation jobs to keep booking, payment, and availability consistent.
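Retry with backoff might be sketched as below. `TransientError` and `retry_with_backoff` are hypothetical names, and the delays are illustrative defaults. Retrying is only safe because the wrapped operations are assumed to be idempotent.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry, such as timeouts."""


def retry_with_backoff(op: Callable[[], T], attempts: int = 5,
                       base_delay: float = 0.1, max_delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op until it succeeds, sleeping with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts; hand over to reconciliation or manual review
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))  # jitter spreads out retry bursts
    raise AssertionError("unreachable")
```

Only `TransientError` is retried; a declined card or an expired hold should fail fast and go through the state machine instead of being retried blindly.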
1️⃣8️⃣ Consistency Model
Stronger Consistency Needed For
- Hold creation
- Booking confirmation
- Cancellation
- Refund
- Payment
- Double-booking prevention
Eventual Consistency Acceptable For
- Search results
- Availability display
- Recommendation ranking
- Reviews
- Analytics
- Notifications
👉 Interview Answer
Booking systems require mixed consistency.
Search results and availability display can be eventually consistent.
But hold creation, booking confirmation, payment, cancellation, and refund require stronger correctness, because double booking is unacceptable.
1️⃣9️⃣ Observability
Key Metrics
- Availability search latency
- Hold creation success rate
- Hold expiration count
- Booking confirmation success rate
- Double-booking count
- Payment failure rate
- Cancellation rate
- Refund failure rate
- Hot slot contention
- Stale availability complaints
- Reconciliation mismatch count
👉 Interview Answer
I would monitor hold success rate, booking confirmation rate, double-booking incidents, payment failures, hold expiration count, hot slot contention, and reconciliation mismatches.
These metrics directly reflect booking correctness and user experience.
2️⃣0️⃣ End-to-End Flow
Search Flow
User searches resource
→ Query availability read model
→ Filter available options
→ Rank by price, location, rating
→ Return results
Booking Flow
User selects slot
→ Create temporary hold
→ User enters payment
→ Authorize payment
→ Confirm booking
→ Move held capacity to confirmed capacity
→ Capture payment
→ Send confirmation
Cancellation Flow
User cancels booking
→ Check cancellation policy
→ Mark booking cancelled
→ Restore availability
→ Refund if eligible
→ Send cancellation notification
Key Insight
A booking system does more than store reservations: it is a limited-resource allocation system with strong correctness requirements.
🧠 Staff-Level Answer (Final)
👉 Interview Answer (Full Version)
When designing a booking system, I think of it as a limited-resource allocation system.
The system manages resources over time, such as hotel rooms, restaurant tables, appointments, seats, or tickets.
The most important requirement is preventing double booking.
I would separate availability search from booking confirmation. Search can use cached or eventually consistent availability snapshots, because many users browse availability.
But when the user selects a slot, the system must create a temporary hold using an atomic conditional update.
The hold reduces available capacity for a short time while the user completes checkout.
If the user confirms, the system validates the hold, processes payment, creates a confirmed booking, moves capacity from held to confirmed, and emits booking events.
If the user abandons checkout, the hold expires and the capacity is released.
Booking should be modeled as a state machine so holds, confirmations, cancellations, payments, and refunds have valid transitions.
Payment integration should use idempotency keys, retries, and reconciliation jobs, because payment and booking state can diverge during partial failures.
To scale, I would shard by resource or region, use read-optimized availability models for search, and use strongly controlled writes for holds and confirmations.
The main trade-offs are consistency, availability, latency, user experience, and hot-resource contention.
Ultimately, the goal is to provide fast availability search while guaranteeing that confirmed bookings never exceed capacity.
⭐ Final Insight
The core of a booking system is not simply recording reservations; it is managing hold, confirm, cancel, and refund on limited resources with strong correctness.
Implement