System Design Deep Dive - 18 Design Inventory System

Posted by ailswan on May 11, 2026


🎯 Design Inventory System

1️⃣ Core Framework

When discussing Inventory System design, I frame it as:

  1. Inventory data model: SKU, location, quantity
  2. Core flows: stock in, reserve, commit, release
  3. Reservation and oversell prevention
  4. Order integration and payment integration
  5. Multi-warehouse / multi-store inventory
  6. Event-driven updates and audit trail
  7. Reconciliation and correction
  8. Trade-offs: consistency vs availability vs latency

2️⃣ Core Requirements


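Functional Requirements

- track quantity per SKU per location
- show inventory for product pages and checkout
- reserve, commit, and release stock for orders
- adjust stock for receiving, returns, damage, and corrections
- record every change as an auditable event
- reconcile system inventory with physical counts

Non-functional Requirements

- no overselling: reservation must be atomic
- low latency on the checkout reservation path
- high read throughput for browsing; slight staleness is acceptable
- idempotent writes, since callers retry
- graceful handling of hot SKUs during flash sales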

👉 Interview Answer

An inventory system tracks how many units of each SKU are available at each location.

The most important challenge is preventing overselling, especially during checkout and high-demand events.

I would separate read-heavy inventory display from write-critical reservation and commit flows.


3️⃣ Core Concepts


SKU

A SKU represents a sellable item variant.

Example:

product = T-shirt
SKU = red, size M

Location

Inventory can exist at:

- warehouses
- stores

Inventory States

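Common quantity fields:

on_hand
reserved
unavailable (units on hand that cannot be sold, e.g. damaged)
available (derived from the fields above, not stored)
sold
damaged
returned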

Formula:

available = on_hand - reserved - unavailable
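
Because available is derived, a common choice is to store only on_hand, reserved, and unavailable (as the inventory_balance table in the data model does) and compute available on read. A minimal sketch; the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InventoryBalance:
    """Stored quantities for one SKU at one location (hypothetical sketch)."""

    on_hand: int      # physical units at the location
    reserved: int     # units held by open reservations
    unavailable: int  # units on hand but not sellable, e.g. damaged

    @property
    def available(self) -> int:
        # available = on_hand - reserved - unavailable
        return self.on_hand - self.reserved - self.unavailable


balance = InventoryBalance(on_hand=100, reserved=12, unavailable=3)
print(balance.available)  # 85
```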

👉 Interview Answer

I would model inventory at the SKU-location level.

The same product may have multiple SKUs, and each SKU may have inventory in multiple warehouses or stores.

Available inventory is usually derived from on-hand quantity minus reserved and unavailable quantities.


4️⃣ Main APIs


Get Inventory

GET /api/inventory?skuId=sku123&locationId=wh1

Reserve Inventory

POST /api/inventory/reservations

Request:

{
  "orderId": "o123",
  "skuId": "sku123",
  "locationId": "wh1",
  "quantity": 2
}

Commit Inventory

POST /api/inventory/reservations/{reservationId}/commit

Release Inventory

POST /api/inventory/reservations/{reservationId}/release

Adjust Inventory

POST /api/inventory/adjustments

Request:

{
  "skuId": "sku123",
  "locationId": "wh1",
  "delta": 10,
  "reason": "stock_received"
}

👉 Interview Answer

The core APIs are get inventory, reserve inventory, commit inventory, release inventory, and adjust inventory.

Reservation, commit, and release APIs must be idempotent, because order and payment systems may retry calls.
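
A minimal in-memory sketch of an idempotent reserve call follows. It assumes the caller sends an idempotency key with each reservation request; the request body above does not show one, and the reservation table's idempotency_key column is where it would presumably be stored. ReservationService and its fields are hypothetical, and a single-threaded dictionary stands in for a unique index on idempotency_key.

```python
import uuid


class ReservationService:
    """In-memory sketch of an idempotent reserve endpoint (hypothetical names)."""

    def __init__(self, available: int):
        self.available = available
        self.by_key: dict[str, dict] = {}  # idempotency_key -> first response

    def reserve(self, idempotency_key: str, order_id: str, quantity: int) -> dict:
        # A retry with the same key returns the original result unchanged,
        # so one logical request can never hold stock twice.
        if idempotency_key in self.by_key:
            return self.by_key[idempotency_key]
        if quantity <= self.available:
            self.available -= quantity
            result = {"status": "reserved", "reservationId": str(uuid.uuid4()),
                      "orderId": order_id, "quantity": quantity}
        else:
            result = {"status": "rejected", "orderId": order_id, "quantity": quantity}
        self.by_key[idempotency_key] = result
        return result


svc = ReservationService(available=5)
first = svc.reserve("o123-sku123", "o123", 2)
retry = svc.reserve("o123-sku123", "o123", 2)  # same key: no second hold
print(first == retry, svc.available)  # True 3
```

Storing the first response, including a rejection, means a retry with the same key gets the same answer even if stock arrives later; a genuinely new attempt uses a new key.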


5️⃣ Data Model


Inventory Balance Table

CREATE TABLE inventory_balance (
  sku_id VARCHAR,
  location_id VARCHAR,
  on_hand INT,
  reserved INT,
  unavailable INT,
  version BIGINT,
  updated_at TIMESTAMP,
  PRIMARY KEY (sku_id, location_id)
);

Inventory Reservation Table

CREATE TABLE inventory_reservation (
  reservation_id VARCHAR PRIMARY KEY,
  order_id VARCHAR,
  sku_id VARCHAR,
  location_id VARCHAR,
  quantity INT,
  status VARCHAR, -- reserved, committed, released, expired
  expires_at TIMESTAMP,
  idempotency_key VARCHAR UNIQUE, -- a retried request maps to the same reservation
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

Inventory Event Table

CREATE TABLE inventory_event ( -- append-only: rows are inserted, never updated
  event_id VARCHAR PRIMARY KEY,
  sku_id VARCHAR,
  location_id VARCHAR,
  event_type VARCHAR,
  quantity_delta INT,
  reason VARCHAR,
  reference_id VARCHAR,
  created_at TIMESTAMP,
  metadata JSON
);

Inventory Snapshot Table

CREATE TABLE inventory_snapshot (
  sku_id VARCHAR,
  location_id VARCHAR,
  available INT,
  updated_at TIMESTAMP,
  PRIMARY KEY (sku_id, location_id)
);

👉 Interview Answer

I would maintain an inventory balance table for current quantities, a reservation table for checkout holds, and an event table for audit history.

The event table is important because inventory changes must be explainable, debuggable, and reconcilable.
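
Because the event table is append-only, current on-hand quantities can be recomputed from it and compared with inventory_balance as a consistency check. This sketch assumes quantity_delta is the signed change to on_hand for the event types in ON_HAND_EVENTS, and that reservation events only move reserved. Both are assumptions, not part of the schema above.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical: event types whose quantity_delta is a signed change to on_hand.
# inventory_reserved / inventory_released only move the reserved quantity.
ON_HAND_EVENTS = {"inventory_received", "inventory_committed",
                  "inventory_adjusted", "inventory_returned"}


@dataclass(frozen=True)
class InventoryEvent:
    event_id: str
    sku_id: str
    location_id: str
    event_type: str
    quantity_delta: int
    reason: str
    reference_id: str


def replay_on_hand(events: list[InventoryEvent]) -> dict[tuple[str, str], int]:
    """Rebuild on_hand per (sku_id, location_id) by folding the event log in order."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for event in events:
        if event.event_type in ON_HAND_EVENTS:
            totals[(event.sku_id, event.location_id)] += event.quantity_delta
    return dict(totals)


events_log = [
    InventoryEvent("e1", "sku123", "wh1", "inventory_received", 10, "stock_received", "po1"),
    InventoryEvent("e2", "sku123", "wh1", "inventory_reserved", 2, "checkout", "o123"),
    InventoryEvent("e3", "sku123", "wh1", "inventory_committed", -2, "order_confirmed", "o123"),
]
print(replay_on_hand(events_log))  # {('sku123', 'wh1'): 8}
```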


6️⃣ Reservation Flow


Why Reservation?

During checkout, we do not want to immediately mark inventory as sold.

Instead:

reserve now
commit after payment/order success
release if cancelled or expired

Reservation Flow

User checks out
→ Order service requests reservation
→ Inventory service checks available quantity
→ If enough stock, increase reserved
→ Create reservation record with TTL
→ Return success

Atomic Condition

Use conditional update:

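UPDATE inventory_balance
SET reserved = reserved + 2,
    version = version + 1,
    updated_at = CURRENT_TIMESTAMP
WHERE sku_id = 'sku123'
AND location_id = 'wh1'
AND on_hand - reserved - unavailable >= 2;
-- 1 row updated: the 2 units are reserved
-- 0 rows updated: not enough available stock, reject the reservation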
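
The conditional update can be exercised end to end with Python's built-in sqlite3 module standing in for the inventory database. This is a sketch under assumptions: the 15-minute TTL and the trimmed-down schema are illustrative. The reservation insert shares a transaction with the balance update, so a crash cannot leave a hold without a reservation record.

```python
import sqlite3
import uuid
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE inventory_balance (
    sku_id TEXT, location_id TEXT,
    on_hand INTEGER, reserved INTEGER, unavailable INTEGER, version INTEGER,
    PRIMARY KEY (sku_id, location_id));
CREATE TABLE inventory_reservation (
    reservation_id TEXT PRIMARY KEY, order_id TEXT, sku_id TEXT,
    location_id TEXT, quantity INTEGER, status TEXT, expires_at TEXT);
"""


def reserve(conn, order_id, sku_id, location_id, quantity, ttl_minutes=15):
    """Atomically hold stock; return a reservation_id, or None if stock is short."""
    with conn:  # one transaction: both statements commit together or roll back
        cur = conn.execute(
            """UPDATE inventory_balance
               SET reserved = reserved + ?, version = version + 1
               WHERE sku_id = ? AND location_id = ?
                 AND on_hand - reserved - unavailable >= ?""",
            (quantity, sku_id, location_id, quantity),
        )
        if cur.rowcount == 0:
            return None  # condition failed: not enough available stock
        reservation_id = str(uuid.uuid4())
        expires_at = datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes)
        conn.execute(
            "INSERT INTO inventory_reservation VALUES (?, ?, ?, ?, ?, 'reserved', ?)",
            (reservation_id, order_id, sku_id, location_id, quantity,
             expires_at.isoformat()),
        )
        return reservation_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO inventory_balance VALUES ('sku123', 'wh1', 3, 0, 0, 0)")

first = reserve(conn, "o1", "sku123", "wh1", 2)   # succeeds
second = reserve(conn, "o2", "sku123", "wh1", 2)  # None: only 1 unit left
```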

👉 Interview Answer

I would use a reservation model.

During checkout, the inventory service atomically checks whether enough stock is available and increases the reserved quantity.

This prevents overselling while payment and order confirmation are still in progress.


7️⃣ Commit and Release Flow


Commit Flow

After order/payment success:

Reservation reserved
→ Commit reservation
→ Decrease on_hand
→ Decrease reserved
→ Mark reservation committed
→ Emit inventory committed event

Example:

on_hand = on_hand - quantity
reserved = reserved - quantity

Release Flow

If order is cancelled or payment fails:

Reservation reserved
→ Release reservation
→ Decrease reserved
→ Mark reservation released
→ Emit inventory released event
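
Commit and release can both be written as one guarded state transition. The reservation row leaves reserved only if it is still reserved, and the balance changes in the same transaction, so a retried commit matches zero rows and changes nothing. The sketch below uses sqlite3 and hypothetical helper names; event emission is omitted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE inventory_balance (
    sku_id TEXT, location_id TEXT, on_hand INTEGER, reserved INTEGER,
    unavailable INTEGER, PRIMARY KEY (sku_id, location_id));
CREATE TABLE inventory_reservation (
    reservation_id TEXT PRIMARY KEY, sku_id TEXT, location_id TEXT,
    quantity INTEGER, status TEXT);
"""


def _finish(conn, reservation_id, new_status, decrement_on_hand):
    """Move a 'reserved' hold to new_status exactly once; retries are no-ops."""
    with conn:
        cur = conn.execute(
            "UPDATE inventory_reservation SET status = ? "
            "WHERE reservation_id = ? AND status = 'reserved'",
            (new_status, reservation_id),
        )
        if cur.rowcount == 0:
            return False  # already committed/released/expired, or unknown id
        sku_id, location_id, qty = conn.execute(
            "SELECT sku_id, location_id, quantity FROM inventory_reservation "
            "WHERE reservation_id = ?",
            (reservation_id,),
        ).fetchone()
        conn.execute(
            "UPDATE inventory_balance "
            "SET on_hand = on_hand - ?, reserved = reserved - ? "
            "WHERE sku_id = ? AND location_id = ?",
            (qty if decrement_on_hand else 0, qty, sku_id, location_id),
        )
        return True


def commit(conn, reservation_id):
    # Commit: on_hand -= quantity, reserved -= quantity
    return _finish(conn, reservation_id, "committed", decrement_on_hand=True)


def release(conn, reservation_id):
    # Release: reserved -= quantity; the units become available again
    return _finish(conn, reservation_id, "released", decrement_on_hand=False)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO inventory_balance VALUES ('sku123', 'wh1', 10, 3, 0)")
    conn.executemany(
        "INSERT INTO inventory_reservation VALUES (?, 'sku123', 'wh1', ?, 'reserved')",
        [("r1", 2), ("r2", 1)],
    )

ok_commit = commit(conn, "r1")    # True
ok_retry = commit(conn, "r1")     # False: duplicate call changes nothing
ok_release = release(conn, "r2")  # True
```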

Expiration Flow

If reservation expires:

Reservation expires
→ Background worker releases it
→ Reserved stock becomes available again
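
One sweep of the expiration worker might look like the in-memory sketch below. Reservation, expire_reservations, and the dictionary balance are hypothetical. Against a real database, the status check and the reserved decrement would be a single guarded update, like commit and release, so that an expiry cannot race a late commit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Reservation:
    reservation_id: str
    quantity: int
    status: str  # reserved, committed, released, expired
    expires_at: datetime


def expire_reservations(reservations: list[Reservation], balance: dict[str, int],
                        now: datetime) -> list[str]:
    """One sweep of a hypothetical expiration worker; returns the expired ids."""
    expired = []
    for r in reservations:
        # Only open holds can expire. Committed or released ones are skipped,
        # so running the sweep again never releases the same stock twice.
        if r.status == "reserved" and r.expires_at <= now:
            r.status = "expired"
            balance["reserved"] -= r.quantity  # stock becomes available again
            expired.append(r.reservation_id)
    return expired


now = datetime(2026, 5, 11, 12, 0, tzinfo=timezone.utc)
balance = {"on_hand": 10, "reserved": 5, "unavailable": 0}
holds = [
    Reservation("r1", 2, "reserved", now - timedelta(minutes=1)),   # past its TTL
    Reservation("r2", 3, "reserved", now + timedelta(minutes=10)),  # still valid
]
print(expire_reservations(holds, balance, now))  # ['r1']
print(balance["reserved"])                        # 3
```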

👉 Interview Answer

After payment succeeds, the reservation should be committed, which reduces both on-hand and reserved inventory.

If payment fails or the order is cancelled, the reservation should be released, which decreases reserved inventory and makes the stock available again.

Reservations should have expiration times so abandoned checkouts do not hold inventory forever.


8️⃣ Oversell Prevention


Main Risk

Multiple customers try to buy the same SKU at the same time.


Techniques

1. Conditional Update

WHERE on_hand - reserved - unavailable >= requested_quantity

2. Optimistic Locking

Use version column:

read version
update where version = old_version
if 0 rows were updated, re-read and retry
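
The optimistic-locking loop can be sketched with sqlite3 and the version column from inventory_balance. The function name and the retry limit of 3 are illustrative choices. An out-of-stock result returns immediately, because only version conflicts are worth retrying.

```python
import sqlite3


def reserve_optimistic(conn, sku_id, location_id, quantity, max_retries=3):
    """Read the row, then write only if its version is still the one we read."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT on_hand, reserved, unavailable, version FROM inventory_balance "
            "WHERE sku_id = ? AND location_id = ?",
            (sku_id, location_id),
        ).fetchone()
        if row is None:
            return False
        on_hand, reserved, unavailable, version = row
        if on_hand - reserved - unavailable < quantity:
            return False  # genuinely out of stock; retrying will not help
        with conn:
            cur = conn.execute(
                "UPDATE inventory_balance "
                "SET reserved = reserved + ?, version = version + 1 "
                "WHERE sku_id = ? AND location_id = ? AND version = ?",
                (quantity, sku_id, location_id, version),
            )
        if cur.rowcount == 1:
            return True  # no concurrent writer changed the row in between
        # 0 rows: someone bumped the version after our read; re-read and retry
    return False


conn = sqlite3.connect(":memory:")
with conn:
    conn.execute(
        "CREATE TABLE inventory_balance (sku_id TEXT, location_id TEXT, "
        "on_hand INTEGER, reserved INTEGER, unavailable INTEGER, version INTEGER, "
        "PRIMARY KEY (sku_id, location_id))"
    )
    conn.execute("INSERT INTO inventory_balance VALUES ('sku123', 'wh1', 5, 0, 0, 0)")

print(reserve_optimistic(conn, "sku123", "wh1", 2))  # True
print(reserve_optimistic(conn, "sku123", "wh1", 4))  # False: only 3 left
```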

3. Row-level Locking

Lock the SKU-location row for the whole check-and-update transaction (e.g. SELECT ... FOR UPDATE).


4. Single-writer per SKU Partition

Route all writes for one SKU to one partition/actor.
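
Single-writer partitioning can be sketched with one worker thread per partition. A stable hash routes every request for a SKU to the same partition, and only that partition's thread touches its counts, so no row locks are needed. The partition count, the crc32 routing, and the simplified "available units" counter are assumptions for illustration.

```python
import queue
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4  # hypothetical


def partition_for(sku_id: str) -> int:
    # crc32 is stable across processes; Python's built-in hash() of str is not.
    return zlib.crc32(sku_id.encode()) % NUM_PARTITIONS


class Partition:
    """A single worker thread owns the stock for its SKUs, so updates never race."""

    def __init__(self, stock: dict[str, int]):
        self.stock = stock  # sku_id -> available units, touched only by the worker
        self.inbox: queue.Queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:
            sku_id, qty, reply = self.inbox.get()
            ok = self.stock.get(sku_id, 0) >= qty
            if ok:
                self.stock[sku_id] -= qty
            reply.put(ok)

    def reserve(self, sku_id: str, qty: int) -> bool:
        reply: queue.Queue = queue.Queue(maxsize=1)
        self.inbox.put((sku_id, qty, reply))
        return reply.get(timeout=5)


initial = {"sku123": 3, "sku456": 10}
shards = [{} for _ in range(NUM_PARTITIONS)]
for sku, units in initial.items():
    shards[partition_for(sku)][sku] = units
partitions = [Partition(s) for s in shards]


def reserve(sku_id: str, qty: int) -> bool:
    return partitions[partition_for(sku_id)].reserve(sku_id, qty)


with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: reserve("sku123", 1), range(20)))
print(sum(results))  # 3: twenty concurrent buyers, no oversell
```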


5. Reservation Queue for Flash Sales

Serialize requests for extremely hot SKUs.
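
For a single hot SKU, a bounded queue in front of one consumer both serializes reservations and caps how much work is admitted. This asyncio sketch assumes one unit per request; flash_sale and the queue size are hypothetical.

```python
import asyncio


async def flash_sale(stock: int, user_ids: list[str], queue_size: int) -> dict[str, str]:
    """Serialize reservations for one hot SKU through a bounded queue (sketch)."""
    q: asyncio.Queue = asyncio.Queue(maxsize=queue_size)
    results: dict[str, str] = {}

    async def consumer() -> None:
        nonlocal stock
        while True:
            user_id = await q.get()
            if stock > 0:
                stock -= 1  # only this coroutine mutates stock, so no lock is needed
                results[user_id] = "reserved"
            else:
                results[user_id] = "sold_out"
            q.task_done()

    worker = asyncio.create_task(consumer())
    for user_id in user_ids:
        try:
            q.put_nowait(user_id)  # admitted in arrival order
        except asyncio.QueueFull:
            results[user_id] = "rejected"  # shed load before it reaches the inventory DB
    await q.join()
    worker.cancel()
    return results


outcome = asyncio.run(flash_sale(2, ["u1", "u2", "u3", "u4", "u5"], queue_size=3))
print(outcome)
```

With 2 units, a queue of 3, and five buyers arriving together, u1 and u2 reserve, u3 is admitted but finds the SKU sold out, and u4 and u5 are rejected at admission without touching inventory.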


👉 Interview Answer

To prevent overselling, the critical operation is the atomic reservation.

I would use conditional updates, optimistic locking, or single-writer partitioning to ensure available quantity never goes negative, that is, reserved never exceeds on-hand minus unavailable stock.

For flash sales, a queue-based reservation system can protect hot SKUs.


9️⃣ Multi-location Inventory


Why Multi-location Matters

The same SKU may exist in many places:

SKU123:
- warehouse A: 100
- warehouse B: 50
- store C: 5

Location Selection

Choose fulfillment location based on:

- stock availability
- distance
- shipping cost
- delivery promise

Flow

Order request
→ Find eligible locations
→ Check available inventory
→ Select best fulfillment location
→ Reserve inventory at that location
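
The selection step can be sketched as filter-then-rank. The scoring here is one hypothetical policy: choose the cheapest shipping among locations that have enough stock and meet the delivery promise, with distance as a tie-breaker. The distances, costs, and delivery days below are made-up sample values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocationOption:
    location_id: str
    available: int
    distance_km: float
    shipping_cost: float
    delivery_days: int


def select_location(options: list[LocationOption], quantity: int,
                    promised_days: int) -> Optional[str]:
    """Filter to eligible locations, then rank them (one hypothetical policy)."""
    eligible = [o for o in options
                if o.available >= quantity and o.delivery_days <= promised_days]
    if not eligible:
        return None  # no single location qualifies
    # Cheapest shipping wins; shorter distance breaks ties.
    best = min(eligible, key=lambda o: (o.shipping_cost, o.distance_km))
    return best.location_id


# Stock levels follow the SKU123 example above; the other numbers are made up.
options = [
    LocationOption("warehouse_a", 100, 800.0, 9.5, 3),
    LocationOption("warehouse_b", 50, 120.0, 6.0, 2),
    LocationOption("store_c", 5, 5.0, 4.0, 1),
]
print(select_location(options, quantity=2, promised_days=2))   # store_c
print(select_location(options, quantity=10, promised_days=2))  # warehouse_b
```

The reservation at the chosen location can still fail if stock changed since the availability check; in that case the next-ranked candidate is a natural fallback.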

👉 Interview Answer

In a real system, inventory is tracked by SKU and location.

When an order is placed, the system should choose the best fulfillment location based on stock availability, distance, shipping cost, and delivery promise.

The reservation should happen at the selected location.


🔟 Read Model and Caching


Read-heavy Use Cases

- product pages
- search results

Why Not Query the Authoritative Store Every Time?

Because browsing traffic is much higher than checkout traffic, and browse pages can tolerate slightly stale counts.


Strategy

Use a read-optimized inventory snapshot:

Inventory write model
→ events
→ read model / cache
→ product pages
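
On the read side, a consumer applies inventory events to the snapshot. This sketch assumes each event carries the post-write available quantity and the balance row's version. That is an assumption: the event table above stores a delta instead. The version check makes late or duplicated events harmless.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BalanceChanged:
    sku_id: str
    location_id: str
    available: int  # available quantity after the write (assumed event field)
    version: int    # inventory_balance.version after the write (assumed event field)


class SnapshotProjection:
    """Keeps inventory_snapshot-style rows up to date from events (sketch)."""

    def __init__(self):
        self.rows: dict[tuple[str, str], tuple[int, int]] = {}  # key -> (available, version)

    def apply(self, event: BalanceChanged) -> None:
        key = (event.sku_id, event.location_id)
        current = self.rows.get(key)
        # Events may arrive late or twice; keep only the newest version.
        if current is None or event.version > current[1]:
            self.rows[key] = (event.available, event.version)

    def available(self, sku_id: str, location_id: str) -> int:
        # Unknown SKU-locations are shown as out of stock.
        return self.rows.get((sku_id, location_id), (0, 0))[0]


projection = SnapshotProjection()
projection.apply(BalanceChanged("sku123", "wh1", available=7, version=2))
projection.apply(BalanceChanged("sku123", "wh1", available=9, version=1))  # late, ignored
print(projection.available("sku123", "wh1"))  # 7
```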

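Cache Rules

- browse pages may read cached or snapshot inventory; slight staleness is acceptable
- snapshots are refreshed from inventory events
- checkout never trusts the cache; it reserves against the authoritative balance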

👉 Interview Answer

I would separate inventory reads from inventory writes.

Product browsing can use cached or eventually consistent inventory snapshots, because slight staleness is acceptable.

But checkout must call the inventory service to perform an atomic reservation against the authoritative inventory balance.


1️⃣1️⃣ Event-driven Inventory Updates


Events

Examples:

inventory_reserved
inventory_committed
inventory_released
inventory_adjusted
inventory_received
inventory_damaged
inventory_returned

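Event Consumers

- read models and product availability cache
- search index
- analytics
- low-stock alerts
- warehouse systems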

👉 Interview Answer

Inventory changes should emit events.

These events update read models, product availability cache, search index, analytics, low-stock alerts, and warehouse systems.

This keeps the write path focused on correctness while downstream systems update asynchronously.


1️⃣2️⃣ Returns, Damaged Goods, and Adjustments


Return Flow

Customer returns item
→ Warehouse receives item
→ Inspect condition
→ If sellable, increase on_hand
→ If damaged, increase both on_hand and unavailable (available is unchanged)
→ Emit inventory_returned event
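
Under available = on_hand - reserved - unavailable, a damaged return is still physically on hand. It is therefore added to both on_hand and unavailable, which leaves available unchanged, while a sellable return raises available. The Balance class and the event dictionary shape are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Balance:
    on_hand: int
    reserved: int
    unavailable: int

    @property
    def available(self) -> int:
        return self.on_hand - self.reserved - self.unavailable


def receive_return(balance: Balance, quantity: int, sellable: bool, events: list) -> None:
    """Apply an inspected customer return (sketch; event shape is hypothetical)."""
    balance.on_hand += quantity          # the unit is physically back either way
    if not sellable:
        balance.unavailable += quantity  # damaged: on hand but not sellable
    events.append({"event_type": "inventory_returned",
                   "quantity_delta": quantity,
                   "reason": "sellable" if sellable else "damaged"})


balance = Balance(on_hand=10, reserved=2, unavailable=0)
events: list = []
receive_return(balance, 1, sellable=True, events=events)   # available 8 -> 9
receive_return(balance, 1, sellable=False, events=events)  # available stays 9
print(balance, balance.available)
```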

Adjustment Reasons

- stock received
- customer returns
- damaged goods
- manual corrections
- cycle counts

👉 Interview Answer

Not all inventory changes come from orders.

Returns, damaged goods, warehouse receiving, manual corrections, and cycle counts also change inventory.

Every adjustment should include a reason, reference ID, and audit event.


1️⃣3️⃣ Reconciliation


Why Needed?

System inventory may differ from physical inventory.

Causes:


Reconciliation Flow

Physical count / warehouse report
→ Compare with system inventory
→ Find discrepancy
→ Create adjustment event
→ Update inventory balance
→ Generate audit report
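
A sketch of the compare step follows. It turns differences into adjustment events instead of writing the counted number over the balance. The function, the cycle_count reason, and the dictionary event shape are assumptions. Only SKU-locations that were actually counted are compared.

```python
def reconcile(system_on_hand: dict[tuple[str, str], int],
              physical_counts: dict[tuple[str, str], int],
              count_id: str) -> list[dict]:
    """Emit adjustment events for count differences; never overwrite balances."""
    adjustments = []
    # Only SKU-locations present in the count are reconciled, so a partial
    # count cannot zero out stock it did not look at.
    for (sku_id, location_id), counted in sorted(physical_counts.items()):
        delta = counted - system_on_hand.get((sku_id, location_id), 0)
        if delta != 0:
            adjustments.append({
                "event_type": "inventory_adjusted",
                "sku_id": sku_id,
                "location_id": location_id,
                "quantity_delta": delta,  # applied to on_hand by the normal write path
                "reason": "cycle_count",
                "reference_id": count_id,
            })
    return adjustments


system = {("sku123", "wh1"): 100, ("sku456", "wh1"): 40}
counted = {("sku123", "wh1"): 97, ("sku456", "wh1"): 40}
for event in reconcile(system, counted, count_id="count-2026-05"):
    print(event["sku_id"], event["quantity_delta"])  # sku123 -3
```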

Important Principle

Never silently overwrite inventory.

Always create adjustment events.


👉 Interview Answer

Reconciliation is necessary because physical inventory can diverge from system inventory.

I would compare warehouse counts with system balances, create adjustment events for discrepancies, and keep a full audit trail.

Inventory corrections should never be silent overwrites.


1️⃣4️⃣ Flash Sale / High-demand SKU Handling


Problem

A very popular item receives huge concurrent demand.

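Risks:

- overselling under concurrent reservations
- lock contention on one SKU-location row
- request spikes overloading the inventory database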

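Strategies

- queue-based reservation for the hot SKU
- token bucket admission control
- single-writer partition per hot SKU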

👉 Interview Answer

Flash sales create hot SKU problems.

For extremely high-demand SKUs, I would avoid letting every request directly hit the inventory database.

Instead, I would serialize reservations through a queue or a single-writer partition, and put a token bucket in front to cap how many requests reach the reservation path.
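
A token bucket does not serialize anything by itself. It limits how many requests per second reach the reservation path, and rejected callers can be asked to retry later. Below is a single-threaded sketch with a pluggable clock so that it can be tested; the rate and capacity values are illustrative.

```python
import time


class TokenBucket:
    """Admission control in front of hot-SKU reservations (single-threaded sketch)."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # reject, or tell the client to retry later


bucket = TokenBucket(rate_per_sec=100, capacity=200)
if bucket.try_acquire():
    pass  # forward the request to the reservation queue / single writer
```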


1️⃣5️⃣ Integration With Order and Payment


Normal Flow

User checks out
→ Reserve inventory
→ Authorize payment
→ Create order
→ Commit inventory after order confirmed

Alternative Flow

Authorize payment
→ Reserve inventory
→ Create order
→ Capture payment

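Failure Cases

- reservation succeeds but payment fails: release the reservation
- inventory reservation fails: do not proceed with payment
- order is cancelled after reservation: release the reservation and void the payment authorization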

Saga Pattern

Use saga to coordinate:

reserve inventory
authorize payment
create order
commit inventory
capture payment

Each step has a compensation action.
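
The saga can be sketched as an orchestrator that runs steps in order. On the first failure, it runs the compensations of completed steps in reverse. The step names and the run_saga helper are hypothetical; a production orchestrator would also persist saga state and retry idempotent compensations.

```python
from typing import Callable

# (step name, action, compensation); all names here are hypothetical
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step], log: list[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()  # e.g. release reservation, void authorization
                log.append(f"compensated:{done_name}")
            return False
        log.append(f"done:{name}")
        completed.append(step)
    return True


def decline_card() -> None:
    raise RuntimeError("payment authorization declined")


def noop() -> None:
    pass


log: list[str] = []
ok = run_saga(
    [
        ("reserve_inventory", noop, noop),  # compensation would release the reservation
        ("authorize_payment", decline_card, noop),
        ("create_order", noop, noop),
    ],
    log,
)
print(ok, log)
```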


👉 Interview Answer

Inventory, order, and payment should be coordinated carefully.

I would use a saga pattern, where each step has a compensating action.

For example, if payment fails after inventory reservation, the system releases the reservation.

If inventory reservation fails, the system should not proceed with payment capture.


1️⃣6️⃣ Consistency Model


Stronger Consistency Needed For

- checkout reservation
- commit and release
- reconciliation adjustments

Eventual Consistency Acceptable For

- product pages
- search results
- analytics and low-stock alerts

👉 Interview Answer

Inventory requires mixed consistency.

Checkout reservation and commit need strong correctness to prevent overselling.

Product pages and search results can use eventually consistent snapshots, but they must revalidate inventory during checkout.


1️⃣7️⃣ Scaling Patterns


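Pattern 1: Separate Write Model and Read Model

Authoritative writes go to the inventory balance table; browsing reads come from snapshots or cache.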

Pattern 2: Shard by SKU or SKU-location

hash(sku_id + ':' + location_id) % num_shards
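
A sketch of SKU-location shard routing follows, assuming a fixed shard count (16 here, hypothetical). A separator between the two ids avoids collisions that plain concatenation allows. SHA-256 gives a hash that is stable across processes and machines.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical


def shard_for(sku_id: str, location_id: str, num_shards: int = NUM_SHARDS) -> int:
    # The ':' separator keeps ("sku1", "2wh") and ("sku12", "wh") from
    # producing the same key string, which plain concatenation would.
    key = f"{sku_id}:{location_id}".encode()
    digest = hashlib.sha256(key).digest()
    # Stable across processes and machines, unlike Python's built-in hash().
    return int.from_bytes(digest[:8], "big") % num_shards


print(shard_for("sku123", "wh1"))
```

Plain modulo remaps most keys when num_shards changes. Consistent hashing, or a fixed set of virtual partitions mapped onto shards, limits that movement. A single hot SKU still lands on one shard, which is why Pattern 4 exists.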

Pattern 3: Event-driven Propagation

Inventory changes publish events to downstream consumers.


Pattern 4: Single-writer for Hot SKU

Serialize writes for high-demand items.


Pattern 5: Reservation Expiration Worker

Automatically releases expired reservations.


👉 Interview Answer

To scale inventory, I would shard by SKU-location, separate authoritative writes from cached read models, and use events to update downstream systems.

For hot SKUs, a single-writer or queue-based reservation model can reduce contention and prevent overselling.


1️⃣8️⃣ Failure Handling


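Common Failures

- retried or duplicated requests from order and payment services
- partial failures between inventory, order, and payment steps
- drift between system and physical inventory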

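Strategies

- idempotent reserve, commit, and release
- conditional writes on the balance row
- an audit event for every state change
- reconciliation jobs to correct mismatches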

👉 Interview Answer

Inventory systems must handle retries and partial failures.

Reservation, commit, and release should be idempotent.

Inventory updates should use conditional writes, and all state changes should emit audit events.

Reconciliation jobs are needed to correct mismatches over time.


1️⃣9️⃣ Observability


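Key Metrics

- reservation success rate
- oversell count
- checkout inventory latency
- expired reservations
- commit and release failures
- hot SKU contention
- reconciliation mismatches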

👉 Interview Answer

I would monitor reservation success rate, oversell count, checkout inventory latency, expired reservations, commit and release failures, hot SKU contention, and reconciliation mismatches.

These metrics directly show whether inventory correctness and checkout reliability are healthy.


2️⃣0️⃣ End-to-End Flow


Checkout Flow

User checks out
→ Inventory service reserves SKU-location quantity
→ Payment service authorizes payment
→ Order service creates order
→ Inventory service commits reservation
→ Order confirmed

Cancellation Flow

User cancels order
→ Order state updated
→ Inventory reservation released
→ Payment authorization voided or refunded
→ Events emitted

Reconciliation Flow

Warehouse physical count
→ Compare with system balance
→ Create adjustment event
→ Update inventory balance
→ Audit report generated

Key Insight

An inventory system is not just a quantity table; it is a correctness-critical reservation and reconciliation system.


🧠 Staff-Level Answer (Final)


👉 Interview Answer (Full Version)

When designing an inventory system, I think of it as a correctness-critical system that tracks available stock by SKU and location.

The most important goal is to prevent overselling, especially during checkout and high-demand events.

I would model inventory using an authoritative inventory balance table, a reservation table, and an append-only inventory event table.

During checkout, the system should create a reservation instead of immediately marking stock as sold. The reservation atomically checks available quantity and increases reserved inventory.

After payment and order confirmation, the reservation is committed, which decreases both on-hand and reserved inventory.

If payment fails, the order is cancelled, or the reservation expires, the reservation is released and the stock becomes available again.

To prevent overselling, I would use conditional updates, optimistic locking, row-level locking, or single-writer partitioning for hot SKUs.

For browsing, I would use eventually consistent inventory snapshots or caches, because product pages and search results are read-heavy and can tolerate slight staleness.

But checkout must always revalidate and reserve against the authoritative inventory store.

Inventory changes should emit events so downstream systems like search, product pages, analytics, low-stock alerts, and warehouse systems can update asynchronously.

Reconciliation is essential because physical inventory can diverge from system inventory. Corrections should be made through adjustment events, never silent overwrites.

The main trade-offs are consistency, availability, checkout latency, contention on hot SKUs, and operational complexity.

Ultimately, the goal is to provide fast inventory visibility for users while maintaining strong correctness for reservation, commit, release, and reconciliation.


⭐ Final Insight

At its core, an inventory system is not a simple stock-quantity table but a correctness-critical system that prevents overselling and supports reservation, commit, release, and reconciliation.



