🎯 Design Rate Limiter
1️⃣ Core Framework
When discussing Rate Limiter design, I frame it as:
- Core purpose and requirements
- Rate limiting algorithms
- Key design dimensions: user, IP, API, tenant
- Storage and counter strategy
- Distributed rate limiting
- Handling burst traffic
- Failure handling and fallback
- Trade-offs: accuracy vs latency vs availability
2️⃣ Core Requirements
Functional Requirements
- Limit number of requests per user / IP / API key
- Support different limits for different APIs
- Support different limits for different user tiers
- Return proper error when limit is exceeded
- Support burst control
- Support distributed deployment
Non-functional Requirements
- Very low latency
- High availability
- Scalable to high QPS
- Accurate enough under distributed traffic
- Configurable limits
- Minimal impact on normal request path
👉 Interview Answer
A rate limiter controls how many requests a client can make within a given time window.
It protects backend services from abuse, traffic spikes, accidental client bugs, and unfair resource usage.
The main challenge is enforcing limits accurately while keeping latency very low and availability high.
3️⃣ Main APIs / Integration Points
Check Limit API
POST /api/rate-limit/check
Request:
{
"key": "user:u123",
"api": "/checkout",
"cost": 1
}
Response:
{
"allowed": true,
"remaining": 99,
"resetAt": "2026-05-02T10:01:00Z"
}
Gateway Integration
The rate limiter is most commonly integrated at the gateway layer:
Client
→ API Gateway / Load Balancer
→ Rate Limiter
→ Backend Service
Response When Limited
HTTP/1.1 429 Too Many Requests
Retry-After: 30
👉 Interview Answer
I would usually place the rate limiter at the API gateway layer, because it can block abusive traffic before it reaches backend services.
When a request exceeds the limit, the system should return HTTP 429 with a Retry-After header.
4️⃣ Rate Limiting Algorithms
Algorithm 1: Fixed Window Counter
Example:
100 requests per minute
Implementation:
key = user_id + current_minute
counter++
Pros
- Simple
- Fast
- Memory efficient
Cons
- Boundary problem
- Allows burst around window edges
Example:
100 requests at 00:59
100 requests at 01:00
→ 200 requests in about 2 seconds, twice the intended limit
👉 Interview Answer
Fixed window counter is simple and efficient, but it has a boundary problem.
A client can send requests at the end of one window and the beginning of the next window, causing a burst much larger than the intended rate.
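The boundary problem is easy to reproduce. Below is a minimal in-memory sketch, not a production design: the class name `FixedWindowLimiter`, the injected `now` argument (seconds), and the dictionary store are assumptions made for illustration. A real deployment would typically keep the counter in a shared store with a TTL.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Fixed window counter: one counter per (key, window index)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counters = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key: str, now: float) -> bool:
        # All requests in the same window share one counter.
        window_index = int(now // self.window_seconds)
        bucket = (key, window_index)
        if self.counters[bucket] >= self.limit:
            return False
        self.counters[bucket] += 1
        return True


limiter = FixedWindowLimiter(limit=100, window_seconds=60)
# 100 requests at t=59s (end of window 0), then 100 more at t=60s (start of window 1).
results = [limiter.allow("user:u123", 59.0) for _ in range(100)]
results += [limiter.allow("user:u123", 60.0) for _ in range(100)]
accepted = sum(results)  # all 200 are accepted despite the 100/minute limit
```

Counters for old windows are never evicted in this sketch; with a store that supports TTLs, each window key would simply expire.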
Algorithm 2: Sliding Window Log
Store timestamps of each request.
user:u123 → [t1, t2, t3, ...]
On each request:
- Remove timestamps older than window
- Count remaining timestamps
- Allow if count < limit
Pros
- Very accurate
- No boundary burst problem
Cons
- High memory usage
- Expensive for high QPS users
👉 Interview Answer
Sliding window log is accurate because it tracks individual request timestamps.
However, it is expensive in memory and compute, especially for high-QPS clients.
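The three steps above (evict, count, compare) can be sketched with a per-key deque. This is an illustrative in-memory version; `SlidingWindowLog` and the injected `now` argument are assumptions, and memory grows with one timestamp per accepted request, which is exactly the cost noted above.

```python
from collections import deque


class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.logs = {}  # key -> deque of accepted request timestamps, oldest first

    def allow(self, key: str, now: float) -> bool:
        log = self.logs.setdefault(key, deque())
        # Remove timestamps that have fallen out of the window ending at `now`.
        while log and log[0] <= now - self.window_seconds:
            log.popleft()
        # Count what remains and allow only while under the limit.
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True


limiter = SlidingWindowLog(limit=3, window_seconds=10)
first_three = [limiter.allow("user:u123", t) for t in (0.0, 1.0, 2.0)]
fourth = limiter.allow("user:u123", 5.0)       # window still holds 3 timestamps
after_expiry = limiter.allow("user:u123", 10.5)  # the t=0.0 entry has expired
```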
Algorithm 3: Sliding Window Counter
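Approximates a sliding window by combining the current fixed window's count with a weighted share of the previous window's count.
Formula:
effective_count =
current_window_count +
previous_window_count * overlap_ratio
where overlap_ratio = 1 - (time elapsed in current window / window size)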
Pros
- More accurate than fixed window
- Cheaper than sliding log
- Good practical compromise
Cons
- Approximate
- Slightly more complex than fixed window
👉 Interview Answer
Sliding window counter is a practical compromise.
It avoids most fixed-window burst problems while using much less memory than sliding window log.
It is commonly used when we need reasonable accuracy at scale.
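A worked example makes the weighting concrete. The sketch below assumes the overlap ratio is the fraction of the previous fixed window still covered by the sliding window, that is, 1 minus the elapsed fraction of the current window. The class name and the injected `now` argument are illustrative assumptions.

```python
class SlidingWindowCounter:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = {}  # (key, window index) -> request count

    def effective_count(self, key: str, now: float) -> float:
        index = int(now // self.window_seconds)
        elapsed = now - index * self.window_seconds
        # Share of the previous window that the sliding window still covers.
        overlap_ratio = 1.0 - elapsed / self.window_seconds
        current = self.counts.get((key, index), 0)
        previous = self.counts.get((key, index - 1), 0)
        return current + previous * overlap_ratio

    def allow(self, key: str, now: float) -> bool:
        if self.effective_count(key, now) >= self.limit:
            return False
        index = int(now // self.window_seconds)
        self.counts[(key, index)] = self.counts.get((key, index), 0) + 1
        return True


limiter = SlidingWindowCounter(limit=100, window_seconds=60)
limiter.counts[("user:u123", 0)] = 80  # previous window, [0s, 60s)
limiter.counts[("user:u123", 1)] = 30  # current window, [60s, 120s)
# 15s into the current window: overlap_ratio = 0.75, so 30 + 80 * 0.75 = 90.
estimate = limiter.effective_count("user:u123", now=75.0)
```

Only two counters per key are needed at any time, which is where the memory saving over the log approach comes from.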
Algorithm 4: Token Bucket
Each key has a bucket.
bucket capacity = burst size
refill rate = allowed rate
Example:
capacity = 100 tokens
refill = 10 tokens / second
Each request consumes one token.
Pros
- Supports burst traffic
- Smooth average rate
- Widely used in production
Cons
- Requires tracking tokens and last refill time
- Slightly more complex
👉 Interview Answer
Token bucket is useful when we want to allow controlled bursts while enforcing a long-term average rate.
Each request consumes tokens, and tokens are refilled at a fixed rate.
If the bucket is empty, the request is rejected or delayed.
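A token bucket only needs two pieces of state per key: the current token count and the last refill time. The sketch below refills lazily on each call instead of using a background timer; `TokenBucket`, the `cost` argument, and the injected `now` are assumptions for illustration, and this version rejects rather than delays.

```python
class TokenBucket:
    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start full so an idle client may burst
        self.last_refill = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Lazy refill: credit tokens for the time elapsed since the last call,
        # never exceeding the bucket capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = max(self.last_refill, now)
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True


bucket = TokenBucket(capacity=100, refill_per_second=10)
burst = sum(bucket.allow(now=0.0) for _ in range(101))  # 100 pass, the 101st is rejected
later = sum(bucket.allow(now=1.0) for _ in range(11))   # one second refills 10 tokens
```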
Algorithm 5: Leaky Bucket
Requests enter a queue and are processed at a fixed rate.
Pros
- Smooths traffic
- Protects downstream systems
Cons
- Adds queueing latency
- Queue can overflow
- Less suitable when low latency is required
👉 Interview Answer
Leaky bucket smooths traffic by processing requests at a constant rate.
It is useful for protecting downstream systems, but it may add latency because requests can wait in a queue.
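The queueing behaviour, including overflow, can be sketched as a bounded queue that releases at most one request per leak interval. This is a simplified single-threaded model; `LeakyBucket`, `offer`, `drain`, and the injected `now` are hypothetical names chosen for the example.

```python
from collections import deque


class LeakyBucket:
    def __init__(self, queue_size: int, leak_per_second: float):
        self.queue_size = queue_size
        self.leak_interval = 1.0 / leak_per_second
        self.queue = deque()
        self.next_leak = 0.0  # earliest time the next request may be released

    def offer(self, request_id: str, now: float) -> bool:
        """Enqueue a request, or reject it when the queue is full (overflow)."""
        if len(self.queue) >= self.queue_size:
            return False
        if not self.queue:
            # Idle time is not banked: a quiet period never allows a later burst.
            self.next_leak = max(self.next_leak, now)
        self.queue.append(request_id)
        return True

    def drain(self, now: float) -> list:
        """Release queued requests, one per leak interval, up to time `now`."""
        released = []
        while self.queue and self.next_leak <= now:
            released.append(self.queue.popleft())
            self.next_leak += self.leak_interval
        return released


bucket = LeakyBucket(queue_size=3, leak_per_second=2)
offers = [bucket.offer(f"r{i}", now=0.0) for i in range(1, 5)]  # the 4th overflows
at_zero = bucket.drain(now=0.0)  # one request leaves immediately
at_one = bucket.drain(now=1.0)   # two more leave, 0.5s apart
```

The last queued request waited a full second before being released, which is the queueing latency listed under the cons.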
Recommended Algorithm
For most API rate limiters:
Token Bucket or Sliding Window Counter
Core Insight
Fixed window is simple; sliding window is more accurate; token bucket handles bursts best.
5️⃣ Rate Limit Key Design
Common Keys
user_id
api_key
ip_address
tenant_id
endpoint
device_id
Composite Keys
Examples:
user:u123:/checkout
ip:1.2.3.4:/login
tenant:t1:/api/orders
api_key:k123:/search
Why Key Design Matters
- Controls fairness
- Prevents abuse
- Supports per-endpoint limits
- Supports tenant-specific quotas
👉 Interview Answer
Rate limit key design is very important.
We may limit by user ID, IP address, API key, endpoint, tenant, or a combination of them.
For example, login APIs may be limited by IP and username, while paid APIs may be limited by tenant or API key.
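Composite keys are usually just a deterministic string built from the chosen dimensions. The helper names below are hypothetical, and lower-casing the username is an assumption added so that `Alice` and `alice` share one counter.

```python
def build_rate_limit_key(scope: str, identifier: str, endpoint: str) -> str:
    """Join the limiting dimensions into one key, e.g. user:u123:/checkout."""
    return f"{scope}:{identifier}:{endpoint}"


def login_keys(ip: str, username: str) -> list:
    """A login attempt is checked against both its IP and its target username."""
    return [
        build_rate_limit_key("ip", ip, "/login"),
        build_rate_limit_key("username", username.lower(), "/login"),
    ]


checkout_key = build_rate_limit_key("user", "u123", "/checkout")
keys_for_login = login_keys("1.2.3.4", "Alice")
```

A request is allowed only if every key it maps to is under its limit, so a single IP trying many usernames and many IPs trying a single username are both caught.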
6️⃣ Data Storage
In-memory Local Counter
gateway node memory
Pros
- Extremely fast
- No network call
Cons
- Not globally accurate
- Each node sees partial traffic
- Lost on restart
Redis Counter
Redis key → counter / token bucket state
Pros
- Centralized state
- Fast
- TTL support
- Atomic operations with Lua
Cons
- Network latency
- Redis can become a bottleneck
- Redis failure affects limiter
Persistent Database
Usually not used on the hot path.
Use Cases
- Config storage
- Audit logs
- Usage reporting
- Billing quotas
👉 Interview Answer
For the hot path, I would usually use Redis or local memory.
Redis gives centralized counters with atomic operations and TTL support, while local memory is faster but less accurate in distributed systems.
Persistent databases are better for configuration, auditing, and billing, not per-request rate limit checks.
7️⃣ Distributed Rate Limiting
The Challenge
In distributed systems, traffic is spread across many gateway nodes.
Client → Gateway A
Client → Gateway B
Client → Gateway C
Each node only sees part of the traffic.
Option 1: Centralized Redis
All gateways check Redis.
Gateway → Redis → decision
Pros
- More accurate
- Simple global limit
Cons
- Adds latency
- Redis hot keys
- Redis dependency
Option 2: Local Limiters
Each gateway enforces a fraction of the limit.
global limit = 1000/sec
10 gateways
each gateway allows 100/sec
Pros
- Very fast
- Highly available
Cons
- Less accurate
- Bad with uneven traffic distribution
Option 3: Hybrid
Use:
local limiter for fast protection
global Redis limiter for accuracy
👉 Interview Answer
In distributed systems, a single user’s traffic may hit multiple gateway nodes.
A centralized Redis limiter gives better global accuracy, but adds latency and dependency.
A local limiter is faster and more available, but less accurate.
In production, I would often use a hybrid approach: local limiting for fast protection, and Redis-based global limiting for stronger enforcement.
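One way to picture the hybrid approach: each gateway applies a cheap local cap first and consults the shared counter only for requests that pass it. In this sketch `CounterStore` is an in-memory stand-in for a shared store such as Redis, and the class names and the limits (local 8, global 10) are assumptions picked to keep the example small.

```python
class CounterStore:
    """In-memory stand-in for a shared counter store (hypothetical)."""

    def __init__(self):
        self.counts = {}

    def incr(self, key: str) -> int:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


class HybridLimiter:
    def __init__(self, shared: CounterStore, global_limit: int,
                 local_limit: int, window_seconds: int):
        self.shared = shared
        self.local = CounterStore()  # private to this gateway node
        self.global_limit = global_limit
        self.local_limit = local_limit
        self.window_seconds = window_seconds

    def allow(self, key: str, now: float) -> bool:
        window_key = f"{key}:{int(now // self.window_seconds)}"
        # Local check: sheds a flood without a network round trip.
        if self.local.incr(window_key) > self.local_limit:
            return False
        # Shared check: enforces the limit across all gateways.
        return self.shared.incr(window_key) <= self.global_limit


shared = CounterStore()
gateway_a = HybridLimiter(shared, global_limit=10, local_limit=8, window_seconds=1)
gateway_b = HybridLimiter(shared, global_limit=10, local_limit=8, window_seconds=1)
allowed_a = sum(gateway_a.allow("user:u123", 0.0) for _ in range(12))
allowed_b = sum(gateway_b.allow("user:u123", 0.0) for _ in range(5))
```

Gateway A stops at its local cap of 8 and never sends the remaining 4 requests to the shared store; gateway B is then limited by the global count, so the total across both nodes stays at 10.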
8️⃣ Redis + Lua Atomic Operation
Why Lua?
A rate limiter must perform these steps as a single atomic operation:
read current count
increment count
set TTL
return allow/reject
Without atomicity, two concurrent requests can both read a count of 99, both pass the check, and both increment, pushing the total past the limit.
Example Logic
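-- KEYS[1] = rate limit key, ARGV[1] = limit, ARGV[2] = window in seconds
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current = redis.call("GET", key)
if current == false then
  -- First request in this window: create the counter with a TTL.
  redis.call("SET", key, 1, "EX", window)
  return 1
end
if tonumber(current) < limit then
  -- INCR keeps the existing TTL, so the window still expires on time.
  redis.call("INCR", key)
  return 1
else
  return 0
end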
👉 Interview Answer
I would use Redis Lua scripts to make the rate limit check atomic.
The script reads the counter, increments it, sets TTL if needed, and returns allow or reject in one atomic operation.
This prevents race conditions under high concurrency.
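The same check-then-increment race exists inside a single process, which makes it easy to demonstrate without Redis. The sketch below is an in-process analogue only: a `threading.Lock` plays the role that single-script execution plays in Redis. `AtomicCounterLimiter` and `hammer` are hypothetical names.

```python
import threading


class AtomicCounterLimiter:
    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self.lock = threading.Lock()

    def allow(self) -> bool:
        # Read, compare, and increment happen under one lock, so no other
        # thread can observe the count between the check and the update.
        with self.lock:
            if self.count < self.limit:
                self.count += 1
                return True
            return False


def hammer(limiter: AtomicCounterLimiter, threads: int = 20, attempts: int = 50) -> int:
    """Fire threads * attempts concurrent requests and return how many were allowed."""
    allowed_per_thread = [0] * threads

    def worker(slot: int) -> None:
        for _ in range(attempts):
            if limiter.allow():
                allowed_per_thread[slot] += 1  # each thread writes only its own slot

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return sum(allowed_per_thread)


total_allowed = hammer(AtomicCounterLimiter(limit=100))  # 1000 attempts, exactly 100 allowed
```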
9️⃣ Handling Burst Traffic
Why Burst Matters
Some clients legitimately send bursts.
Examples:
- Page load triggers multiple API calls
- Mobile app reconnects
- Batch job starts
- User refreshes dashboard
Strategies
- Token bucket with burst capacity
- Separate short-term and long-term limits
- Per-endpoint limits
- Queue low-priority requests
- Return Retry-After
Example
Short-term: 20 requests / second
Long-term: 1000 requests / hour
👉 Interview Answer
I would handle bursts using token bucket or layered limits.
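For example, a user may burst at up to 20 requests per second but is also capped at 1000 requests per hour, so a sustained burst at the full 20 requests per second exhausts the hourly quota in 50 seconds.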
This supports normal bursts while preventing long-term abuse.
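Layered limits can be sketched as several fixed-window rules that must all pass. The example uses the 20 per second and 1000 per hour figures from above; `LayeredLimiter` and the injected `now` are assumptions, and fixed windows are used only to keep the sketch short.

```python
class LayeredLimiter:
    def __init__(self, rules):
        # rules: list of (limit, window_seconds), e.g. [(20, 1), (1000, 3600)]
        self.rules = rules
        self.counts = {}  # (key, window_seconds, window index) -> count

    def allow(self, key: str, now: float) -> bool:
        buckets = [(key, window, int(now // window)) for _, window in self.rules]
        # Check every layer before counting, so a rejected request does not
        # consume quota in the layers that would have allowed it.
        for (limit, _), bucket in zip(self.rules, buckets):
            if self.counts.get(bucket, 0) >= limit:
                return False
        for bucket in buckets:
            self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return True


limiter = LayeredLimiter([(20, 1), (1000, 3600)])
# A client attempts 25 requests in each of 60 consecutive seconds.
per_second = [
    sum(limiter.allow("user:u123", float(second)) for _ in range(25))
    for second in range(60)
]
total = sum(per_second)
```

The short-term rule admits 20 requests in each of the first 50 seconds; after that the hourly rule takes over and every further request is rejected until the hour window rolls over.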
🔟 Config Management
Config Examples
{
"endpoint": "/checkout",
"limit": 100,
"windowSeconds": 60,
"algorithm": "token_bucket",
"scope": "user_id",
"priority": "critical"
}
Requirements
- Dynamic config updates
- Per-user-tier limits
- Per-endpoint limits
- Emergency override
- Safe rollout
👉 Interview Answer
Rate limits should be configurable, because different APIs and user tiers have different traffic patterns.
I would store configs in a central config service and cache them in gateway nodes, with safe rollout and emergency override support.
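A gateway-side lookup over cached rules might look like the sketch below. The rule mirrors the config example above; the `tierOverrides` field, the `DEFAULT_RULE` fallback, and the function names are hypothetical additions used to illustrate per-tier limits.

```python
import json

# One rule as it might arrive from the config service.
# "tierOverrides" is a hypothetical field, not part of the example above.
CONFIG_JSON = """
[
  {
    "endpoint": "/checkout",
    "limit": 100,
    "windowSeconds": 60,
    "algorithm": "token_bucket",
    "scope": "user_id",
    "priority": "critical",
    "tierOverrides": {"premium": 500}
  }
]
"""

# Applied to endpoints that have no explicit rule (assumed values).
DEFAULT_RULE = {"limit": 60, "windowSeconds": 60}


def load_rules(raw: str) -> dict:
    """Index the rules by endpoint for constant-time lookup on the hot path."""
    return {rule["endpoint"]: rule for rule in json.loads(raw)}


def resolve_limit(rules: dict, endpoint: str, tier: str) -> int:
    rule = rules.get(endpoint, DEFAULT_RULE)
    # A tier-specific override wins over the endpoint's base limit.
    return rule.get("tierOverrides", {}).get(tier, rule["limit"])


rules = load_rules(CONFIG_JSON)
free_limit = resolve_limit(rules, "/checkout", "free")
premium_limit = resolve_limit(rules, "/checkout", "premium")
```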
1️⃣1️⃣ Response Behavior
When Allowed
Forward request to backend.
Add headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1714653600
When Limited
Return:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Hard Block vs Soft Limit
- Hard block: reject immediately
- Soft limit: allow but log / degrade
- Shadow mode: evaluate without blocking
👉 Interview Answer
When the request is allowed, the system forwards it to the backend and may return rate-limit headers.
When the limit is exceeded, it should return HTTP 429 with Retry-After.
Before enforcing new limits, I would often run them in shadow mode to understand impact.
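The headers can be derived from the limiter's decision. In this sketch the function names are hypothetical, and `Retry-After` is computed as the number of whole seconds until the window resets, which is one reasonable choice among several.

```python
import math


def rate_limit_headers(limit: int, remaining: int, reset_epoch: int) -> dict:
    """Informational headers attached to allowed and rejected responses alike."""
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }


def rejection(limit: int, reset_epoch: int, now_epoch: float):
    """Build the status code and headers for a rate-limited request."""
    headers = rate_limit_headers(limit, 0, reset_epoch)
    # Retry-After is in whole seconds; round up so clients never retry early.
    headers["Retry-After"] = str(max(1, math.ceil(reset_epoch - now_epoch)))
    return 429, headers


status, headers = rejection(limit=100, reset_epoch=1714653600, now_epoch=1714653570)
allowed_headers = rate_limit_headers(limit=100, remaining=87, reset_epoch=1714653600)
```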
1️⃣2️⃣ Failure Handling
Common Failures
- Redis unavailable
- Config service unavailable
- Gateway restart
- Hot key overload
- Clock skew
- Network timeout
Fail Open vs Fail Closed
| Mode | Meaning | Use Case |
|---|---|---|
| Fail open | Allow traffic when limiter fails | User-facing low-risk APIs |
| Fail closed | Reject traffic when limiter fails | Security-sensitive APIs |
Recommended
- Security APIs: fail closed or strict fallback
- Normal APIs: fail open with local emergency limit
- Use circuit breaker for Redis
- Cache last known config
👉 Interview Answer
Failure behavior depends on the API.
For security-sensitive APIs like login or payment, I may fail closed or use a strict local fallback.
For normal user-facing APIs, I may fail open to avoid blocking legitimate users.
I would also cache configs locally and use circuit breakers around Redis or external dependencies.
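The fail-open versus fail-closed choice can be isolated in a small wrapper around the shared limiter call. `LimiterUnavailable`, `check_with_fallback`, and the callables passed in are hypothetical; a real client would raise its own timeout or connection errors.

```python
class LimiterUnavailable(Exception):
    """Raised by the (hypothetical) shared limiter client on timeout or outage."""


def check_with_fallback(shared_check, key, fail_open, local_fallback=None):
    """Return the shared limiter's decision, or a fallback when it is unavailable."""
    try:
        return shared_check(key)
    except LimiterUnavailable:
        if local_fallback is not None:
            # Prefer a local emergency limit over a blanket allow or deny.
            return local_fallback(key)
        return fail_open


def broken_check(key):
    raise LimiterUnavailable("shared store timed out")


# Normal API: fail open so legitimate users are not blocked by an outage.
search_allowed = check_with_fallback(broken_check, "user:u123:/search", fail_open=True)
# Security-sensitive API: fail closed.
login_allowed = check_with_fallback(broken_check, "ip:1.2.3.4:/login", fail_open=False)
```

In practice the wrapper would sit behind a circuit breaker so that a failing store is not called on every request.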
1️⃣3️⃣ Observability
Metrics
- Allowed request count
- Rejected request count
- Rate limit hit rate
- Redis latency
- Redis error rate
- Hot keys
- Config update latency
- False positive complaints
- Backend protection effectiveness
Logs
Track:
request_id
rate_limit_key
algorithm
limit
remaining
decision
reason
👉 Interview Answer
Observability is important because rate limiters directly affect user traffic.
I would track allowed and rejected requests, Redis latency, hot keys, limit hit rates, and the impact on backend errors.
Detailed logs help debug false positives and tune limits safely.
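A structured log line carrying the fields listed above could be emitted as JSON, one record per decision. The function name and the `allow`/`reject` values for `decision` are assumptions for illustration.

```python
import json


def decision_log(request_id, rate_limit_key, algorithm, limit, remaining, allowed, reason):
    """Serialize one rate-limit decision as a single JSON log line."""
    record = {
        "request_id": request_id,
        "rate_limit_key": rate_limit_key,
        "algorithm": algorithm,
        "limit": limit,
        "remaining": remaining,
        "decision": "allow" if allowed else "reject",
        "reason": reason,
    }
    # Sorted keys keep lines stable, which helps when diffing or grepping logs.
    return json.dumps(record, sort_keys=True)


line = decision_log(
    request_id="req-1",
    rate_limit_key="user:u123:/checkout",
    algorithm="token_bucket",
    limit=100,
    remaining=0,
    allowed=False,
    reason="bucket_empty",
)
```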
1️⃣4️⃣ Consistency Model
Stronger Consistency Needed For
- Security-sensitive limits
- Payment or login abuse prevention
- Billing quota enforcement
Eventual / Approximate Consistency Acceptable For
- General API traffic shaping
- Soft quotas
- Best-effort abuse prevention
- Internal service protection
👉 Interview Answer
Rate limiting usually does not require perfect consistency.
For most APIs, approximate limits are acceptable because the goal is protection and fairness.
But for security-sensitive or billing-related limits, stronger consistency and auditability are more important.
1️⃣5️⃣ End-to-End Flow
Request Flow
Client request
→ API Gateway
→ Build rate limit key
→ Load config
→ Check local limiter
→ Check global Redis limiter
→ Allow or reject
→ Forward to backend if allowed
Redis Limiter Flow
Gateway
→ Execute Redis Lua script
→ Atomically update counter/token state
→ Return allow/reject decision
Limited Flow
Request exceeds limit
→ Return HTTP 429
→ Include Retry-After header
→ Log decision
→ Emit metric
Key Insight
A rate limiter is not just a counter — it is a low-latency distributed control system.
🧠 Staff-Level Answer (Final)
👉 Interview Answer (Full Version)
When designing a rate limiter, I think of it as a low-latency distributed control system that protects backend services from abuse, traffic spikes, and unfair usage.
I would usually place it at the API gateway layer so abusive traffic can be blocked before reaching backend services.
The system should support different limits by user, IP, API key, tenant, endpoint, and user tier.
For algorithms, fixed window is simple but has boundary burst issues. Sliding window log is accurate but expensive. Sliding window counter is a good compromise. Token bucket is often preferred when we want to allow controlled bursts while enforcing a long-term average rate.
In a distributed setup, I would use a hybrid design: local limiters for low-latency protection and Redis-based global limiters for stronger accuracy.
Redis Lua scripts can make counter updates atomic and prevent race conditions under high concurrency.
Rate limit configs should be centrally managed and cached locally in gateway nodes.
When a request exceeds the limit, the system should return HTTP 429 with Retry-After.
Failure behavior depends on the API: security-sensitive APIs may fail closed, while normal user-facing APIs may fail open with local fallback.
The main trade-offs are accuracy, latency, availability, memory usage, and operational complexity.
Ultimately, the goal is to protect backend systems while minimizing impact on legitimate users.
⭐ Final Insight
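The core of a rate limiter is not simple counting; it is achieving fairness, protection, and controllability in a distributed system at very low latency.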
Implement