d&d-t System Design Deep Dive

🎯 Design Twitter Timeline

1️⃣ Core Framework

When discussing Twitter Timeline design, I frame it as:

Core user flows: post tweet, follow user, read timeline
Data model: users, tweets, follow graph, timeline entries
Timeline generation strategy: fanout-on-write vs fanout-on-read
Ranking and personalization
Caching and scaling patterns
Trade-offs: freshness vs latency vs storage cost
Failure handling and consistency

2️⃣ Core Requirements

Functional Requirements

User can post tweets
User can follow / unfollow other users
User can view home timeline
User can view user profile timeline
Support media, likes, replies, reposts
Support ranking or chronological ordering

Non-functional Requirements

Low-latency timeline reads
High write throughput
High availability
Scalable follow graph
Near-real-time freshness
Eventual consistency is acceptable for timeline delivery

👉 Interview Answer

Twitter Timeline has two core flows: users publish tweets, and followers read timelines.

The read path is extremely latency-sensitive, so we usually precompute or cache timelines.

The main challenge is balancing freshness, latency, storage cost, and handling users with millions of followers.

3️⃣ Main APIs

Post Tweet

POST /api/tweets

Request:

{
  "userId": "u123",
  "content": "Hello world",
  "mediaIds": ["m1", "m2"]
}

Response:

{
  "tweetId": "t789",
  "createdAt": "2026-05-02T10:00:00Z"
}

Follow User

POST /api/follow

Request:

{
  "followerId": "u123",
  "followeeId": "u456"
}

Get Home Timeline

GET /api/timeline/home?userId=u123&cursor=xxx&limit=50

Get Profile Timeline

GET /api/timeline/profile?userId=u456&cursor=xxx&limit=50
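The `cursor` and `limit` parameters suggest keyset (cursor-based) pagination. Below is a minimal sketch. It assumes the cursor opaquely encodes the `(created_at, tweet_id)` key of the last item returned, and that entries are kept newest first, matching the `home_timeline` primary key order. `encode_cursor`, `decode_cursor`, and `get_page` are hypothetical helpers, not part of any existing API.

```python
import base64
import json
from typing import Optional

# Hypothetical entry layout: (created_at_epoch_ms, tweet_id), sorted newest first.
Entry = tuple[int, str]


def encode_cursor(entry: Entry) -> str:
    """Opaque cursor: base64 of the last returned (created_at, tweet_id) key."""
    raw = json.dumps(list(entry)).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> Entry:
    created_at, tweet_id = json.loads(base64.urlsafe_b64decode(cursor))
    return (created_at, tweet_id)


def get_page(entries: list[Entry], cursor: Optional[str], limit: int):
    """Return (page, next_cursor), resuming strictly after the cursor key."""
    if cursor is not None:
        last = decode_cursor(cursor)
        # Entries are sorted descending, so "after the cursor" means strictly smaller keys.
        entries = [e for e in entries if e < last]
    page = entries[:limit]
    # A short page means we reached the end, so there is no next cursor.
    next_cursor = encode_cursor(page[-1]) if len(page) == limit else None
    return page, next_cursor
```

Unlike offset pagination, resuming from a key stays correct when new tweets are inserted at the head of the timeline between page requests.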

👉 Interview Answer

I would separate write APIs, such as posting tweets and following users, from read APIs, such as fetching home timeline and profile timeline.

Home timeline is more complex because it depends on the follow graph, while profile timeline is simpler because it only contains tweets from one user.

4️⃣ Data Model

User Table

user (
  user_id VARCHAR PRIMARY KEY,
  username VARCHAR,
  created_at TIMESTAMP
)

Tweet Table

tweet (
  tweet_id VARCHAR PRIMARY KEY,
  author_id VARCHAR,
  content TEXT,
  media_ids ARRAY,
  created_at TIMESTAMP,
  status VARCHAR
)

Follow Table

follow (
  follower_id VARCHAR,
  followee_id VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (follower_id, followee_id)
)

Reverse Follow Table

reverse_follow (
  followee_id VARCHAR,
  follower_id VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (followee_id, follower_id)
)

Timeline Table

home_timeline (
  user_id VARCHAR,
  tweet_id VARCHAR,
  author_id VARCHAR,
  created_at TIMESTAMP,
  score DOUBLE,
  PRIMARY KEY (user_id, created_at, tweet_id)
)

Why Keep Both Follow and Reverse Follow?

follow helps answer: who do I follow?
reverse_follow helps answer: who follows this author?
Timeline fanout needs reverse follow lookup
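Here is a minimal in-memory sketch of keeping both directions in sync. It assumes Python dicts of sets stand in for the `follow` and `reverse_follow` tables. The `FollowGraph` class and its methods are hypothetical. In a real store the two writes land in different partitions, so keeping them consistent needs a transaction or an asynchronous repair job.

```python
from collections import defaultdict


class FollowGraph:
    """Hypothetical in-memory stand-in for the follow and reverse_follow tables.

    Both directions are written together, so "who do I follow?" and
    "who follows this author?" are each a single key lookup.
    """

    def __init__(self) -> None:
        # follow: follower_id -> set of followee_ids
        self.following: dict[str, set[str]] = defaultdict(set)
        # reverse_follow: followee_id -> set of follower_ids (used by fanout)
        self.followers: dict[str, set[str]] = defaultdict(set)

    def follow(self, follower_id: str, followee_id: str) -> None:
        self.following[follower_id].add(followee_id)
        self.followers[followee_id].add(follower_id)

    def unfollow(self, follower_id: str, followee_id: str) -> None:
        self.following[follower_id].discard(followee_id)
        self.followers[followee_id].discard(follower_id)

    def followees_of(self, user_id: str) -> set[str]:
        return set(self.following.get(user_id, ()))

    def followers_of(self, author_id: str) -> set[str]:
        return set(self.followers.get(author_id, ()))
```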

👉 Interview Answer

I would store tweets separately from timeline entries.

Tweets are the source of truth, while timeline entries are denormalized views optimized for fast reads.

I would also maintain both follow and reverse-follow tables, because timeline generation needs to quickly find followers of an author.

5️⃣ Timeline Generation Strategy

Option 1: Fanout-on-Write

When a user posts a tweet:

author posts tweet
→ find all followers
→ insert tweet into each follower's home timeline

Pros

Timeline read is very fast
Good for normal users
Simple read path

Cons

Expensive for users with many followers
Write amplification
Hard to handle celebrity users

👉 Interview Answer

Fanout-on-write precomputes timeline entries when a tweet is created.

This makes timeline reads very fast, because the system only needs to read from the user’s precomputed timeline.

However, it creates write amplification, especially when a celebrity user posts to millions of followers.
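The push path can be sketched as follows. It assumes each follower's home timeline is an in-memory deque of tweet IDs and that precomputed timelines are capped at an assumed `MAX_TIMELINE_LEN`. `fanout_on_write` and `home_timelines` are hypothetical names. The return value makes the write amplification explicit: one timeline write per follower.

```python
from collections import defaultdict, deque

# Assumed cap on precomputed entries per user; the oldest entries fall off the end.
MAX_TIMELINE_LEN = 800

# Hypothetical store: each home timeline is a deque of tweet_ids, newest at the left.
home_timelines: dict[str, deque] = defaultdict(lambda: deque(maxlen=MAX_TIMELINE_LEN))


def fanout_on_write(tweet_id: str, author_id: str, followers_of: dict[str, set[str]]) -> int:
    """Push tweet_id into every follower's precomputed timeline.

    Returns the number of timeline writes, which equals the author's follower count.
    """
    writes = 0
    for follower_id in followers_of.get(author_id, set()):
        # appendleft on a full deque drops the oldest entry from the right end.
        home_timelines[follower_id].appendleft(tweet_id)
        writes += 1
    return writes
```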

Option 2: Fanout-on-Read

When user opens timeline:

user opens timeline
→ get followees
→ fetch recent tweets from followees
→ merge and rank

Pros

Lower write cost
Better for celebrity users
No massive fanout at write time

Cons

Timeline read is slower
Expensive for users following many accounts
Requires merge and ranking at read time

👉 Interview Answer

Fanout-on-read computes the timeline when the user requests it.

This avoids massive write amplification, but makes reads more expensive because we need to fetch, merge, and rank tweets from many followees.
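Here is a sketch of the pull path. It assumes each followee's recent tweets are already available as a list sorted newest first. In practice each list is a separate fetch, which is where the read cost comes from. The sketch uses `heapq.merge` from the Python standard library for a lazy k-way merge. `fanout_on_read` and the tuple layout are hypothetical.

```python
import heapq
from itertools import islice

# Hypothetical tweet layout: (created_at_epoch_ms, tweet_id, author_id).
Tweet = tuple[int, str, str]


def fanout_on_read(
    user_id: str,
    following: dict[str, set[str]],
    recent_tweets_by_author: dict[str, list[Tweet]],
    limit: int = 50,
) -> list[Tweet]:
    """Build a timeline at request time by merging followees' recent tweets.

    Each author's list must be sorted newest first; heapq.merge then lazily
    yields the globally newest tweets without sorting all candidates.
    """
    streams = [recent_tweets_by_author.get(a, []) for a in following.get(user_id, set())]
    merged = heapq.merge(*streams, key=lambda t: t[0], reverse=True)
    return list(islice(merged, limit))
```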

Option 3: Hybrid Fanout

Recommended approach:

Normal users → fanout-on-write
Celebrity users → fanout-on-read

Why Hybrid?

Most users have small follower counts
A small number of celebrity users cause huge fanout cost
Hybrid keeps reads fast while controlling write amplification

👉 Interview Answer

In production, I would use a hybrid approach.

For normal users, I use fanout-on-write so their followers can read timelines quickly.

For celebrity users, I use fanout-on-read to avoid pushing one tweet into millions of timelines.

At read time, I merge precomputed timeline entries with recent tweets from celebrity accounts.
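The hybrid read path can be sketched by treating the precomputed timeline as one sorted stream and each followed celebrity's recent tweets as additional streams. The follower-count threshold `CELEBRITY_THRESHOLD` and all function names are assumptions for illustration.

```python
import heapq
from itertools import islice

Tweet = tuple[int, str, str]  # hypothetical layout: (created_at_epoch_ms, tweet_id, author_id)

# Assumed threshold: authors at or above this follower count are pulled at read time.
CELEBRITY_THRESHOLD = 100_000


def is_celebrity(author_id: str, follower_counts: dict[str, int]) -> bool:
    return follower_counts.get(author_id, 0) >= CELEBRITY_THRESHOLD


def hybrid_home_timeline(
    user_id: str,
    precomputed: dict[str, list[Tweet]],       # pushed entries, newest first
    following: dict[str, set[str]],
    follower_counts: dict[str, int],
    celebrity_recent: dict[str, list[Tweet]],  # celebrity tweet cache, newest first
    limit: int = 50,
) -> list[Tweet]:
    """Merge pushed entries with tweets pulled from followed celebrity accounts."""
    streams = [precomputed.get(user_id, [])]
    for author_id in following.get(user_id, set()):
        if is_celebrity(author_id, follower_counts):
            streams.append(celebrity_recent.get(author_id, []))
    merged = heapq.merge(*streams, key=lambda t: t[0], reverse=True)
    return list(islice(merged, limit))
```

The write side must apply the same `is_celebrity` check to skip fanout for those authors. Otherwise a celebrity tweet would be both pushed and pulled, and it would appear twice.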

Core Insight

Timeline design is mainly about choosing where to pay the cost: at write time or read time.

6️⃣ Home Timeline Read Flow

Basic Flow

User opens home timeline
Timeline service checks cache
Fetch precomputed timeline entries
Fetch missing tweet metadata
Pull recent tweets from celebrity followees
Merge results
Rank or sort
Return timeline page

Chronological Timeline

sort by created_at desc

Simple and predictable.

Ranked Timeline

Signals may include:

Recency
Relationship strength
Engagement
User interests
Tweet quality
Negative feedback
Author reputation
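To make the contrast concrete, here is a toy comparison of chronological sorting and a hand-weighted ranked score. It covers only a few of the signals listed: recency, relationship strength, engagement, and negative feedback. The weights, the half-life, and the `Candidate` fields are hypothetical. A production ranker would typically learn such weights rather than hard-code them.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    tweet_id: str
    age_hours: float
    affinity: float          # user-author relationship strength, 0..1
    engagement: float        # e.g. normalized likes/replies/reposts, 0..1
    negative_feedback: bool


# Hypothetical hand-tuned weights and recency half-life.
W_AFFINITY, W_ENGAGEMENT = 0.6, 0.4
HALF_LIFE_HOURS = 6.0


def score(c: Candidate) -> float:
    """Relevance times exponential recency decay; negative feedback sinks the item to 0."""
    if c.negative_feedback:
        return 0.0
    recency = math.pow(0.5, c.age_hours / HALF_LIFE_HOURS)
    return (W_AFFINITY * c.affinity + W_ENGAGEMENT * c.engagement) * recency


def chronological(cands: list[Candidate]) -> list[Candidate]:
    return sorted(cands, key=lambda c: c.age_hours)  # newest first


def ranked(cands: list[Candidate]) -> list[Candidate]:
    return sorted(cands, key=score, reverse=True)
```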

👉 Interview Answer

For home timeline reads, I would first fetch precomputed timeline entries from cache or storage.

Then I would merge in tweets from high-follower accounts that are handled by fanout-on-read.

Finally, I would rank or sort the results before returning them to the user.

7️⃣ Ranking and Personalization

Why Ranking?

A purely chronological feed can bury important content under newer but less relevant tweets.

Ranking improves:

Relevance
Engagement
User retention
Content quality

Ranking Pipeline

Candidate Generation
→ Filtering
→ Lightweight Ranking
→ ML Ranking
→ Re-ranking

Candidate Sources

Tweets from followees
Replies from followed users
Reposts
Recommended tweets
Ads

Ranking Signals

Recency
User-author affinity
Likes / replies / reposts
Dwell time
Topic interests
Muted / blocked users
Content safety signals

👉 Interview Answer

Ranking can be added as a separate layer on top of timeline retrieval.

The system first generates candidates, filters out blocked or low-quality content, applies ranking models, and then re-ranks results for diversity, freshness, and safety.

This keeps the retrieval layer scalable while allowing ranking to improve relevance.
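The staged pipeline described above can be sketched as a sequence of progressively more expensive steps. This is a minimal illustration. It assumes `heavy_model` stands in for the ML ranker and that diversity re-ranking is a simple per-author cap. `rank_pipeline`, `light_keep`, and `max_per_author` are hypothetical names and defaults.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Cand:
    tweet_id: str
    author_id: str
    cheap_score: float  # e.g. from a lightweight model or heuristic


def rank_pipeline(
    candidates: list[Cand],
    blocked_or_muted: set[str],
    heavy_model: Callable[[Cand], float],
    light_keep: int = 500,
    max_per_author: int = 2,
    limit: int = 50,
) -> list[Cand]:
    # 1. Filtering: drop authors the viewer has blocked or muted.
    pool = [c for c in candidates if c.author_id not in blocked_or_muted]
    # 2. Lightweight ranking: a cheap score prunes the pool before the expensive model.
    pool = sorted(pool, key=lambda c: c.cheap_score, reverse=True)[:light_keep]
    # 3. ML ranking: the expensive model scores only the survivors.
    pool = sorted(pool, key=heavy_model, reverse=True)
    # 4. Re-ranking: cap items per author for diversity.
    out, per_author = [], {}
    for c in pool:
        if per_author.get(c.author_id, 0) < max_per_author:
            out.append(c)
            per_author[c.author_id] = per_author.get(c.author_id, 0) + 1
        if len(out) == limit:
            break
    return out
```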

8️⃣ Caching Strategy

What to Cache?

Home timeline entries
User profile timeline
Tweet objects
User profile metadata
Follow graph
Celebrity recent tweets

Cache Layers

Local cache in timeline service
Redis / Memcached
CDN for media
Edge cache for public profile timelines

Cache Challenges

New tweets
Deleted tweets
Blocked users
Follow / unfollow updates
Ranking freshness

👉 Interview Answer

Caching is critical because timeline reads are frequent and latency-sensitive.

I would cache home timeline entries, tweet objects, user metadata, and celebrity recent tweets.

However, cache invalidation is important when tweets are deleted, users block each other, or follow relationships change.
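Here is a minimal cache-aside sketch with TTL expiry and explicit invalidation. It assumes a dict stands in for Redis or Memcached and that `loader` reads the source of truth. `TTLCache` and its methods are hypothetical. The TTL bounds how long stale data can survive if an invalidation is missed, while `invalidate` handles the delete, block, and follow events listed above.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Hypothetical cache-aside helper standing in for Redis/Memcached."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: dict[str, tuple[float, object]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[str], Optional[object]]) -> Optional[object]:
        entry = self._data.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]  # fresh hit
        value = loader(key)  # miss or expired: read the source of truth
        if value is not None:  # missing keys are not cached in this sketch
            self._data[key] = (self.clock() + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call on tweet delete, block, or follow change instead of waiting for the TTL."""
        self._data.pop(key, None)
```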

9️⃣ Write Path: Posting a Tweet

Flow

User posts tweet
Tweet service validates request
Store tweet in tweet table
Publish tweet-created event
Fanout service consumes event
Find followers from reverse follow table
Push timeline entries to followers
Update caches asynchronously

Event Pipeline

Tweet Service
→ Kafka / Queue
→ Fanout Workers
→ Timeline Store
→ Cache Update
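The decoupling can be sketched in-process with the standard library's `queue.Queue` standing in for Kafka and threads standing in for fanout workers. This is an illustration under those assumptions only. `post_tweet`, `fanout_worker`, and the in-memory stores are hypothetical. A real broker adds partitioning, persistence, and offset management that this sketch ignores.

```python
import queue
import threading
from collections import defaultdict

tweet_store: dict[str, dict] = {}                         # source of truth
timeline_store: dict[str, list[str]] = defaultdict(list)  # followers' timelines, newest first
events: "queue.Queue" = queue.Queue()                     # stands in for Kafka
followers = {"alice": ["bob", "carol"]}                   # reverse_follow lookup
lock = threading.Lock()


def post_tweet(tweet_id: str, author_id: str, content: str) -> str:
    """Synchronous part: durably store the tweet, publish an event, return."""
    tweet_store[tweet_id] = {"author_id": author_id, "content": content}
    events.put({"type": "tweet-created", "tweet_id": tweet_id, "author_id": author_id})
    return tweet_id  # the caller does not wait for fanout


def fanout_worker() -> None:
    """Asynchronous part: consume events and push entries to followers' timelines."""
    while True:
        event = events.get()
        try:
            if event is None:  # shutdown sentinel
                return
            for follower_id in followers.get(event["author_id"], []):
                with lock:
                    timeline_store[follower_id].insert(0, event["tweet_id"])
        finally:
            events.task_done()
```

`post_tweet` returns as soon as the tweet is stored and the event is enqueued. The timeline writes happen later on the worker threads.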

👉 Interview Answer

I would decouple tweet creation from timeline fanout using an async event pipeline.

The tweet service stores the tweet as the source of truth and publishes a tweet-created event.

Fanout workers consume the event and update followers’ timeline stores asynchronously.

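This keeps posting latency low, because the client does not wait for fanout. It also improves resilience, because failed fanout jobs can be retried from the queue without touching the stored tweet.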

🔟 Trade-offs

Fanout-on-Write vs Fanout-on-Read

Strategy	Pros	Cons
Fanout-on-write	Fast reads	Expensive writes
Fanout-on-read	Cheap writes	Slower reads
Hybrid	Balanced	More complex

Freshness vs Latency

Real-time fanout improves freshness
Async fanout may introduce delay
Caching improves latency but may serve stale data

Storage Cost vs Read Performance

Precomputed timelines require more storage
But greatly improve read performance

Consistency vs Availability

Timeline can be eventually consistent
Tweet creation should be durable
Reads should stay available even if ranking/fanout is delayed

👉 Interview Answer

The main trade-off is between write amplification and read latency.

Fanout-on-write makes reads fast, but it is expensive for users with many followers.

Fanout-on-read avoids write amplification, but makes the read path more expensive.

A hybrid approach is usually best for large-scale systems.

1️⃣1️⃣ Scaling Patterns

Pattern 1: Hybrid Fanout

Normal users: push model
Celebrity users: pull model

Pattern 2: Async Processing

Use queue-based fanout:

tweet event → queue → fanout workers

Pattern 3: Timeline Precomputation

Store timeline entries per user.

Pattern 4: Cache Hot Data

Cache:

Hot timelines
Hot tweets
Celebrity tweets
User profiles

Pattern 5: Shard Timeline Store

Shard by:

user_id

Why?

Timeline reads are usually per user
Keeps one user’s timeline localized
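Routing by `user_id` can be sketched with a stable hash, assuming an arbitrary `NUM_SHARDS`. `shard_for` and `shards_for_fanout` are hypothetical helpers. The second function shows the asymmetry behind this choice: one user's timeline read hits exactly one shard, whereas a single fanout spreads writes across many.

```python
import hashlib

NUM_SHARDS = 64  # assumed shard count


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Route all of one user's timeline entries to the same shard.

    Uses a stable hash; Python's built-in hash() on str is randomized per process.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_for_fanout(follower_ids: list[str], num_shards: int = NUM_SHARDS) -> set[int]:
    """The set of shards a single tweet's fanout must write to."""
    return {shard_for(f, num_shards) for f in follower_ids}
```

Plain modulo hashing remaps most users when the shard count changes. Consistent hashing or a shard directory avoids that, at the cost of extra machinery.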

Pattern 6: Separate Tweet Store and Timeline Store

Tweet store = source of truth
Timeline store = read-optimized view

Pattern 7: Backpressure for Celebrity Tweets

If a celebrity posts:

Do not fan out to all followers immediately
Store in celebrity tweet cache
Merge at read time
Rate-limit fanout jobs
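Rate-limiting fanout jobs can be sketched with a token bucket. The sketch assumes the timeline store can absorb `rate` entry writes per second. `TokenBucket`, `fanout_in_batches`, and all parameters are hypothetical. When tokens run out, the remaining followers are returned so the job can be re-queued instead of overloading the store.

```python
import time
from typing import Callable


class TokenBucket:
    """Refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


def fanout_in_batches(
    follower_ids: list[str],
    batch_size: int,
    bucket: TokenBucket,
    write_batch: Callable[[list[str]], None],
) -> list[str]:
    """Write batches while tokens allow; return the unsent followers to retry later.

    batch_size must not exceed the bucket capacity, or no batch can ever be acquired.
    """
    for start in range(0, len(follower_ids), batch_size):
        batch = follower_ids[start:start + batch_size]
        if not bucket.try_acquire(len(batch)):
            return follower_ids[start:]  # defer the rest instead of overloading the store
        write_batch(batch)
    return []
```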

👉 Interview Answer

To scale Twitter Timeline, I would separate the tweet store from the timeline store, use async fanout workers, cache hot data, and shard timelines by user ID.

I would also use hybrid fanout to avoid massive write amplification from celebrity users.

1️⃣2️⃣ Failure Handling

Common Failures

Fanout worker failure
Queue backlog
Timeline store unavailable
Cache miss spike
Ranking service down
Deleted tweet still visible
Follow graph inconsistency

Strategies

Retry fanout jobs
Use dead-letter queue
Rebuild timelines from tweet store
Serve cached timeline if timeline store is degraded
Fall back to chronological ranking if ranking service fails
Async cleanup for deleted tweets
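Retrying with a dead-letter queue can be sketched as follows. The sketch assumes the handler is idempotent, for example an upsert keyed by the `home_timeline` primary key, so a duplicate write is harmless. It also assumes a plain list stands in for the DLQ. `process_with_retry` and its defaults are hypothetical. Backoff uses exponential delays with full jitter to avoid synchronized retry storms.

```python
import random
import time
from typing import Callable


def process_with_retry(
    job: dict,
    handler: Callable[[dict], None],
    dead_letter: list,
    max_attempts: int = 4,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a fanout job with backoff; park it in the dead-letter list after max_attempts."""
    for attempt in range(max_attempts):
        try:
            handler(job)
            return True
        except Exception as exc:  # in practice, catch only retryable errors
            if attempt == max_attempts - 1:
                dead_letter.append({"job": job, "error": repr(exc)})
                return False
            # Full jitter: sleep a random time up to base_delay * 2^attempt.
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
    return False
```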

👉 Interview Answer

The system should degrade gracefully.

If ranking fails, we can fall back to chronological order. If fanout is delayed, users may see slightly stale timelines, which is usually acceptable.

Since tweets are stored as the source of truth, we can rebuild timeline entries if needed.

1️⃣3️⃣ Consistency Model

Strong Consistency Needed For

Tweet creation durability
Follow / unfollow state
Delete tweet permissions
Block / mute rules

Eventual Consistency Acceptable For

Home timeline delivery
Analytics counters
Like / repost counts
Recommendation ranking
Fanout delay

👉 Interview Answer

I would not require strong consistency for home timeline delivery.

It is acceptable if a new tweet appears a few seconds later.

However, stronger consistency is needed for security-sensitive actions such as delete, block, mute, and permission checks.
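One way to reconcile the two consistency levels is to let timeline entries be stale but re-check the strongly consistent state at read time. Here is a minimal sketch. It assumes sets stand in for lookups against the authoritative delete, block, and mute stores. `filter_at_read` and `Entry` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    tweet_id: str
    author_id: str


def filter_at_read(
    viewer_id: str,
    entries: list[Entry],
    deleted_tweet_ids: set[str],
    blocks: set[tuple[str, str]],  # (blocker_id, blocked_id)
    mutes: set[tuple[str, str]],   # (muter_id, muted_id)
) -> list[Entry]:
    """Re-check authoritative delete/block/mute state before returning a possibly stale timeline."""
    visible = []
    for e in entries:
        if e.tweet_id in deleted_tweet_ids:
            continue
        # A block in either direction hides the tweet.
        if (viewer_id, e.author_id) in blocks or (e.author_id, viewer_id) in blocks:
            continue
        if (viewer_id, e.author_id) in mutes:
            continue
        visible.append(e)
    return visible
```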

1️⃣4️⃣ End-to-End Flow

Post Tweet Flow

User posts tweet
→ Tweet Service stores tweet
→ Publish tweet-created event
→ Fanout workers update timelines
→ Timeline cache refreshed

Read Timeline Flow

User opens timeline
→ Timeline Service checks cache
→ Fetch precomputed timeline entries
→ Pull celebrity tweets
→ Merge candidates
→ Rank results
→ Return feed

Key Insight

Twitter Timeline is not just a feed list — it is a large-scale fanout, caching, and ranking system.

🧠 Staff-Level Answer (Final)

👉 Interview Answer (Full Version)

When designing Twitter Timeline, I think of it as two main flows: publishing tweets and reading timelines.

The tweet store is the source of truth, while the home timeline is a denormalized, read-optimized view.

For timeline generation, I would use a hybrid fanout strategy. Normal users use fanout-on-write, where new tweets are pushed into followers’ timelines. Celebrity users use fanout-on-read, where their recent tweets are pulled and merged at read time.

This balances read latency and write amplification.

I would decouple tweet creation from fanout using an asynchronous queue, so posting a tweet does not block on updating millions of timelines.

For reads, I would cache timeline entries, tweet objects, user metadata, and celebrity tweets.

Ranking can be added as a separate layer using signals such as recency, engagement, relationship strength, and user interests.

The main trade-offs are freshness, latency, storage cost, and consistency.

Timeline delivery can be eventually consistent, but actions like delete, block, and mute need stronger correctness.

Ultimately, the goal is to deliver a fresh and relevant timeline with low latency at massive scale.

⭐ Final Insight

Twitter Timeline is mainly about managing fanout at scale — deciding what to push, what to pull, and what to rank.

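Chinese Section

🎯 Design Twitter Timeline

1️⃣ Core Framework

When designing Twitter Timeline, I usually analyze it from the following angles:

Core user flows: post tweets, follow users, read the timeline
Data model: users, tweets, follow relationships, timeline entries
Timeline generation strategy: fanout-on-write vs fanout-on-read
Ranking and personalization
Caching and scaling patterns
Core trade-offs: freshness vs latency vs storage cost
Failure handling and consistency

2️⃣ Core Requirements

Functional Requirements

Users can post tweets
Users can follow / unfollow other users
Users can view the home timeline
Users can view a specific user's profile timeline
Support images, videos, likes, replies, reposts
Support chronological or personalized ordering

Non-functional Requirements

Low timeline read latency
High write throughput
High availability
Support for a large-scale follow graph
Near-real-time freshness
Eventual consistency is acceptable for timeline display

👉 Interview Answer

Twitter Timeline has two core flows: users publish tweets, and followers read timelines.

The timeline read path is very latency-sensitive, so timelines usually need to be precomputed or cached.

The main challenge is balancing freshness, latency, and storage cost, while handling celebrity users with huge numbers of followers.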

3️⃣ Main APIs

Post Tweet

POST /api/tweets

Request:

{
  "userId": "u123",
  "content": "Hello world",
  "mediaIds": ["m1", "m2"]
}

Response:

{
  "tweetId": "t789",
  "createdAt": "2026-05-02T10:00:00Z"
}

Follow User

POST /api/follow

Request:

{
  "followerId": "u123",
  "followeeId": "u456"
}

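Get Home Timeline

GET /api/timeline/home?userId=u123&cursor=xxx&limit=50

Get Profile Timeline

GET /api/timeline/profile?userId=u456&cursor=xxx&limit=50

👉 Interview Answer

I would design the write APIs and read APIs separately.

Write APIs include posting tweets and following users; read APIs include fetching the home timeline and profile timeline.

Home timeline is more complex because it depends on the user's follow graph; profile timeline is simpler because it only contains content posted by a single user.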

4️⃣ Data Model

User Table

user (
  user_id VARCHAR PRIMARY KEY,
  username VARCHAR,
  created_at TIMESTAMP
)

Tweet Table

tweet (
  tweet_id VARCHAR PRIMARY KEY,
  author_id VARCHAR,
  content TEXT,
  media_ids ARRAY,
  created_at TIMESTAMP,
  status VARCHAR
)

Follow Table

follow (
  follower_id VARCHAR,
  followee_id VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (follower_id, followee_id)
)

Reverse Follow Table

reverse_follow (
  followee_id VARCHAR,
  follower_id VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (followee_id, follower_id)
)

Timeline Table

home_timeline (
  user_id VARCHAR,
  tweet_id VARCHAR,
  author_id VARCHAR,
  created_at TIMESTAMP,
  score DOUBLE,
  PRIMARY KEY (user_id, created_at, tweet_id)
)

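Why Do We Need Both Follow and Reverse Follow Tables?

follow answers: who do I follow?
reverse_follow answers: who follows this author?
Timeline fanout needs to quickly find an author's followers

👉 Interview Answer

I would store tweets themselves separately from timeline entries.

The tweet table is the source of truth, while timeline entries are a denormalized view optimized for reads.

I would also maintain both the follow and reverse-follow tables, because timeline fanout needs to quickly find all followers of an author.

5️⃣ Timeline Generation Strategy

Option 1: Fanout-on-Write

When a user posts a tweet:

author posts tweet
→ find all followers
→ insert tweet into each follower's home timeline

Pros

Timeline reads are very fast
Good for normal users
Simple read path

Cons

Very expensive for high-follower accounts
Severe write amplification
Hard to handle celebrity users

👉 Interview Answer

Fanout-on-write pushes a tweet into every follower's timeline at the moment the tweet is created.

This makes timeline reads very fast, because the system only needs to read the user's precomputed timeline.

However, it causes severe write amplification, especially when a celebrity user has millions of followers.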

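Option 2: Fanout-on-Read

When a user opens the timeline:

user opens timeline
→ get followees
→ fetch recent tweets from followees
→ merge and rank

Pros

Low write cost
Better suited to celebrity users
No large-scale fanout needed at write time

Cons

Timeline reads are slower
Costly for users who follow many accounts
Requires merging and ranking at read time

👉 Interview Answer

Fanout-on-read computes the timeline dynamically when the user reads it.

This avoids large-scale fanout at write time, but makes the read path heavier, because tweets must be pulled from many followees and then merged and sorted.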

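Option 3: Hybrid Fanout

Recommended approach:

Normal users → fanout-on-write
Celebrity users → fanout-on-read

Why Hybrid?

Most users have relatively few followers
A small number of celebrity users cause huge fanout costs
Hybrid keeps normal timeline reads fast while controlling write amplification

👉 Interview Answer

In a production system, I would use hybrid fanout.

For normal users, I use fanout-on-write so that their followers can read timelines quickly.

For celebrity users, I use fanout-on-read to avoid pushing a single tweet into millions of timelines.

At read time, I then merge the precomputed timeline with the recent tweets of celebrity users.

Core Insight

Timeline design is essentially about deciding whether to pay the cost at write time or at read time.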

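6️⃣ Home Timeline Read Flow

Basic Flow

User opens home timeline
Timeline service queries the cache
Fetch precomputed timeline entries
Fetch missing tweet metadata
Pull the latest tweets from celebrity followees
Merge results
Sort or rank
Return timeline page

Chronological Timeline

sort by created_at desc

Simple and predictable.

Ranked Timeline

Common signals include:

Recency
Relationship strength between user and author
Engagement data
User interests
Tweet quality
Negative feedback
Author reputation

👉 Interview Answer

For home timeline reads, I would first fetch precomputed timeline entries from the cache or the timeline store.

Then I would merge in recent tweets from celebrity followees, since those tweets are usually not fanned out in advance.

Finally, I would sort or rank the results and return them to the user.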

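7️⃣ Ranking and Personalization

Why Ranking?

A purely chronological feed may miss important content.

Ranking can improve:

Content relevance
User engagement
User retention
Content quality

Ranking Pipeline

Candidate Generation
→ Filtering
→ Lightweight Ranking
→ ML Ranking
→ Re-ranking

Candidate Sources

Tweets posted by followed users
Replies from followed users
Reposts
Recommended tweets
Ads

Ranking Signals

Recency
User-author affinity
Likes / replies / reposts
Dwell time
User topic interests
Mute / block relationships
Content safety signals

👉 Interview Answer

Ranking can be a separate layer after timeline retrieval.

The system first generates candidates, then filters out blocked or low-quality content, ranks them with a ranking model, and finally re-ranks them for diversity, freshness, and safety.

This keeps the retrieval layer scalable while using ranking to improve content relevance.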

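8️⃣ Caching Strategy

What to Cache?

Home timeline entries
User profile timeline
Tweet objects
User profile metadata
Follow graph
Celebrity recent tweets

Cache Layers

Local cache in the timeline service
Redis / Memcached
CDN for media assets
Edge cache for public profile timelines

Cache Challenges

New tweets
Deleted tweets
Blocked users
Follow / unfollow updates
Ranking freshness

👉 Interview Answer

Caching is critical because timeline reads are frequent and latency-sensitive.

I would cache home timeline entries, tweet objects, user metadata, and celebrity recent tweets.

However, cache invalidation must be handled carefully, especially for tweet deletions, block relationship changes, and follow relationship changes.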

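9️⃣ Write Path: Posting a Tweet

Flow

User posts tweet
Tweet service validates the request
Store tweet in the tweet table
Publish tweet-created event
Fanout service consumes the event
Find followers from the reverse follow table
Push timeline entries to followers
Update caches asynchronously

Event Pipeline

Tweet Service
→ Kafka / Queue
→ Fanout Workers
→ Timeline Store
→ Cache Update

👉 Interview Answer

I would use an asynchronous event pipeline to decouple tweet creation from timeline fanout.

The tweet service first stores the tweet as the source of truth, then publishes a tweet-created event.

Fanout workers consume this event and asynchronously update followers' timeline stores.

This lowers the latency of posting a tweet and improves system stability.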

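🔟 Core Trade-offs

Fanout-on-Write vs Fanout-on-Read

Strategy	Pros	Cons
Fanout-on-write	Fast reads	Expensive writes
Fanout-on-read	Cheap writes	Slow reads
Hybrid	Balanced	More complex system

Freshness vs Latency

Real-time fanout improves freshness
Async fanout may introduce delay
Caching improves latency but may return stale data

Storage Cost vs Read Performance

Precomputed timelines require more storage
But significantly improve read performance

Consistency vs Availability

Timeline can be eventually consistent
Tweet creation must be durable
Reads should stay available even if ranking / fanout is delayed

👉 Interview Answer

The central trade-off here is between write amplification and read latency.

Fanout-on-write makes reads very fast, but write cost is very high for users with many followers.

Fanout-on-read avoids write amplification, but makes the read path more expensive.

For large-scale systems, hybrid fanout is usually the better choice.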

1️⃣1️⃣ Scaling Patterns

Pattern 1: Hybrid Fanout

Normal users: push model
Celebrity users: pull model

Pattern 2: Async Processing

Use queue-based fanout:

tweet event → queue → fanout workers

Pattern 3: Timeline Precomputation

Precompute timeline entries for each user.

Pattern 4: Cache Hot Data

Cache:

Hot timelines
Hot tweets
Celebrity tweets
User profiles

Pattern 5: Shard Timeline Store

Shard by:

user_id

Why:

Timeline reads are usually queried per user
Keeps a single user's timeline data together as much as possible

Pattern 6: Separate Tweet Store and Timeline Store

Tweet store = source of truth
Timeline store = read-optimized view

Pattern 7: Backpressure for Celebrity Tweets

When a celebrity user posts a tweet:

Do not fan out to all followers immediately
Store in the celebrity tweet cache
Merge at read time
Rate-limit fanout jobs

👉 Interview Answer

To scale Twitter Timeline, I would separate the tweet store from the timeline store, use async fanout workers, cache hot data, and shard the timeline store by user_id.

At the same time, I would use hybrid fanout to avoid the huge write amplification caused by celebrity users.

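1️⃣2️⃣ Failure Handling

Common Failures

Fanout worker failure
Queue backlog
Timeline store unavailable
Cache miss spike
Ranking service failure
Deleted tweet still visible
Follow graph inconsistency

Strategies

Retry fanout jobs
Use a dead-letter queue
Rebuild timelines from the tweet store
Return cached timelines when the timeline store is degraded
Fall back to chronological ordering when the ranking service fails
Asynchronously clean up deleted tweets

👉 Interview Answer

The system needs to degrade gracefully.

If the ranking service fails, we can fall back to chronological ordering.

If fanout is delayed, users may see a slightly stale timeline, which is usually acceptable.

Because the tweet store is the source of truth, timeline entries can be rebuilt when necessary.

1️⃣3️⃣ Consistency Model

Scenarios Requiring Stronger Consistency

Tweet creation durability
Follow / unfollow state
Delete tweet permissions
Block / mute rules

Scenarios Where Eventual Consistency Is Acceptable

Home timeline display
Analytics counters
Like / repost counts
Recommendation ranking
Fanout delay

👉 Interview Answer

I would not require strong consistency for home timeline delivery.

It is usually acceptable for a new tweet to appear in the timeline a few seconds late.

However, security-sensitive operations such as delete, block, mute, and permission checks need stronger correctness guarantees.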

1️⃣4️⃣ End-to-End Flow

Post Tweet Flow

User posts tweet
→ Tweet Service stores tweet
→ Publish tweet-created event
→ Fanout workers update timelines
→ Timeline cache refreshed

Read Timeline Flow

User opens timeline
→ Timeline Service checks cache
→ Fetch precomputed timeline entries
→ Pull celebrity tweets
→ Merge candidates
→ Rank results
→ Return feed

Key Insight

Twitter Timeline is not just a feed list; it is a large-scale fanout, caching, and ranking system.

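🧠 Staff-Level Answer (Final)

👉 Interview Answer (Full Version, for Memorization)

When designing Twitter Timeline, I split the system into two main flows: publishing tweets and reading timelines.

The tweet store is the source of truth, while the home timeline is a denormalized view designed for fast reads.

For timeline generation, I would use a hybrid fanout strategy. Normal users use fanout-on-write, which pushes content into followers' timelines when the tweet is created. Celebrity users use fanout-on-read, where their latest tweets are pulled and merged when the timeline is read.

This balances read latency and write amplification.

I would decouple tweet creation from fanout through an asynchronous queue, so that posting a tweet does not block on updating a large number of timelines.

For the read path, I would cache timeline entries, tweet objects, user metadata, and celebrity tweets.

Ranking can be added as a separate layer, using signals such as recency, engagement data, relationship strength, and user interests.

The main trade-offs of this system are freshness, latency, storage cost, and consistency.

Timeline delivery can be eventually consistent, but operations like delete, block, and mute need stronger correctness.

The ultimate goal is to return a fresh and relevant timeline with low latency at massive scale.

⭐ Final Insight

The core of Twitter Timeline is managing fanout at scale: which content to push ahead of time, which to pull at read time, and which needs to be ranked.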