System Design Deep Dive - 02 Design Twitter Timeline

Posted by ailswan, April 26, 2026


🎯 Design Twitter Timeline


1️⃣ Core Framework

When discussing Twitter Timeline design, I frame it as:

  1. Core user flows: post tweet, follow user, read timeline
  2. Data model: users, tweets, follow graph, timeline entries
  3. Timeline generation strategy: fanout-on-write vs fanout-on-read
  4. Ranking and personalization
  5. Caching and scaling patterns
  6. Trade-offs: freshness vs latency vs storage cost
  7. Failure handling and consistency

2️⃣ Core Requirements


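Functional Requirements

  1. Post a tweet (text and optional media)
  2. Follow and unfollow users
  3. Read the home timeline: tweets from accounts the user follows
  4. Read a profile timeline: tweets from a single user

Non-functional Requirements

  1. Low-latency timeline reads, since the read path is the hot path
  2. High availability; slightly stale timelines are acceptable
  3. Scale to authors with millions of followers
  4. Stronger correctness for delete, block, and mute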


👉 Interview Answer

Twitter Timeline has two core flows: users publish tweets, and followers read timelines.

The read path is extremely latency-sensitive, so we usually precompute or cache timelines.

The main challenge is balancing freshness, latency, and storage cost while handling users with millions of followers.


3️⃣ Main APIs


Post Tweet

POST /api/tweets

Request:

{
  "userId": "u123",
  "content": "Hello world",
  "mediaIds": ["m1", "m2"]
}

Response:

{
  "tweetId": "t789",
  "createdAt": "2026-05-02T10:00:00Z"
}

Follow User

POST /api/follow

Request:

{
  "followerId": "u123",
  "followeeId": "u456"
}

Get Home Timeline

GET /api/timeline/home?userId=u123&cursor=xxx&limit=50

Get Profile Timeline

GET /api/timeline/profile?userId=u456&cursor=xxx&limit=50

👉 Interview Answer

I would separate write APIs, such as posting tweets and following users, from read APIs, such as fetching home timeline and profile timeline.

Home timeline is more complex because it depends on the follow graph, while profile timeline is simpler because it only contains tweets from one user.


4️⃣ Data Model


User Table

user (
  user_id VARCHAR PRIMARY KEY,
  username VARCHAR,
  created_at TIMESTAMP
)

Tweet Table

tweet (
  tweet_id VARCHAR PRIMARY KEY,
  author_id VARCHAR,
  content TEXT,
  media_ids ARRAY<VARCHAR>,
  created_at TIMESTAMP,
  status VARCHAR
)

Follow Table

follow (
  follower_id VARCHAR,
  followee_id VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (follower_id, followee_id)
)

Reverse Follow Table

reverse_follow (
  followee_id VARCHAR,
  follower_id VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (followee_id, follower_id)
)

Timeline Table

home_timeline (
  user_id VARCHAR,
  tweet_id VARCHAR,
  author_id VARCHAR,
  created_at TIMESTAMP,
  score DOUBLE,
  PRIMARY KEY (user_id, created_at, tweet_id)
)

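Why Keep Both Follow and Reverse Follow?

  1. follow, keyed by follower_id, answers "whom does this user follow?", which fanout-on-read needs.
  2. reverse_follow, keyed by followee_id, answers "who follows this author?", which fanout-on-write needs.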


👉 Interview Answer

I would store tweets separately from timeline entries.

Tweets are the source of truth, while timeline entries are denormalized views optimized for fast reads.

I would also maintain both follow and reverse-follow tables, because timeline generation needs to quickly find followers of an author.
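A minimal Python sketch of keeping the two indexes in sync, using in-memory dicts as hypothetical stand-ins for the follow and reverse_follow tables. In a real store the two writes are not automatically atomic, so they would need a transaction or an async repair job.

```python
from collections import defaultdict


class FollowGraph:
    """In-memory stand-in for the follow and reverse_follow tables."""

    def __init__(self) -> None:
        # follow: follower_id -> followee_ids ("whom do I follow?")
        self.follow: dict[str, set[str]] = defaultdict(set)
        # reverse_follow: followee_id -> follower_ids ("who follows me?")
        self.reverse_follow: dict[str, set[str]] = defaultdict(set)

    def add_follow(self, follower_id: str, followee_id: str) -> None:
        # Write both indexes together so they never disagree.
        self.follow[follower_id].add(followee_id)
        self.reverse_follow[followee_id].add(follower_id)

    def remove_follow(self, follower_id: str, followee_id: str) -> None:
        self.follow[follower_id].discard(followee_id)
        self.reverse_follow[followee_id].discard(follower_id)

    def followees(self, user_id: str) -> set[str]:
        # Fanout-on-read: whose tweets should this reader pull?
        return set(self.follow.get(user_id, ()))

    def followers(self, author_id: str) -> set[str]:
        # Fanout-on-write: whose timelines should this tweet be pushed to?
        return set(self.reverse_follow.get(author_id, ()))


graph = FollowGraph()
graph.add_follow("u123", "u456")
graph.add_follow("u789", "u456")
print(sorted(graph.followers("u456")))  # ['u123', 'u789']
print(sorted(graph.followees("u123")))  # ['u456']
```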


5️⃣ Timeline Generation Strategy


Option 1: Fanout-on-Write

When a user posts a tweet:

author posts tweet
→ find all followers
→ insert tweet into each follower's home timeline

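Pros

  1. Very fast reads: the home timeline is already precomputed
  2. Simple read path

Cons

  1. Write amplification: one tweet becomes one write per follower
  2. A celebrity tweet must be pushed into millions of timelines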


👉 Interview Answer

Fanout-on-write precomputes timeline entries when a tweet is created.

This makes timeline reads very fast, because the system only needs to read from the user’s precomputed timeline.

However, it creates write amplification, especially when a celebrity user posts to millions of followers.
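A minimal sketch of the push step, assuming each home timeline is an in-memory list of (created_at, tweet_id) pairs kept newest first. The 800-entry cap is a hypothetical number, and the return value makes the write amplification explicit: one write per follower.

```python
MAX_TIMELINE_LEN = 800  # hypothetical cap on entries kept per timeline


def fanout_on_write(
    timelines: dict[str, list[tuple[int, str]]],
    followers: set[str],
    tweet_id: str,
    created_at: int,
) -> int:
    """Push one tweet into each follower's precomputed home timeline.

    Each timeline is a list of (created_at, tweet_id) pairs, newest first.
    Returns the number of timeline writes, i.e. the write amplification.
    """
    writes = 0
    for follower_id in followers:
        timeline = timelines.setdefault(follower_id, [])
        timeline.append((created_at, tweet_id))
        # Re-sort newest first (events can arrive out of order), then trim.
        timeline.sort(reverse=True)
        del timeline[MAX_TIMELINE_LEN:]
        writes += 1
    return writes


timelines: dict[str, list[tuple[int, str]]] = {}
print(fanout_on_write(timelines, {"u1", "u2", "u3"}, "t1", created_at=100))  # 3
fanout_on_write(timelines, {"u1"}, "t2", created_at=200)
print(timelines["u1"])  # [(200, 't2'), (100, 't1')]
```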


Option 2: Fanout-on-Read

When a user opens the timeline:

user opens timeline
→ get followees
→ fetch recent tweets from followees
→ merge and rank

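Pros

  1. Cheap writes: a tweet is stored once
  2. No write amplification, even for celebrity users

Cons

  1. Slower, more expensive reads: every request must fetch, merge, and rank tweets from many followees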


👉 Interview Answer

Fanout-on-read computes the timeline when the user requests it.

This avoids massive write amplification, but makes reads more expensive because we need to fetch, merge, and rank tweets from many followees.
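A minimal sketch of the read-time merge, assuming each followee's recent tweets come back already sorted newest first (as a profile-timeline query would return them). `heapq.merge` performs a lazy k-way merge, so the loop stops as soon as `limit` entries are produced.

```python
import heapq
from itertools import islice


def fanout_on_read(
    followees: set[str],
    tweets_by_author: dict[str, list[tuple[int, str]]],
    limit: int = 50,
) -> list[tuple[int, str]]:
    """Build a home timeline at request time.

    tweets_by_author maps author_id -> that author's (created_at, tweet_id)
    pairs, sorted newest first. heapq.merge lazily merges the sorted lists,
    and islice stops after `limit` items.
    """
    per_author = [tweets_by_author.get(author, []) for author in followees]
    return list(islice(heapq.merge(*per_author, reverse=True), limit))


tweets = {
    "a": [(300, "t3"), (100, "t1")],
    "b": [(200, "t2")],
    "c": [(400, "t4")],
}
print(fanout_on_read({"a", "b"}, tweets, limit=2))  # [(300, 't3'), (200, 't2')]
```

The cost is visible in the signature: one fetch per followee on every read, which is why this path gets expensive for users who follow many accounts.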


Option 3: Hybrid Fanout

Recommended approach:

Normal users → fanout-on-write
Celebrity users → fanout-on-read

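Why Hybrid?

  1. For normal users, pushing is affordable and keeps their followers' reads fast
  2. Celebrity tweets are pulled at read time, so one post never triggers millions of writes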


👉 Interview Answer

In production, I would use a hybrid approach.

For normal users, I use fanout-on-write so their followers can read timelines quickly.

For celebrity users, I use fanout-on-read to avoid pushing one tweet into millions of timelines.

At read time, I merge precomputed timeline entries with recent tweets from celebrity accounts.
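A minimal sketch of the routing decision on both sides. The 100,000-follower threshold is a hypothetical cutoff; real systems tune it from traffic and may use signals beyond follower count.

```python
CELEBRITY_THRESHOLD = 100_000  # hypothetical cutoff; tune from real traffic


def is_celebrity(author_id: str, follower_counts: dict[str, int]) -> bool:
    return follower_counts.get(author_id, 0) >= CELEBRITY_THRESHOLD


def push_targets(
    author_id: str,
    follower_counts: dict[str, int],
    followers_of: dict[str, set[str]],
) -> list[str]:
    """Write side: which followers get this tweet pushed into their timeline."""
    if is_celebrity(author_id, follower_counts):
        return []  # no push; followers pull this author's tweets at read time
    return sorted(followers_of.get(author_id, set()))


def pull_sources(followees: set[str], follower_counts: dict[str, int]) -> list[str]:
    """Read side: which followees' recent tweets must be fetched and merged."""
    return sorted(f for f in followees if is_celebrity(f, follower_counts))


counts = {"alice": 120, "star": 5_000_000}
followers_of = {"alice": {"u1", "u2"}, "star": {"u1", "u2", "u3"}}
print(push_targets("alice", counts, followers_of))  # ['u1', 'u2']
print(push_targets("star", counts, followers_of))   # []
print(pull_sources({"alice", "star"}, counts))      # ['star']
```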


Core Insight

Timeline design is mainly about choosing where to pay the cost: at write time or read time.


6️⃣ Home Timeline Read Flow


Basic Flow

  1. User opens home timeline
  2. Timeline service checks cache
  3. Fetch precomputed timeline entries
  4. Fetch missing tweet metadata
  5. Pull recent tweets from celebrity followees
  6. Merge results
  7. Rank or sort
  8. Return timeline page

Chronological Timeline

sort by created_at desc

Simple and predictable.


Ranked Timeline

Signals may include recency, engagement, relationship strength, and user interests.


👉 Interview Answer

For home timeline reads, I would first fetch precomputed timeline entries from cache or storage.

Then I would merge in tweets from high-follower accounts that are handled by fanout-on-read.

Finally, I would rank or sort the results before returning them to the user.
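A minimal sketch of steps 3 to 8 for a chronological page, assuming both sources return (created_at, tweet_id) pairs. The cursor is modeled as the last pair the client saw; how it is encoded into the `cursor=xxx` query parameter is left open and is an assumption here.

```python
from typing import Optional

Entry = tuple[int, str]  # (created_at, tweet_id)


def read_home_timeline(
    precomputed: list[Entry],
    celebrity_tweets: list[Entry],
    cursor: Optional[Entry] = None,
    limit: int = 50,
) -> tuple[list[Entry], Optional[Entry]]:
    """Merge pushed entries with pulled celebrity tweets and return one page.

    The cursor is the last (created_at, tweet_id) the client saw; the next
    page contains only entries strictly older than it. Including tweet_id
    breaks ties between tweets with the same timestamp.
    """
    candidates = set(precomputed) | set(celebrity_tweets)  # dedupe
    ordered = sorted(candidates, reverse=True)  # newest first
    if cursor is not None:
        ordered = [e for e in ordered if e < cursor]
    page = ordered[:limit]
    next_cursor = page[-1] if len(page) == limit else None
    return page, next_cursor


pushed = [(100, "t1"), (300, "t3")]
pulled = [(200, "c1"), (300, "t3")]  # t3 appears in both sources
page1, cursor = read_home_timeline(pushed, pulled, limit=2)
print(page1, cursor)  # [(300, 't3'), (200, 'c1')] (200, 'c1')
page2, cursor = read_home_timeline(pushed, pulled, cursor=cursor, limit=2)
print(page2, cursor)  # [(100, 't1')] None
```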


7️⃣ Ranking and Personalization


Why Ranking?

A purely chronological feed may bury important content.

Ranking improves relevance by surfacing important tweets even when they are not the newest.


Ranking Pipeline

Candidate Generation
→ Filtering
→ Lightweight Ranking
→ ML Ranking
→ Re-ranking

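Candidate Sources

  1. Precomputed home timeline entries (fanout-on-write)
  2. Recent tweets from celebrity followees (fanout-on-read)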


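Ranking Signals

  1. Recency: how old the tweet is
  2. Engagement: likes, replies, and retweets
  3. Relationship strength: how often the viewer interacts with the author
  4. User interests: topics the viewer tends to engage with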


👉 Interview Answer

Ranking can be added as a separate layer on top of timeline retrieval.

The system first generates candidates, filters out blocked or low-quality content, applies ranking models, and then re-ranks results for diversity, freshness, and safety.

This keeps the retrieval layer scalable while allowing ranking to improve relevance.
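A toy version of the filter, rank, and re-rank stages. The weighted linear score is a hypothetical stand-in that collapses the lightweight and ML ranking stages into one function, and the weights are made up for illustration; the diversity step simply caps tweets per author.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    tweet_id: str
    author_id: str
    age_hours: float
    engagement: float  # normalized likes/retweets, assumed in [0, 1]
    affinity: float    # viewer-author relationship strength, assumed in [0, 1]


# Hypothetical hand-set weights standing in for an ML ranking model.
WEIGHTS = {"recency": 0.4, "engagement": 0.35, "affinity": 0.25}


def score(c: Candidate) -> float:
    recency = 1.0 / (1.0 + c.age_hours)  # decays toward 0 as the tweet ages
    return (WEIGHTS["recency"] * recency
            + WEIGHTS["engagement"] * c.engagement
            + WEIGHTS["affinity"] * c.affinity)


def rank(candidates: list[Candidate], blocked: set[str], max_per_author: int = 2) -> list[str]:
    # Filtering: drop tweets from blocked authors before spending ranking work.
    kept = [c for c in candidates if c.author_id not in blocked]
    # Ranking: highest score first.
    kept.sort(key=score, reverse=True)
    # Re-ranking for diversity: cap tweets per author (extras are dropped).
    shown: dict[str, int] = {}
    result = []
    for c in kept:
        if shown.get(c.author_id, 0) < max_per_author:
            shown[c.author_id] = shown.get(c.author_id, 0) + 1
            result.append(c.tweet_id)
    return result


feed = [
    Candidate("a1", "a", 1, 0.9, 0.9),
    Candidate("a2", "a", 2, 0.8, 0.9),
    Candidate("a3", "a", 3, 0.7, 0.9),
    Candidate("b1", "b", 10, 0.1, 0.1),
    Candidate("x1", "x", 0, 1.0, 1.0),
]
print(rank(feed, blocked={"x"}))  # ['a1', 'a2', 'b1']
```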


8️⃣ Caching Strategy


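What to Cache?

  1. Home timeline entries
  2. Tweet objects
  3. User metadata
  4. Recent tweets from celebrity accounts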


Cache Layers


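Cache Challenges

  1. Invalidation when tweets are deleted
  2. Invalidation when users block each other
  3. Invalidation when follow relationships change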


👉 Interview Answer

Caching is critical because timeline reads are frequent and latency-sensitive.

I would cache home timeline entries, tweet objects, user metadata, and celebrity recent tweets.

However, cache invalidation is important when tweets are deleted, users block each other, or follow relationships change.
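A minimal cache-aside sketch with explicit invalidation hooks, assuming a plain dict as the cache and a hypothetical `load_from_store` callback for the timeline store. Scrubbing a deleted tweet from every cached timeline is O(cache size) and shown only for clarity; filtering at read time is a cheaper alternative.

```python
from typing import Callable


class TimelineCache:
    """Cache-aside for home timelines, with explicit invalidation hooks."""

    def __init__(self, load_from_store: Callable[[str], list[str]]) -> None:
        self._load = load_from_store  # reads the timeline store on a miss
        self._data: dict[str, list[str]] = {}
        self.misses = 0

    def get(self, user_id: str) -> list[str]:
        if user_id not in self._data:
            self.misses += 1
            self._data[user_id] = self._load(user_id)
        return list(self._data[user_id])  # copy so callers cannot mutate the cache

    def invalidate_user(self, user_id: str) -> None:
        # Follow or block change: drop the whole entry; next read reloads it.
        self._data.pop(user_id, None)

    def remove_tweet(self, tweet_id: str) -> None:
        # Tweet deleted: scrub it from every cached timeline (O(cache size)).
        for user_id, entries in list(self._data.items()):
            self._data[user_id] = [t for t in entries if t != tweet_id]


store = {"u1": ["t2", "t1"]}
cache = TimelineCache(lambda uid: list(store.get(uid, [])))
cache.get("u1")
cache.get("u1")
print(cache.misses)     # 1
cache.remove_tweet("t1")
print(cache.get("u1"))  # ['t2']
```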


9️⃣ Write Path: Posting a Tweet


Flow

  1. User posts tweet
  2. Tweet service validates request
  3. Store tweet in tweet table
  4. Publish tweet-created event
  5. Fanout service consumes event
  6. Find followers from reverse follow table
  7. Push timeline entries to followers
  8. Update caches asynchronously

Event Pipeline

Tweet Service
→ Kafka / Queue
→ Fanout Workers
→ Timeline Store
→ Cache Update

👉 Interview Answer

I would decouple tweet creation from timeline fanout using an async event pipeline.

The tweet service stores the tweet as the source of truth and publishes a tweet-created event.

Fanout workers consume the event and update followers’ timeline stores asynchronously.

This improves write latency and system resilience.
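A minimal single-process sketch of the pipeline. `queue.Queue` and one thread stand in for Kafka and a fleet of fanout workers, and the dicts are hypothetical stand-ins for the tweet store, timeline store, and reverse_follow table. The key point is that `post_tweet` returns right after storing and publishing, without waiting on fanout.

```python
import queue
import threading

tweet_store: dict[str, dict] = {}          # source of truth
home_timelines: dict[str, list[str]] = {}  # denormalized read view
followers_of = {"u456": ["u123", "u789"]}  # hypothetical reverse_follow data
events = queue.Queue()                     # event dicts, or None to stop


def post_tweet(tweet_id: str, author_id: str, content: str) -> str:
    """Tweet service: persist first, then publish; never waits on fanout."""
    tweet_store[tweet_id] = {"author_id": author_id, "content": content}
    events.put({"type": "tweet_created", "tweet_id": tweet_id, "author_id": author_id})
    return tweet_id


def fanout_worker() -> None:
    """Fanout worker: consume events and prepend to follower timelines."""
    while True:
        event = events.get()
        if event is None:  # shutdown sentinel
            events.task_done()
            break
        for follower_id in followers_of.get(event["author_id"], []):
            home_timelines.setdefault(follower_id, []).insert(0, event["tweet_id"])
        events.task_done()


worker = threading.Thread(target=fanout_worker, daemon=True)
worker.start()
post_tweet("t1", "u456", "Hello world")
post_tweet("t2", "u456", "Second tweet")
events.join()  # demo only: wait for the backlog to drain
print(home_timelines)  # {'u123': ['t2', 't1'], 'u789': ['t2', 't1']}
events.put(None)
worker.join()
```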


🔟 Trade-offs


Fanout-on-Write vs Fanout-on-Read

Strategy        | Pros         | Cons
Fanout-on-write | Fast reads   | Expensive writes
Fanout-on-read  | Cheap writes | Slower reads
Hybrid          | Balanced     | More complex

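Freshness vs Latency

Precomputing and caching timelines makes reads fast, but a delayed fanout or a cached page can be slightly stale.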


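Storage Cost vs Read Performance

Denormalized per-user timelines store one entry per follower per tweet, trading extra storage for fast reads.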


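Consistency vs Availability

Timeline delivery favors availability and accepts eventual consistency; delete, block, mute, and permission checks need stronger consistency.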


👉 Interview Answer

The main trade-off is between write amplification and read latency.

Fanout-on-write makes reads fast, but it is expensive for users with many followers.

Fanout-on-read avoids write amplification, but makes the read path more expensive.

A hybrid approach is usually best for large-scale systems.


1️⃣1️⃣ Scaling Patterns


Pattern 1: Hybrid Fanout

Push tweets from normal users at write time; pull celebrity tweets at read time.


Pattern 2: Async Processing

Use queue-based fanout:

tweet event → queue → fanout workers

Pattern 3: Timeline Precomputation

Store timeline entries per user.


Pattern 4: Cache Hot Data

Cache: home timeline entries, tweet objects, user metadata, and celebrity recent tweets.


Pattern 5: Shard Timeline Store

Shard by:

user_id

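Why?

A home timeline read touches only one user's entries, so sharding by user_id lets each read hit a single shard.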


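Pattern 6: Separate Tweet Store and Timeline Store

The tweet store stays the source of truth; the timeline store holds denormalized entries that can be scaled independently and rebuilt from the tweet store.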


Pattern 7: Backpressure for Celebrity Tweets

If a celebrity posts:


👉 Interview Answer

To scale Twitter Timeline, I would separate the tweet store from the timeline store, use async fanout workers, cache hot data, and shard timelines by user ID.

I would also use hybrid fanout to avoid massive write amplification from celebrity users.
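A minimal sketch of routing a timeline to a shard by user_id. The shard count is hypothetical. It uses `hashlib` rather than the built-in `hash()`, because Python randomizes string hashes per process, so different servers would disagree.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical number of timeline-store shards


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a user's home timeline to one shard with a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


print(shard_for("u123") == shard_for("u123"))  # True: same user, same shard
```

Plain modulo hashing moves most users when the shard count changes; consistent hashing or a shard directory avoids that reshuffle.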


1️⃣2️⃣ Failure Handling


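Common Failures

  1. Ranking service failure
  2. Delayed fanout, for example a queue backlog
  3. Lost or corrupted timeline entries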


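Strategies

  1. Fall back to chronological order when ranking fails
  2. Serve slightly stale timelines while fanout catches up
  3. Rebuild timeline entries from the tweet store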


👉 Interview Answer

The system should degrade gracefully.

If ranking fails, we can fall back to chronological order. If fanout is delayed, users may see slightly stale timelines, which is usually acceptable.

Since tweets are stored as the source of truth, we can rebuild timeline entries if needed.
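A minimal sketch of both degradation paths, assuming tweets are dicts with the tweet table's fields. The `"active"` and `"deleted"` status values and the `broken_ranker` stub are hypothetical.

```python
from typing import Callable

Tweet = dict  # fields: tweet_id, author_id, created_at, status


def build_page(entries: list[Tweet], rank_fn: Callable[[list[Tweet]], list[Tweet]]) -> tuple[list[Tweet], str]:
    """Try the ranker; on any failure, fall back to chronological order."""
    try:
        return rank_fn(entries), "ranked"
    except Exception:
        # A chronological feed is better than an error page.
        return sorted(entries, key=lambda t: t["created_at"], reverse=True), "chronological"


def rebuild_timeline(followees: set[str], tweets: list[Tweet], limit: int = 800) -> list[str]:
    """Recompute a lost timeline from the tweet store, the source of truth."""
    visible = [t for t in tweets if t["author_id"] in followees and t["status"] == "active"]
    visible.sort(key=lambda t: t["created_at"], reverse=True)
    return [t["tweet_id"] for t in visible[:limit]]


def broken_ranker(entries: list[Tweet]) -> list[Tweet]:
    raise RuntimeError("ranking service unavailable")


tweets = [
    {"tweet_id": "t1", "author_id": "a", "created_at": 1, "status": "active"},
    {"tweet_id": "t2", "author_id": "b", "created_at": 2, "status": "active"},
    {"tweet_id": "t3", "author_id": "a", "created_at": 3, "status": "deleted"},
]
page, mode = build_page(tweets[:2], broken_ranker)
print(mode, [t["tweet_id"] for t in page])   # chronological ['t2', 't1']
print(rebuild_timeline({"a", "b"}, tweets))  # ['t2', 't1']
```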


1️⃣3️⃣ Consistency Model


Strong Consistency Needed For

  1. Tweet deletion
  2. Block and mute
  3. Permission checks


Eventual Consistency Acceptable For

  1. Home timeline delivery: a new tweet may appear a few seconds later


👉 Interview Answer

I would not require strong consistency for home timeline delivery.

It is acceptable if a new tweet appears a few seconds later.

However, stronger consistency is needed for security-sensitive actions such as delete, block, mute, and permission checks.
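One way to get strong behavior for delete and block on top of eventually consistent timelines is to re-check authoritative state on every read. A minimal sketch, assuming the tweet store and a hypothetical `blocks` set of (blocker, blocked) pairs are read from a strongly consistent source:

```python
def visible_entries(
    entry_ids: list[str],
    tweet_store: dict[str, dict],
    viewer_id: str,
    blocks: set[tuple[str, str]],
) -> list[str]:
    """Filter possibly stale timeline entries against authoritative state.

    Timeline entries may still reference a deleted tweet or a now-blocked
    author. Checking the tweet store and block list on each read keeps
    delete and block correct even before fanout or cache invalidation
    catches up.
    """
    result = []
    for tweet_id in entry_ids:
        tweet = tweet_store.get(tweet_id)
        if tweet is None or tweet["status"] == "deleted":
            continue  # gone from the source of truth
        author = tweet["author_id"]
        if (viewer_id, author) in blocks or (author, viewer_id) in blocks:
            continue  # a block in either direction hides the tweet
        result.append(tweet_id)
    return result


store = {
    "t1": {"author_id": "a", "status": "active"},
    "t2": {"author_id": "b", "status": "deleted"},
    "t3": {"author_id": "c", "status": "active"},
}
blocks = {("u1", "c")}  # (blocker, blocked): u1 blocked c
print(visible_entries(["t1", "t2", "t3", "t9"], store, "u1", blocks))  # ['t1']
```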


1️⃣4️⃣ End-to-End Flow


Post Tweet Flow

User posts tweet
→ Tweet Service stores tweet
→ Publish tweet-created event
→ Fanout workers update timelines
→ Timeline cache refreshed

Read Timeline Flow

User opens timeline
→ Timeline Service checks cache
→ Fetch precomputed timeline entries
→ Pull celebrity tweets
→ Merge candidates
→ Rank results
→ Return feed

Key Insight

Twitter Timeline is not just a feed list — it is a large-scale fanout, caching, and ranking system.


🧠 Staff-Level Answer (Final)


👉 Interview Answer (Full Version)

When designing Twitter Timeline, I think of it as two main flows: publishing tweets and reading timelines.

The tweet store is the source of truth, while the home timeline is a denormalized, read-optimized view.

For timeline generation, I would use a hybrid fanout strategy. Normal users use fanout-on-write, where new tweets are pushed into followers’ timelines. Celebrity users use fanout-on-read, where their recent tweets are pulled and merged at read time.

This balances read latency and write amplification.

I would decouple tweet creation from fanout using an asynchronous queue, so posting a tweet does not block on updating millions of timelines.

For reads, I would cache timeline entries, tweet objects, user metadata, and celebrity tweets.

Ranking can be added as a separate layer using signals such as recency, engagement, relationship strength, and user interests.

The main trade-offs are freshness, latency, storage cost, and consistency.

Timeline delivery can be eventually consistent, but actions like delete, block, and mute need stronger consistency guarantees.

Ultimately, the goal is to deliver a fresh and relevant timeline with low latency at massive scale.


⭐ Final Insight

Twitter Timeline is mainly about managing fanout at scale — deciding what to push, what to pull, and what to rank.


