System Design Deep Dive - 03 Design News Feed

Posted by ailswan on April 27, 2026

中文 ↓

🎯 Design News Feed


1️⃣ Core Framework

When discussing News Feed design, I frame it as:

  1. Core user flows: create post, follow/friend, read feed
  2. Data model: users, posts, relationships, feed entries
  3. Feed generation strategy: push vs pull vs hybrid
  4. Candidate generation from multiple sources
  5. Ranking, personalization, and filtering
  6. Caching and scaling patterns
  7. Trade-offs: freshness vs relevance vs latency
  8. Failure handling, privacy, and consistency

2️⃣ Core Requirements


Functional Requirements


Non-functional Requirements


👉 Interview Answer

A news feed system needs to support post creation, relationship management, and personalized feed reading.

The feed read path is latency-sensitive, but the ranking logic can be complex.

The key challenge is balancing freshness, relevance, latency, storage cost, and privacy enforcement.


3️⃣ Main APIs


Create Post

POST /api/posts

Request:

{
  "userId": "u123",
  "content": "My new post",
  "mediaIds": ["m1"],
  "visibility": "friends"
}

Response:

{
  "postId": "p789",
  "createdAt": "2026-05-02T10:00:00Z"
}

Follow / Friend User

POST /api/relationships

Request:

{
  "sourceUserId": "u123",
  "targetUserId": "u456",
  "type": "follow"
}

Get News Feed

GET /api/feed?userId=u123&cursor=xxx&limit=50
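The `cursor` parameter is left opaque above. One plausible encoding, shown here as an assumption and not as the actual API contract, is the sort key `(created_at, post_id)` of the last entry returned, base64-encoded. The helper names `encode_cursor`, `decode_cursor`, and `page` are hypothetical.

```python
import base64
import json

def encode_cursor(created_at: str, post_id: str) -> str:
    """Pack the sort key of the last returned entry into an opaque token."""
    raw = json.dumps([created_at, post_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> tuple:
    created_at, post_id = json.loads(base64.urlsafe_b64decode(cursor))
    return created_at, post_id

def page(entries, cursor=None, limit=50):
    """entries: list of (created_at, post_id) tuples, sorted newest first.

    Assumes timestamps share one ISO-8601 format so they compare as strings.
    """
    if cursor is not None:
        last = decode_cursor(cursor)
        entries = [e for e in entries if e < last]  # strictly older than the cursor
    items = entries[:limit]
    # A full page may have more behind it; a short page is the end of the feed.
    next_cursor = encode_cursor(*items[-1]) if len(items) == limit else None
    return items, next_cursor
```

Keying the cursor on the sort key instead of an offset keeps pages stable when new posts arrive at the head of the feed between requests.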

Interact With Post

POST /api/posts/{postId}/actions

Request:

{
  "userId": "u123",
  "action": "like"
}

👉 Interview Answer

I would separate post creation, relationship updates, feed reads, and engagement actions into different APIs.

Feed reads are the critical path, while likes, comments, and analytics can often be processed asynchronously.


4️⃣ Data Model


User Table

user (
  user_id VARCHAR PRIMARY KEY,
  username VARCHAR,
  created_at TIMESTAMP
)

Post Table

post (
  post_id VARCHAR PRIMARY KEY,
  author_id VARCHAR,
  content TEXT,
  media_ids ARRAY,
  visibility VARCHAR,
  created_at TIMESTAMP,
  status VARCHAR
)

Relationship Table

relationship (
  source_user_id VARCHAR,
  target_user_id VARCHAR,
  type VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (source_user_id, target_user_id)
)

Reverse Relationship Table

reverse_relationship (
  target_user_id VARCHAR,
  source_user_id VARCHAR,
  type VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (target_user_id, source_user_id)
)

Feed Entry Table

feed_entry (
  user_id VARCHAR,
  post_id VARCHAR,
  author_id VARCHAR,
  created_at TIMESTAMP,
  score DOUBLE,
  source VARCHAR,
  PRIMARY KEY (user_id, created_at, post_id)
)

Engagement Table

post_engagement (
  post_id VARCHAR,
  user_id VARCHAR,
  action VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (post_id, user_id, action)
)

Why Separate Post and Feed Entry?


👉 Interview Answer

I would store posts as the source of truth and store feed entries separately as a denormalized, read-optimized view.

This allows the feed service to read quickly without joining posts, relationships, ranking signals, and engagement data on every request.
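A minimal sketch of that separation, using in-memory dicts as stand-ins for the two stores. The `FeedEntry` class, the `read_feed` function, and the `"active"` status value are assumptions for illustration; the tables above do not define the status values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeedEntry:
    user_id: str
    post_id: str
    author_id: str
    created_at: int  # epoch seconds, for brevity

# Source of truth: one record per post.
POSTS = {"p1": {"author_id": "u2", "content": "hello", "status": "active"}}

# Read-optimized view: per-user list of references, newest first.
FEEDS = {"u1": [FeedEntry("u1", "p1", "u2", 1000)]}

def read_feed(user_id: str, limit: int = 50) -> list:
    """Read precomputed entries, then hydrate each one by primary key."""
    out = []
    for entry in FEEDS.get(user_id, [])[:limit]:
        post = POSTS.get(entry.post_id)
        # The entry is only a pointer; the post store decides what is returned.
        if post is not None and post["status"] == "active":
            out.append({"postId": entry.post_id, **post})
    return out
```

Because feed entries hold only references, editing or deleting a post touches one row in the post store instead of every follower's feed.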


5️⃣ Feed Generation Strategy


Option 1: Push Model / Fanout-on-Write

When a user creates a post:

author creates post
→ find followers/friends
→ insert post into each follower's feed

Pros


Cons


👉 Interview Answer

In the push model, the system precomputes feed entries when a post is created.

This makes feed reads very fast, but it creates write amplification, especially for users with a large number of followers.
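To make the write amplification concrete, here is a small in-memory sketch of fanout-on-write. The dict-based stores and the function names are assumptions; a real system would write to a sharded feed store.

```python
from collections import defaultdict

followers = defaultdict(set)   # author_id -> follower ids (the reverse relationship)
feeds = defaultdict(list)      # user_id -> [(created_at, post_id)], newest first

def follow(source: str, target: str) -> None:
    followers[target].add(source)

def create_post(author_id: str, post_id: str, created_at: int) -> int:
    """Fan out on write: one feed insert per follower. Returns the write count."""
    writes = 0
    for follower_id in followers[author_id]:
        # Assumes posts arrive in creation order; a real store would sort by key.
        feeds[follower_id].insert(0, (created_at, post_id))
        writes += 1
    return writes
```

The return value is the point of the sketch: one post costs as many writes as the author has followers, which is cheap for a typical user and very expensive for an author with millions of followers.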


Option 2: Pull Model / Fanout-on-Read

When a user opens the feed:

user opens feed
→ get followed users / friends
→ fetch recent posts
→ merge, filter, rank

Pros


Cons


👉 Interview Answer

In the pull model, the system builds the feed when the user requests it.

This avoids large fanout writes, but makes feed reads more expensive because we need to fetch, merge, filter, and rank many posts.
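The read-time merge can be sketched as a k-way merge over per-author timelines. The sample data and the `build_feed` name are assumptions; ranking and filtering are left out to keep the merge visible.

```python
import heapq
from itertools import islice

# author_id -> [(created_at, post_id)], each list sorted newest first
posts_by_author = {
    "a": [(5, "a2"), (1, "a1")],
    "b": [(4, "b2"), (2, "b1")],
}
following = {"u1": ["a", "b"]}

def build_feed(user_id: str, limit: int = 50) -> list:
    """Fan out on read: merge the followees' timelines at request time."""
    timelines = [posts_by_author.get(a, []) for a in following.get(user_id, [])]
    # heapq.merge is lazy; reverse=True expects inputs sorted largest first.
    merged = heapq.merge(*timelines, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

Nothing is written when a post is created, but every read pays for one timeline fetch per followee, which is why this path gets slow for users who follow many accounts.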


Option 3: Hybrid Model

Recommended approach:

Normal users → push model
High-follower users / recommended content → pull model

Why Hybrid?


👉 Interview Answer

In production, I would use a hybrid model.

For normal users, I would push posts into followers’ feeds. For high-follower users and recommendation sources, I would pull content at read time and merge it into the feed.

This balances low-latency reads with manageable write cost.
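A rough sketch of how the two paths could meet. The follower-count cutoff `FANOUT_THRESHOLD` and both function names are assumptions; the right threshold depends on the system's write capacity.

```python
FANOUT_THRESHOLD = 10_000  # assumed cutoff between "normal" and "high-follower"

def should_push(follower_count: int) -> bool:
    """Write path: push for normal authors, skip fanout for high-follower ones."""
    return follower_count < FANOUT_THRESHOLD

def read_hybrid_feed(pushed, pulled_timelines, limit=50):
    """Read path: merge precomputed entries with posts pulled live.

    All inputs are lists of (created_at, post_id), newest first.
    """
    candidates = set(pushed)                 # set also dedupes push/pull overlap
    for timeline in pulled_timelines:
        candidates.update(timeline[:limit])  # only the newest few can make the page
    return sorted(candidates, reverse=True)[:limit]
```

The read path only pulls from the handful of high-follower accounts a user follows, so the extra read cost stays bounded while the worst-case fanout writes disappear.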


Core Insight

News Feed design is about deciding what to precompute, what to fetch live, and what to rank dynamically.


6️⃣ Candidate Generation


Candidate Sources

A modern news feed draws from more than just followees.

It may include posts from friends, followees, and groups, as well as recommendations and ads.


Candidate Pipeline

Social Graph Candidates
→ Recommendation Candidates
→ Ads Candidates
→ Filtering
→ Ranking
→ Feed Assembly

Why Candidate Generation Matters


👉 Interview Answer

Candidate generation collects possible feed items from multiple sources such as friends, followees, groups, recommendations, and ads.

The goal is high recall, while later ranking stages decide which items should be shown first.
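As a sketch of the recall-oriented step, the function below unions candidates from several sources and records where each one came from. The source names and the `per_source` cap are assumptions.

```python
def generate_candidates(sources: dict, per_source: int = 200) -> list:
    """Union the candidates of every source, remembering the origin of each.

    sources maps a source name to a list of post ids, best first.
    A post seen in several sources is kept once, tagged with the first source.
    """
    seen = {}
    for name, post_ids in sources.items():
        for post_id in post_ids[:per_source]:
            seen.setdefault(post_id, name)
    return [{"post_id": p, "source": s} for p, s in seen.items()]
```

Keeping the `source` tag matches the `source` column on `feed_entry` and lets later stages apply per-source rules, for example limiting how many ads appear on a page.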


7️⃣ Ranking and Personalization


Why Ranking?

Chronological feeds are simple, but they may not show the most relevant content.

Ranking optimizes:


Ranking Signals


Ranking Pipeline

Candidate Generation
→ Eligibility Filtering
→ Lightweight Ranking
→ ML Ranking
→ Re-ranking
→ Feed Assembly

Re-ranking Goals


👉 Interview Answer

Ranking is usually a multi-stage pipeline.

First, the system generates candidates. Then it filters out ineligible content, applies lightweight ranking, runs more expensive ML models, and finally re-ranks results for diversity, safety, and freshness.

This allows the system to balance relevance and performance.
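The staged structure can be sketched as a funnel in which each stage sees fewer items than the one before. The scoring callables, the `top_k` cutoff, and the "no more than two consecutive posts per author" diversity rule are all assumptions for illustration.

```python
def diversify(ranked, max_run=2):
    """Defer an item if it would follow max_run posts by the same author.

    Deferred items are appended at the end without rechecking, for brevity.
    """
    out, deferred = [], []
    for item in ranked:
        run = [o["author_id"] for o in out[-max_run:]]
        if len(run) == max_run and all(a == item["author_id"] for a in run):
            deferred.append(item)
        else:
            out.append(item)
    return out + deferred

def rank(candidates, cheap_score, ml_score, is_eligible, top_k=500, limit=50):
    """candidates: list of dicts with at least 'post_id' and 'author_id'."""
    eligible = [c for c in candidates if is_eligible(c)]
    # Stage 1: cheap score on everything, keep only the top_k.
    shortlist = sorted(eligible, key=cheap_score, reverse=True)[:top_k]
    # Stage 2: expensive model on the shortlist only.
    ranked = sorted(shortlist, key=ml_score, reverse=True)
    # Stage 3: re-rank for diversity, then cut to the page size.
    return diversify(ranked)[:limit]
```

The expensive model runs on at most `top_k` items regardless of how many candidates came in, which is what keeps ranking latency predictable.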


8️⃣ Privacy and Filtering


Why Privacy Matters

News Feed must enforce visibility rules correctly.

Examples:


Filtering Layers


Recommended Strategy

Even if fanout happens at write time, still check privacy at read time.

Why?


👉 Interview Answer

Privacy filtering must be enforced carefully.

Even if we precompute feed entries, I would still perform read-time privacy checks before returning posts.

This prevents stale feed entries from exposing deleted, blocked, or visibility-restricted content.
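A read-time check might look like the sketch below. The `"friends"` visibility value comes from the API example; the `"public"` value, the `"active"` status, and the function names are assumptions.

```python
def is_visible(post: dict, viewer_id: str, friends: set, blocked: set) -> bool:
    """friends: ids the viewer is friends with.
    blocked: ids with a block in either direction relative to the viewer."""
    author = post["author_id"]
    if post["status"] != "active":      # deleted or taken down since fanout
        return False
    if author in blocked:
        return False
    if author == viewer_id:
        return True
    if post["visibility"] == "public":
        return True
    if post["visibility"] == "friends":
        return author in friends
    return False                        # unknown rule: fail closed

def filter_feed(posts, viewer_id, friends, blocked):
    return [p for p in posts if is_visible(p, viewer_id, friends, blocked)]
```

The check runs against current post and relationship state, so a feed entry written before a delete or block is dropped at read time even if the stale entry has not been cleaned up yet.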


9️⃣ Caching Strategy


What to Cache?


Cache Layers


Cache Challenges


👉 Interview Answer

Caching is essential because feed reads are frequent and latency-sensitive.

I would cache feed entries, post objects, user metadata, relationship data, and ranking features.

However, the system must carefully handle cache invalidation for deleted posts, privacy updates, and relationship changes.
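One way to combine caching with explicit invalidation is a cache-aside wrapper with a TTL as a safety net. The `PostCache` class and its `loader` argument are hypothetical; in practice the cache would be a shared store such as Redis or Memcached, not a Python dict.

```python
import time

class PostCache:
    """Cache-aside with a TTL; the loader stands in for the post store."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader, self._ttl, self._clock = loader, ttl_seconds, clock
        self._data = {}  # post_id -> (expires_at, value)

    def get(self, post_id):
        hit = self._data.get(post_id)
        if hit is not None and hit[0] > self._clock():
            return hit[1]
        value = self._loader(post_id)   # miss or expired: read the source of truth
        self._data[post_id] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, post_id):
        """Call on delete or visibility change so readers do not wait for the TTL."""
        self._data.pop(post_id, None)
```

The TTL bounds how long a missed invalidation can serve stale data, while the explicit `invalidate` call covers the cases where waiting for expiry is not acceptable.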


🔟 Write Path: Creating a Post


Flow

  1. User creates post
  2. Post service validates request
  3. Store post as source of truth
  4. Publish post-created event
  5. Fanout workers consume event
  6. Find eligible followers/friends
  7. Create feed entries
  8. Update caches asynchronously

Event Pipeline

Post Service
→ Kafka / Queue
→ Fanout Workers
→ Feed Store
→ Cache Update

👉 Interview Answer

I would decouple post creation from feed fanout using an async event pipeline.

The post service stores the post as the source of truth and publishes a post-created event.

Fanout workers then update feed stores asynchronously, which keeps post creation fast and resilient.
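The decoupling can be sketched with an in-process queue and one worker thread. Here `queue.Queue` stands in for Kafka or another durable queue, and the dict-based stores and event shape are assumptions.

```python
import queue
import threading
from collections import defaultdict

events = queue.Queue()               # stand-in for Kafka or another queue
posts = {}                           # post store (source of truth)
feeds = defaultdict(list)            # feed store
followers = {"a": ["u1", "u2"]}      # reverse relationship lookup

def create_post(author_id, post_id, content):
    """Write path: persist, publish, return. No fanout on the request thread."""
    posts[post_id] = {"author_id": author_id, "content": content}
    events.put({"type": "post-created", "post_id": post_id, "author_id": author_id})
    return post_id

def fanout_worker():
    while True:
        event = events.get()
        if event is None:            # shutdown sentinel
            events.task_done()
            return
        for follower_id in followers.get(event["author_id"], []):
            # Skip duplicates so a redelivered event is harmless.
            if event["post_id"] not in feeds[follower_id]:
                feeds[follower_id].append(event["post_id"])
        events.task_done()

worker = threading.Thread(target=fanout_worker, daemon=True)
worker.start()
create_post("a", "p1", "My new post")
events.join()                        # wait for the demo only; real callers would not
events.put(None)
```

Making the worker idempotent matters because most queues deliver at least once, so the same post-created event can arrive more than once after a retry.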


1️⃣1️⃣ Trade-offs


Push vs Pull

Strategy | Pros             | Cons
Push     | Fast reads       | Write amplification
Pull     | Lower write cost | Slower reads
Hybrid   | Balanced         | More complexity

Freshness vs Latency


Relevance vs Explainability


Storage Cost vs Read Performance


Privacy Correctness vs Performance


👉 Interview Answer

The main trade-offs are freshness, relevance, latency, storage cost, and privacy correctness.

A push-based feed improves read latency but increases write cost. A pull-based feed reduces write cost but makes reads heavier.

In practice, hybrid feed generation is usually the best approach.


1️⃣2️⃣ Scaling Patterns


Pattern 1: Hybrid Feed Generation


Pattern 2: Async Fanout

Use queue-based processing:

post-created event → queue → fanout workers

Pattern 3: Separate Source of Truth and Feed View


Pattern 4: Feature Store for Ranking

Store precomputed ranking features:


Pattern 5: Shard Feed Store by User ID

Why?


Pattern 6: Backpressure for High-Fanout Authors

For high-follower authors:


Pattern 7: Feed Assembly Service

Feed response may include:


👉 Interview Answer

To scale News Feed, I would separate the post store from the feed store, use async fanout workers, shard feeds by user ID, cache hot data, and use a feature store for ranking.

For high-fanout authors and recommendation content, I would use pull-based retrieval at read time.
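Sharding by user ID could be sketched as follows. The shard count and function names are assumptions. The sketch uses `hashlib` because Python's built-in `hash()` for strings is randomized per process and would not give a stable mapping.

```python
import hashlib

NUM_SHARDS = 64  # assumed; real systems size this from capacity planning

def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash so every process maps a user to the same shard."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def group_by_shard(follower_ids):
    """Batch a fanout by destination shard: one write request per shard."""
    batches = {}
    for uid in follower_ids:
        batches.setdefault(shard_for(uid), []).append(uid)
    return batches
```

With this layout a feed read touches exactly one shard, while a fanout write is spread across shards and can be batched per shard by the workers. Plain modulo hashing reshuffles most users when the shard count changes, so a production system would more likely use consistent hashing or a lookup table.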


1️⃣3️⃣ Failure Handling


Common Failures


Strategies


👉 Interview Answer

The system should degrade gracefully.

If ranking fails, we can fall back to chronological ordering. If recommendations fail, we can still show social posts. If fanout is delayed, users may see a slightly stale feed, which is usually acceptable.

The post store remains the source of truth, so feed entries can be rebuilt when needed.
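The fallbacks described above can be sketched as a feed assembly function that tolerates failing dependencies. The `fetch_recs` and `rank` callables are hypothetical stand-ins for the recommendation and ranking services.

```python
def assemble_feed(social, fetch_recs, rank, limit=50):
    """social: list of dicts with 'post_id' and 'created_at'.

    fetch_recs and rank are callables that may raise when a dependency is down.
    """
    try:
        candidates = social + fetch_recs()
    except Exception:
        candidates = list(social)      # recommendations down: social posts only
    try:
        ordered = rank(candidates)
    except Exception:
        # Ranking down: fall back to reverse-chronological order.
        ordered = sorted(candidates, key=lambda p: p["created_at"], reverse=True)
    return ordered[:limit]
```

In a real service each call would also carry a timeout, so a slow dependency degrades the feed the same way a failed one does instead of stalling the whole request.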


1️⃣4️⃣ Consistency Model


Stronger Consistency Needed For


Eventual Consistency Acceptable For


👉 Interview Answer

News Feed does not usually require strong consistency for delivery.

It is acceptable if a new post appears a few seconds later.

However, delete, block, mute, and privacy changes require stronger correctness, because stale data can create privacy or safety issues.


1️⃣5️⃣ End-to-End Flow


Create Post Flow

User creates post
→ Post Service stores post
→ Publish post-created event
→ Fanout workers create feed entries
→ Cache updated asynchronously

Read Feed Flow

User opens feed
→ Feed Service checks cache
→ Fetch precomputed feed entries
→ Pull recommendations / high-fanout posts
→ Apply privacy filtering
→ Rank candidates
→ Assemble final feed
→ Return response

Key Insight

News Feed is a personalized content delivery system, not just a list of recent posts.


🧠 Staff-Level Answer (Final)


👉 Interview Answer (Full Version)

When designing a News Feed, I think of it as a personalized content delivery pipeline.

The system has two main flows: creating posts and reading feeds.

Posts are stored as the source of truth, while feed entries are denormalized views optimized for low-latency reads.

For feed generation, I would use a hybrid push-pull model. Normal social posts can be pushed into followers’ feeds, while high-follower authors, recommendations, and ads can be pulled at read time.

This balances read latency, write amplification, and storage cost.

I would use an asynchronous event pipeline for fanout, so post creation does not block on updating many feeds.

For reads, the feed service would fetch precomputed entries, pull additional candidates, apply privacy filtering, rank results using personalization signals, and assemble the final feed.

The main trade-offs are freshness, relevance, latency, storage cost, and privacy correctness.

Feed delivery can be eventually consistent, but delete, block, mute, and privacy rules need stronger correctness.

Ultimately, the goal is to deliver a fresh, relevant, and safe feed with low latency at large scale.


⭐ Final Insight

News Feed design is about combining social graph, ranking, caching, and privacy into one low-latency personalized delivery system.



中文部分


🎯 Design News Feed


1️⃣ 核心框架

在设计 News Feed 时,我通常从以下几个方面来分析:

  1. 核心用户流程:创建 post、关注/好友关系、读取 feed
  2. 数据模型:用户、post、关系、feed entry
  3. Feed 生成策略:push vs pull vs hybrid
  4. 多来源 candidate generation
  5. 排序、个性化和过滤
  6. 缓存和扩展模式
  7. 核心权衡:新鲜度 vs 相关性 vs 延迟
  8. 故障处理、隐私和一致性

2️⃣ 核心需求


功能需求


非功能需求


👉 面试回答

News Feed 系统需要支持 post 创建、 用户关系管理以及个性化 feed 读取。

Feed 读取路径对延迟非常敏感, 但 ranking 逻辑可能非常复杂。

核心挑战是在新鲜度、相关性、延迟、 存储成本和隐私保护之间做平衡。


3️⃣ 主要 API


创建 Post

POST /api/posts

Request:

{
  "userId": "u123",
  "content": "My new post",
  "mediaIds": ["m1"],
  "visibility": "friends"
}

Response:

{
  "postId": "p789",
  "createdAt": "2026-05-02T10:00:00Z"
}

Follow / Friend User

POST /api/relationships

Request:

{
  "sourceUserId": "u123",
  "targetUserId": "u456",
  "type": "follow"
}

获取 News Feed

GET /api/feed?userId=u123&cursor=xxx&limit=50

与 Post 互动

POST /api/posts/{postId}/actions

Request:

{
  "userId": "u123",
  "action": "like"
}

👉 面试回答

我会将 post 创建、关系更新、feed 读取和互动行为拆成不同 API。

Feed 读取是核心路径, 而 likes、comments 和 analytics 通常可以异步处理。


4️⃣ 数据模型


User Table

user (
  user_id VARCHAR PRIMARY KEY,
  username VARCHAR,
  created_at TIMESTAMP
)

Post Table

post (
  post_id VARCHAR PRIMARY KEY,
  author_id VARCHAR,
  content TEXT,
  media_ids ARRAY,
  visibility VARCHAR,
  created_at TIMESTAMP,
  status VARCHAR
)

Relationship Table

relationship (
  source_user_id VARCHAR,
  target_user_id VARCHAR,
  type VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (source_user_id, target_user_id)
)

Reverse Relationship Table

reverse_relationship (
  target_user_id VARCHAR,
  source_user_id VARCHAR,
  type VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (target_user_id, source_user_id)
)

Feed Entry Table

feed_entry (
  user_id VARCHAR,
  post_id VARCHAR,
  author_id VARCHAR,
  created_at TIMESTAMP,
  score DOUBLE,
  source VARCHAR,
  PRIMARY KEY (user_id, created_at, post_id)
)

Engagement Table

post_engagement (
  post_id VARCHAR,
  user_id VARCHAR,
  action VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (post_id, user_id, action)
)

为什么 Post 和 Feed Entry 要分开?


👉 面试回答

我会将 post 作为 source of truth 存储, 并将 feed entry 作为反规范化的读优化视图单独存储。

这样 feed service 在读取时不需要每次 join post、 relationship、ranking signals 和 engagement data, 从而保证低延迟。


5️⃣ Feed 生成策略


方案 1:Push Model / Fanout-on-Write

当用户创建 post 时:

author creates post
→ find followers/friends
→ insert post into each follower's feed

优点


缺点


👉 面试回答

在 push model 中, 系统会在 post 创建时提前生成 feed entries。

这样 feed 读取非常快, 但会产生写放大, 特别是当作者有大量 followers 时。


方案 2:Pull Model / Fanout-on-Read

当用户打开 feed 时:

user opens feed
→ get followed users / friends
→ fetch recent posts
→ merge, filter, rank

优点


缺点


👉 面试回答

在 pull model 中, 系统会在用户请求 feed 时动态生成 feed。

这样避免了大规模写入 fanout, 但是读取路径会更重, 因为需要拉取、合并、过滤并排序大量 post。


方案 3:Hybrid Model

推荐方案:

Normal users → push model
High-follower users / recommended content → pull model

为什么用 Hybrid?


👉 面试回答

在生产系统中,我会使用 hybrid model。

普通用户的 post 可以提前 push 到 followers 的 feed 中。 对于 high-follower 用户和推荐内容, 则在用户读取 feed 时再 pull 并合并进去。

这样可以平衡低延迟读取和可控的写入成本。


核心理解

News Feed 设计是在决定: 哪些内容提前预计算,哪些内容实时拉取,哪些内容动态排序。


6️⃣ Candidate Generation


Candidate Sources

现代 News Feed 不只是来自关注用户。

它可能包括:


Candidate Pipeline

Social Graph Candidates
→ Recommendation Candidates
→ Ads Candidates
→ Filtering
→ Ranking
→ Feed Assembly

为什么 Candidate Generation 重要?


👉 面试回答

Candidate generation 会从多个来源收集可能展示的 feed items, 包括好友、关注用户、群组、推荐内容和广告。

它的目标是高召回, 后续 ranking 阶段再决定哪些内容应该排在前面。


7️⃣ 排序和个性化


为什么需要 Ranking?

纯时间排序简单, 但不一定能展示最相关的内容。

Ranking 可以优化:


Ranking Signals


Ranking Pipeline

Candidate Generation
→ Eligibility Filtering
→ Lightweight Ranking
→ ML Ranking
→ Re-ranking
→ Feed Assembly

Re-ranking 目标


👉 面试回答

Ranking 通常是一个多阶段 pipeline。

系统先生成 candidates, 然后过滤掉不符合条件的内容, 再进行轻量排序和更复杂的 ML 排序, 最后根据多样性、安全性和新鲜度做 re-ranking。

这样可以在相关性和性能之间取得平衡。


8️⃣ 隐私和过滤


为什么隐私重要?

News Feed 必须正确执行 visibility rules。

例如:


Filtering Layers


推荐策略

即使已经在写入时 fanout, 读取时仍然要重新检查 privacy。

原因:


👉 面试回答

隐私过滤必须非常谨慎。

即使系统提前预计算了 feed entries, 我也会在读取时再次进行 privacy check。

这样可以避免旧的 feed entries 暴露已经删除、 blocked 或 visibility 受限的内容。


9️⃣ 缓存策略


缓存什么?


缓存层


缓存挑战


👉 面试回答

缓存非常重要, 因为 feed 读取频繁并且对延迟敏感。

我会缓存 feed entries、post objects、 user metadata、relationship data 和 ranking features。

但是系统需要谨慎处理缓存失效, 特别是 post 删除、privacy 更新和关系变化。


🔟 写路径:创建 Post


流程

  1. 用户创建 post
  2. Post service 校验请求
  3. 将 post 存为 source of truth
  4. 发布 post-created event
  5. Fanout workers 消费事件
  6. 找到 eligible followers/friends
  7. 创建 feed entries
  8. 异步更新 cache

Event Pipeline

Post Service
→ Kafka / Queue
→ Fanout Workers
→ Feed Store
→ Cache Update

👉 面试回答

我会使用异步事件管道将 post 创建和 feed fanout 解耦。

Post service 会先将 post 作为 source of truth 存储下来, 然后发布 post-created event。

Fanout workers 再异步更新 feed stores, 这样可以让 post 创建保持快速且稳定。


1️⃣1️⃣ 核心权衡


Push vs Pull

Strategy 优点 缺点
Push 读取快 写放大
Pull 写入成本低 读取慢
Hybrid 折中 系统更复杂

新鲜度 vs 延迟


相关性 vs 可解释性


存储成本 vs 读取性能


隐私正确性 vs 性能


👉 面试回答

主要权衡包括新鲜度、相关性、延迟、 存储成本和隐私正确性。

Push-based feed 可以提升读取性能, 但会增加写入成本。

Pull-based feed 降低写入成本, 但会让读取路径更重。

实际系统通常使用 hybrid feed generation。


1️⃣2️⃣ 扩展模式


Pattern 1: Hybrid Feed Generation


Pattern 2: Async Fanout

使用 queue-based processing:

post-created event → queue → fanout workers

Pattern 3: Separate Source of Truth and Feed View


Pattern 4: Feature Store for Ranking

存储预计算 ranking features:


Pattern 5: Shard Feed Store by User ID

原因:


Pattern 6: Backpressure for High-Fanout Authors

对于高粉丝作者:


Pattern 7: Feed Assembly Service

Feed response 可以包含:


👉 面试回答

为了扩展 News Feed, 我会将 post store 和 feed store 分开, 使用异步 fanout workers, 按 user ID 对 feed store 分片, 缓存热点数据, 并使用 feature store 支持 ranking。

对于 high-fanout authors 和 recommendation content, 我会在读取时使用 pull-based retrieval。


1️⃣3️⃣ 故障处理


常见故障


处理策略


👉 面试回答

系统应该优雅降级。

如果 ranking 失败, 可以回退到按时间排序。 如果 recommendation 失败, 仍然可以展示社交内容。 如果 fanout 延迟, 用户可能看到稍微旧一点的 feed, 通常是可以接受的。

Post store 是 source of truth, 因此 feed entries 可以在需要时重建。


1️⃣4️⃣ 一致性模型


需要较强一致性的场景


可以最终一致的场景


👉 面试回答

News Feed 通常不要求 delivery 强一致。

新 post 晚几秒出现在 feed 里是可以接受的。

但是 delete、block、mute 和 privacy changes 需要更强正确性, 因为旧数据可能造成隐私或安全问题。


1️⃣5️⃣ End-to-End Flow


Create Post Flow

User creates post
→ Post Service stores post
→ Publish post-created event
→ Fanout workers create feed entries
→ Cache updated asynchronously

Read Feed Flow

User opens feed
→ Feed Service checks cache
→ Fetch precomputed feed entries
→ Pull recommendations / high-fanout posts
→ Apply privacy filtering
→ Rank candidates
→ Assemble final feed
→ Return response

Key Insight

News Feed 是一个个性化内容分发系统, 不只是按时间排列的 post list。


🧠 Staff-Level Answer(最终版)


👉 面试回答(完整背诵版)

在设计 News Feed 时, 我会将它看作一个个性化内容分发 pipeline。

系统有两个主要流程: 创建 post 和读取 feed。

Post 作为 source of truth 存储, feed entries 则是为了低延迟读取而设计的反规范化视图。

对于 feed generation, 我会使用 hybrid push-pull model。 普通社交 post 可以提前 push 到 followers 的 feed 中, 而 high-follower authors、recommendations 和 ads 可以在读取时 pull 进来。

这样可以平衡读取延迟、写放大和存储成本。

我会使用异步事件管道处理 fanout, 避免 post 创建阻塞在大量 feed 更新上。

在读取路径中, feed service 会获取预计算 feed entries, 拉取额外 candidates, 应用 privacy filtering, 使用个性化信号进行 ranking, 最后组装最终 feed。

主要权衡包括新鲜度、相关性、延迟、 存储成本和隐私正确性。

Feed delivery 可以最终一致, 但 delete、block、mute 和 privacy rules 需要更强正确性。

最终目标是在大规模场景下, 低延迟地返回新鲜、相关且安全的 feed。


⭐ Final Insight

News Feed 的核心是把社交图谱、排序、缓存和隐私规则 组合成一个低延迟的个性化内容分发系统。

Implement