🎯 Design News Feed
1️⃣ Core Framework
When discussing News Feed design, I frame it as:
- Core user flows: create post, follow/friend, read feed
- Data model: users, posts, relationships, feed entries
- Feed generation strategy: push vs pull vs hybrid
- Candidate generation from multiple sources
- Ranking, personalization, and filtering
- Caching and scaling patterns
- Trade-offs: freshness vs relevance vs latency
- Failure handling, privacy, and consistency
2️⃣ Core Requirements
Functional Requirements
- User can create posts
- User can follow or friend other users
- User can view a personalized news feed
- User can like, comment, share, hide, or report posts
- Support media posts
- Support ranking and recommendations
- Support privacy rules
Non-functional Requirements
- Low-latency feed reads
- High availability
- High write throughput
- Near-real-time freshness
- Personalized ranking
- Strong privacy enforcement
- Eventually consistent feed delivery is acceptable
👉 Interview Answer
A news feed system needs to support post creation, relationship management, and personalized feed reading.
The feed read path is latency-sensitive, but the ranking logic can be complex.
The key challenge is balancing freshness, relevance, latency, storage cost, and privacy enforcement.
3️⃣ Main APIs
Create Post
POST /api/posts
Request:
{
  "userId": "u123",
  "content": "My new post",
  "mediaIds": ["m1"],
  "visibility": "friends"
}
Response:
{
  "postId": "p789",
  "createdAt": "2026-05-02T10:00:00Z"
}
Follow / Friend User
POST /api/relationships
Request:
{
  "sourceUserId": "u123",
  "targetUserId": "u456",
  "type": "follow"
}
Get News Feed
GET /api/feed?userId=u123&cursor=xxx&limit=50
Interact With Post
POST /api/posts/{postId}/actions
Request:
{
  "userId": "u123",
  "action": "like"
}
👉 Interview Answer
I would separate post creation, relationship updates, feed reads, and engagement actions into different APIs.
Feed reads are the critical path, while likes, comments, and analytics can often be processed asynchronously.
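The notes leave the format of the `cursor` parameter open. A minimal sketch, assuming the cursor is an opaque token that encodes the sort key of the last item returned; the function names and the `(created_at, post_id)` key are illustrative assumptions, not part of the API above.

```python
import base64
import json
from typing import Optional


def encode_cursor(created_at: str, post_id: str) -> str:
    """Pack the sort key of the last returned item into an opaque token."""
    raw = json.dumps({"createdAt": created_at, "postId": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> tuple[str, str]:
    """Recover the (created_at, post_id) sort key from a token."""
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["createdAt"], data["postId"]


def get_feed_page(entries, cursor: Optional[str], limit: int):
    """Return one page of a feed.

    entries: list of (created_at, post_id) tuples sorted newest first, where
    created_at is an ISO-8601 UTC string so string order matches time order.
    """
    if cursor is not None:
        key = decode_cursor(cursor)
        # Keep only items strictly older than the last one the client saw.
        entries = [e for e in entries if e < key]
    page = entries[:limit]
    has_more = len(entries) > limit
    next_cursor = encode_cursor(*page[-1]) if page and has_more else None
    return page, next_cursor
```

Keying the cursor on the item rather than on an offset keeps pages stable when new posts arrive at the top of the feed between requests.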
4️⃣ Data Model
User Table
user (
  user_id VARCHAR PRIMARY KEY,
  username VARCHAR,
  created_at TIMESTAMP
)
Post Table
post (
  post_id VARCHAR PRIMARY KEY,
  author_id VARCHAR,
  content TEXT,
  media_ids ARRAY,
  visibility VARCHAR,
  created_at TIMESTAMP,
  status VARCHAR
)
Relationship Table
relationship (
  source_user_id VARCHAR,
  target_user_id VARCHAR,
  type VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (source_user_id, target_user_id)
)
Reverse Relationship Table
reverse_relationship (
  target_user_id VARCHAR,
  source_user_id VARCHAR,
  type VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (target_user_id, source_user_id)
)
Feed Entry Table
feed_entry (
  user_id VARCHAR,
  post_id VARCHAR,
  author_id VARCHAR,
  created_at TIMESTAMP,
  score DOUBLE,
  source VARCHAR,
  PRIMARY KEY (user_id, created_at, post_id)
)
Engagement Table
post_engagement (
  post_id VARCHAR,
  user_id VARCHAR,
  action VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (post_id, user_id, action)
)
Why Separate Post and Feed Entry?
- Post table is the source of truth
- Feed entry table is a read-optimized view
- Feed can contain posts from multiple sources
- Feed entries can be rebuilt if needed
👉 Interview Answer
I would store posts as the source of truth and store feed entries separately as a denormalized, read-optimized view.
This allows the feed service to read quickly without joining posts, relationships, ranking signals, and engagement data on every request.
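To make the split concrete, here is a small sketch of how a read could resolve lightweight feed entries against the post store. It assumes in-memory dictionaries stand in for both stores; the `Post`, `FeedEntry`, and `hydrate` names are hypothetical and only mirror a subset of the columns above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: str
    author_id: str
    content: str


@dataclass(frozen=True)
class FeedEntry:
    user_id: str
    post_id: str
    created_at: str


def hydrate(entries: list[FeedEntry], post_store: dict[str, Post]) -> list[Post]:
    """Resolve feed entries to full posts, preserving feed order.

    Entries whose post no longer exists in the source of truth are skipped,
    so a stale feed entry never produces a broken item.
    """
    posts = []
    for entry in entries:
        post = post_store.get(entry.post_id)
        if post is not None:
            posts.append(post)
    return posts
```

Because the feed entry holds only IDs and ordering data, edits to a post's content show up on the next read without touching any feed rows.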
5️⃣ Feed Generation Strategy
Option 1: Push Model / Fanout-on-Write
When a user creates a post:
author creates post
→ find followers/friends
→ insert post into each follower's feed
Pros
- Feed reads are very fast
- Good for normal users
- Simple read path
Cons
- High write amplification
- Expensive for users with many followers
- Privacy changes are harder to apply retroactively
👉 Interview Answer
In the push model, the system precomputes feed entries when a post is created.
This makes feed reads very fast, but it creates write amplification, especially for users with a large number of followers.
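The push flow can be sketched in a few lines. This is an in-memory illustration under simplifying assumptions: dictionaries replace the relationship and feed stores, posts arrive in timestamp order, and all function names are made up for the example.

```python
from collections import defaultdict

followers = defaultdict(set)  # author_id -> follower ids (the reverse relationship)
feeds = defaultdict(list)     # user_id -> [(created_at, post_id)], newest first


def follow(source_user_id, target_user_id):
    """Record that source_user_id follows target_user_id."""
    followers[target_user_id].add(source_user_id)


def fanout_on_write(author_id, post_id, created_at):
    """Insert one feed entry per follower and return the number of writes."""
    writes = 0
    for follower_id in followers[author_id]:
        # Prepending keeps the list newest-first when posts arrive in time order.
        feeds[follower_id].insert(0, (created_at, post_id))
        writes += 1
    return writes


def read_feed(user_id, limit=50):
    """A read is a single lookup with no merging or ranking."""
    return feeds[user_id][:limit]
```

The return value of `fanout_on_write` is the write amplification for that post: it equals the author's follower count, which is why this model gets expensive for large audiences.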
Option 2: Pull Model / Fanout-on-Read
When a user opens the feed:
user opens feed
→ get followed users / friends
→ fetch recent posts
→ merge, filter, rank
Pros
- Lower write cost
- Easier to apply latest privacy and ranking rules
- Better for celebrity or high-fanout accounts
Cons
- Feed reads are slower
- Expensive for users following many people
- Requires merging and ranking at read time
👉 Interview Answer
In the pull model, the system builds the feed when the user requests it.
This avoids large fanout writes, but makes feed reads more expensive because we need to fetch, merge, filter, and rank many posts.
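The read-time merge can be sketched with a k-way merge over per-author timelines. This assumes each author's posts are already stored newest first as `(created_at, post_id)` tuples; the `following` and `posts_by_author` dictionaries are stand-ins for the relationship and post stores.

```python
import heapq
from itertools import islice


def fanout_on_read(user_id, following, posts_by_author, limit=50):
    """Merge each followee's newest-first post list into one newest-first feed."""
    timelines = [posts_by_author.get(author_id, [])
                 for author_id in following.get(user_id, ())]
    # heapq.merge is lazy, so only about `limit` items are pulled from the inputs.
    merged = heapq.merge(*timelines, key=lambda post: post[0], reverse=True)
    return list(islice(merged, limit))
```

The cost of a read grows with the number of followees, since every timeline contributes at least its head to the merge. Filtering and ranking are left out here to keep the merge step visible.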
Option 3: Hybrid Model
Recommended approach:
Normal users → push model
High-follower users / recommended content → pull model
Why Hybrid?
- Most users have relatively small audiences
- A few users create huge fanout pressure
- Feed can include both social posts and recommendations
- Hybrid balances read latency and write amplification
👉 Interview Answer
In production, I would use a hybrid model.
For normal users, I would push posts into followers’ feeds. For high-follower users and recommendation sources, I would pull content at read time and merge it into the feed.
This balances low-latency reads with manageable write cost.
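A rough sketch of the hybrid split follows. The follower-count threshold is an assumed value (the notes do not give one), the class and method names are hypothetical, and in-memory dictionaries stand in for the real stores.

```python
import heapq

FANOUT_THRESHOLD = 10_000  # assumed cutoff for "high-follower"; tune per system


class HybridFeed:
    def __init__(self, threshold=FANOUT_THRESHOLD):
        self.threshold = threshold
        self.followers = {}     # author_id -> set of follower ids
        self.following = {}     # user_id -> set of author ids
        self.author_posts = {}  # author_id -> [(created_at, post_id)], newest first
        self.pushed = {}        # user_id -> [(created_at, post_id)], newest first

    def follow(self, user_id, author_id):
        self.followers.setdefault(author_id, set()).add(user_id)
        self.following.setdefault(user_id, set()).add(author_id)

    def is_high_fanout(self, author_id):
        return len(self.followers.get(author_id, ())) > self.threshold

    def create_post(self, author_id, post_id, created_at):
        """Store the post, then push it only if the author's audience is small."""
        self.author_posts.setdefault(author_id, []).insert(0, (created_at, post_id))
        if self.is_high_fanout(author_id):
            return 0  # no fanout; readers pull this author's posts instead
        targets = self.followers.get(author_id, ())
        for user_id in targets:
            self.pushed.setdefault(user_id, []).insert(0, (created_at, post_id))
        return len(targets)

    def read_feed(self, user_id, limit=50):
        """Merge precomputed entries with posts pulled from high-fanout followees."""
        pulled = [self.author_posts.get(author_id, [])
                  for author_id in self.following.get(user_id, ())
                  if self.is_high_fanout(author_id)]
        merged = heapq.merge(self.pushed.get(user_id, []), *pulled,
                             key=lambda post: post[0], reverse=True)
        feed, seen = [], set()
        for created_at, post_id in merged:
            # An author who crossed the threshold may have older pushed entries.
            if post_id not in seen:
                seen.add(post_id)
                feed.append((created_at, post_id))
                if len(feed) == limit:
                    break
        return feed
```

The deduplication step matters because an author can move from the push group to the pull group as their audience grows, which would otherwise surface the same post twice.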
Core Insight
News Feed design is about deciding what to precompute, what to fetch live, and what to rank dynamically.
6️⃣ Candidate Generation
Candidate Sources
A modern news feed draws on more than posts from followees.
It may include:
- Posts from friends / followees
- Reposts or shared posts
- Popular posts in user’s network
- Group/community posts
- Recommended content
- Ads
- Trending topics
Candidate Pipeline
Social Graph Candidates + Recommendation Candidates + Ads Candidates (fetched independently, then merged)
→ Filtering
→ Ranking
→ Feed Assembly
Why Candidate Generation Matters
- Ranking cannot rank what retrieval did not fetch
- Candidate generation controls recall
- Different sources have different freshness and quality
👉 Interview Answer
Candidate generation collects possible feed items from multiple sources such as friends, followees, groups, recommendations, and ads.
The goal is high recall, while later ranking stages decide which items should be shown first.
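One way to picture this stage is a function that unions post IDs from several independent sources. In this sketch each source is a plain callable taking a user ID; the source names, the per-source limit, and the decision to skip a failing source are assumptions made for illustration.

```python
from typing import Callable, Iterable


def generate_candidates(
    user_id: str,
    sources: dict[str, Callable[[str], Iterable[str]]],
    per_source_limit: int = 200,
) -> list[dict]:
    """Union post IDs from every source, tagged with the first source that found them."""
    seen = set()
    candidates = []
    for name, fetch in sources.items():
        try:
            post_ids = list(fetch(user_id))[:per_source_limit]
        except Exception:
            # A failing source lowers recall but should not fail the whole request.
            continue
        for post_id in post_ids:
            if post_id not in seen:
                seen.add(post_id)
                candidates.append({"post_id": post_id, "source": name})
    return candidates
```

Tagging each candidate with its source matches the `source` column on the feed entry and lets later stages apply source-specific rules, such as ad load limits.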
7️⃣ Ranking and Personalization
Why Ranking?
Chronological feeds are simple, but they may not show the most relevant content.
Ranking optimizes:
- Relevance
- Engagement
- Freshness
- Diversity
- Safety
- Business goals
Ranking Signals
- Recency
- Relationship strength
- User interests
- Engagement probability
- Content type preference
- Negative feedback
- Author quality
- Post quality
- Content safety
- Diversity constraints
Ranking Pipeline
Candidate Generation
→ Eligibility Filtering
→ Lightweight Ranking
→ ML Ranking
→ Re-ranking
→ Feed Assembly
Re-ranking Goals
- Avoid too many posts from the same author
- Avoid too many ads
- Mix fresh and high-quality posts
- Apply safety and policy rules
- Improve diversity
👉 Interview Answer
Ranking is usually a multi-stage pipeline.
First, the system generates candidates. Then it filters out ineligible content, applies lightweight ranking, runs more expensive ML models, and finally re-ranks results for diversity, safety, and freshness.
This allows the system to balance relevance and performance.
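The stages can be sketched as one function where the scoring functions are passed in. Everything here is a simplification: `light_score` and `heavy_score` stand in for a cheap heuristic and an ML model respectively, and the per-author cap is just one example of a re-ranking rule.

```python
def rank_feed(candidates, light_score, heavy_score, is_eligible,
              light_top_k=500, max_per_author=2, limit=50):
    """Filter, shortlist with a cheap score, rescore, then cap posts per author.

    candidates: list of dicts that each carry at least an "author_id" key.
    """
    # Stage 1: eligibility filtering removes items that must never be shown.
    eligible = [c for c in candidates if is_eligible(c)]
    # Stage 2: the cheap score trims the set so the expensive model sees fewer items.
    shortlist = sorted(eligible, key=light_score, reverse=True)[:light_top_k]
    # Stage 3: the expensive score decides the main ordering.
    ranked = sorted(shortlist, key=heavy_score, reverse=True)
    # Stage 4: re-ranking enforces diversity by limiting posts per author.
    feed, per_author = [], {}
    for candidate in ranked:
        count = per_author.get(candidate["author_id"], 0)
        if count >= max_per_author:
            continue
        per_author[candidate["author_id"]] = count + 1
        feed.append(candidate)
        if len(feed) == limit:
            break
    return feed
```

Running the expensive model only on the shortlist is what keeps the read path within its latency budget as the candidate set grows.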
8️⃣ Privacy and Filtering
Why Privacy Matters
News Feed must enforce visibility rules correctly.
Examples:
- Public post
- Friends-only post
- Group-only post
- Blocked users
- Muted users
- Deleted posts
- Age or region restrictions
Filtering Layers
- At write time: only fan out to eligible users
- At read time: re-check privacy rules
- At ranking time: remove unsafe or blocked content
Recommended Strategy
Even if fanout happens at write time, still check privacy at read time.
Why?
- Relationship may change
- User may block someone
- Post visibility may be updated
- Content may be deleted
👉 Interview Answer
Privacy filtering must be enforced carefully.
Even if we precompute feed entries, I would still perform read-time privacy checks before returning posts.
This prevents stale feed entries from exposing deleted, blocked, or visibility-restricted content.
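A read-time check might look like the sketch below. It assumes `visibility` takes the values `"public"` or `"friends"`, that `status` is `"active"` for live posts, and that friendships and blocks are available as sets of user-ID pairs; all of these are illustrative choices rather than a fixed schema.

```python
def can_view(viewer_id, post, friends, blocks):
    """Decide whether viewer_id may see a post right now.

    post: dict with "author_id", "visibility", and "status" keys.
    friends, blocks: sets of (user_a, user_b) pairs.
    """
    author_id = post["author_id"]
    if post["status"] != "active":
        return False  # deleted or otherwise withdrawn posts are never shown
    if (viewer_id, author_id) in blocks or (author_id, viewer_id) in blocks:
        return False  # a block in either direction hides the post
    if viewer_id == author_id or post["visibility"] == "public":
        return True
    if post["visibility"] == "friends":
        return (viewer_id, author_id) in friends or (author_id, viewer_id) in friends
    return False  # unknown visibility values fail closed


def filter_visible(viewer_id, posts, friends, blocks):
    """Apply the check to every post fetched for a feed response."""
    return [post for post in posts if can_view(viewer_id, post, friends, blocks)]
```

Failing closed on unrecognized visibility values means a new privacy level added later cannot leak through an older version of the check.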
9️⃣ Caching Strategy
What to Cache?
- Feed entries
- Post objects
- User metadata
- Relationship graph
- Ranking features
- Hot posts
- Media assets through CDN
Cache Layers
- Local service cache
- Redis / Memcached
- CDN for media
- Edge cache for public content
Cache Challenges
- Feed freshness
- Post deletion
- Privacy changes
- Follow / unfollow changes
- Ranking feature updates
👉 Interview Answer
Caching is essential because feed reads are frequent and latency-sensitive.
I would cache feed entries, post objects, user metadata, relationship data, and ranking features.
However, the system must carefully handle cache invalidation for deleted posts, privacy updates, and relationship changes.
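The invalidation concern is easier to see with a small cache-aside example. This sketch uses an in-process dictionary with a TTL in place of Redis or Memcached; the class, the injected clock, and the `load_from_store` callback are all assumptions for the illustration.

```python
import time


class TTLCache:
    """A tiny in-process cache where each entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value

    def set(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)

    def invalidate(self, key):
        self._data.pop(key, None)


def get_post(post_id, cache, load_from_store):
    """Cache-aside read: try the cache, fall back to the store, then populate."""
    post = cache.get(post_id)
    if post is None:
        post = load_from_store(post_id)
        if post is not None:
            cache.set(post_id, post)
    return post
```

The TTL bounds how long a missed invalidation can serve stale data, while the explicit `invalidate` call is what a delete or privacy update would trigger to remove the entry immediately.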
🔟 Write Path: Creating a Post
Flow
- User creates post
- Post service validates request
- Store post as source of truth
- Publish post-created event
- Fanout workers consume event
- Find eligible followers/friends
- Create feed entries
- Update caches asynchronously
Event Pipeline
Post Service
→ Kafka / Queue
→ Fanout Workers
→ Feed Store
→ Cache Update
👉 Interview Answer
I would decouple post creation from feed fanout using an async event pipeline.
The post service stores the post as the source of truth and publishes a post-created event.
Fanout workers then update feed stores asynchronously, which keeps post creation fast and resilient.
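The worker side of this pipeline can be sketched as a loop over a queue. Here a `deque` stands in for Kafka or another broker, the retry budget is an assumed value, and `get_followers` and `write_entry` are hypothetical callbacks into the relationship and feed stores.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget before an event is parked


def run_fanout_worker(queue, dead_letters, get_followers, write_entry):
    """Drain the queue, retrying failed events and parking repeated failures.

    queue: deque of event dicts with "author_id", "post_id", "created_at".
    write_entry must be idempotent, because a retried event repeats its writes.
    """
    while queue:
        event = queue.popleft()
        try:
            for follower_id in get_followers(event["author_id"]):
                write_entry(follower_id, event["post_id"], event["created_at"])
        except Exception:
            event["attempts"] = event.get("attempts", 0) + 1
            if event["attempts"] < MAX_ATTEMPTS:
                queue.append(event)  # retry later, behind other pending events
            else:
                dead_letters.append(event)  # keep for inspection or replay
```

Keying feed writes on `(user_id, post_id)` is one simple way to get the idempotency the retry path relies on.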
1️⃣1️⃣ Trade-offs
Push vs Pull
| Strategy | Pros | Cons |
|---|---|---|
| Push | Fast reads | Write amplification |
| Pull | Lower write cost | Slower reads |
| Hybrid | Balanced | More complexity |
Freshness vs Latency
- Real-time (synchronous) fanout improves freshness but slows the write path
- Async fanout keeps writes fast but may delay feed updates
- Caching improves read latency but may return a stale feed
Relevance vs Explainability
- ML ranking improves engagement
- But may be harder to debug and explain
Storage Cost vs Read Performance
- Precomputed feeds require more storage
- But improve feed read latency
Privacy Correctness vs Performance
- Read-time filtering improves correctness
- But adds latency
👉 Interview Answer
The main trade-offs are freshness, relevance, latency, storage cost, and privacy correctness.
Push-based feed improves read latency but increases write cost. Pull-based feed reduces write cost but makes reads heavier.
In practice, hybrid feed generation is usually the best approach.
1️⃣2️⃣ Scaling Patterns
Pattern 1: Hybrid Feed Generation
- Push social posts from normal users
- Pull celebrity/recommended content at read time
Pattern 2: Async Fanout
Use queue-based processing:
post-created event → queue → fanout workers
Pattern 3: Separate Source of Truth and Feed View
- Post store = source of truth
- Feed store = read-optimized view
Pattern 4: Feature Store for Ranking
Store precomputed ranking features:
- User interests
- Author affinity
- Post engagement
- Content quality scores
Pattern 5: Shard Feed Store by User ID
Why?
- Feed reads are user-centric
- Keeps feed entries for one user localized
Pattern 6: Backpressure for High-Fanout Authors
For high-follower authors:
- Do not fan out to everyone immediately
- Pull at read time
- Apply rate limiting to fanout jobs
Pattern 7: Feed Assembly Service
The feed assembly service combines:
- Organic posts
- Recommended posts
- Ads
It then applies safety filtering and the ranking output before returning the response.
👉 Interview Answer
To scale News Feed, I would separate the post store from the feed store, use async fanout workers, shard feeds by user ID, cache hot data, and use a feature store for ranking.
For high-fanout authors and recommendation content, I would use pull-based retrieval at read time.
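Sharding by user ID needs a mapping that every service instance computes the same way. A minimal sketch, assuming simple modulo placement over a fixed shard count; the function name is made up for the example.

```python
import hashlib


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index that is stable across processes.

    A cryptographic digest is used instead of the built-in hash(), whose
    output for strings changes between interpreter runs.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Modulo placement remaps most users when the shard count changes, so a system that expects to reshard often would likely prefer consistent hashing or a lookup table.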
1️⃣3️⃣ Failure Handling
Common Failures
- Fanout worker failure
- Queue backlog
- Feed store unavailable
- Ranking service down
- Stale cache
- Deleted post still visible
- Privacy rule update delay
- Recommendation service unavailable
Strategies
- Retry fanout jobs
- Dead-letter queue
- Rebuild feed from post store
- Serve cached feed during degradation
- Fall back to chronological feed
- Apply read-time privacy filtering
- Skip recommendations if recommendation service fails
👉 Interview Answer
The system should degrade gracefully.
If ranking fails, we can fall back to chronological ordering. If recommendations fail, we can still show social posts. If fanout is delayed, users may see a slightly stale feed, which is usually acceptable.
The post store remains the source of truth, so feed entries can be rebuilt when needed.
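The degradation rules can be expressed as fallbacks around each optional dependency. In this sketch the three callbacks are hypothetical service clients, and posts are dicts with a `created_at` field used for the chronological fallback.

```python
def build_feed(user_id, fetch_social, fetch_recommended, rank, limit=50):
    """Assemble a feed that survives recommendation and ranking outages."""
    candidates = list(fetch_social(user_id))
    try:
        candidates += list(fetch_recommended(user_id))
    except Exception:
        pass  # recommendations are optional, so serve social posts only
    try:
        ordered = rank(user_id, candidates)
    except Exception:
        # Ranking is down: fall back to newest-first ordering.
        ordered = sorted(candidates, key=lambda post: post["created_at"], reverse=True)
    return ordered[:limit]
```

Privacy filtering is deliberately not wrapped in a fallback here: unlike ranking, it should fail the request rather than be skipped.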
1️⃣4️⃣ Consistency Model
Stronger Consistency Needed For
- Post creation durability
- Delete post
- Block / mute
- Privacy settings
- Permission checks
Eventual Consistency Acceptable For
- Feed delivery
- Like/comment counters
- Recommendation results
- Analytics
- Ranking features
- Fanout delay
👉 Interview Answer
News Feed does not usually require strong consistency for delivery.
It is acceptable if a new post appears a few seconds later.
However, delete, block, mute, and privacy changes require stronger correctness, because stale data can create privacy or safety issues.
1️⃣5️⃣ End-to-End Flow
Create Post Flow
User creates post
→ Post Service stores post
→ Publish post-created event
→ Fanout workers create feed entries
→ Cache updated asynchronously
Read Feed Flow
User opens feed
→ Feed Service checks cache
→ Fetch precomputed feed entries
→ Pull recommendations / high-fanout posts
→ Apply privacy filtering
→ Rank candidates
→ Assemble final feed
→ Return response
Key Insight
News Feed is a personalized content delivery system, not just a list of recent posts.
🧠 Staff-Level Answer (Final)
👉 Interview Answer (Full Version)
When designing a News Feed, I think of it as a personalized content delivery pipeline.
The system has two main flows: creating posts and reading feeds.
Posts are stored as the source of truth, while feed entries are denormalized views optimized for low-latency reads.
For feed generation, I would use a hybrid push-pull model. Normal social posts can be pushed into followers’ feeds, while high-follower authors, recommendations, and ads can be pulled at read time.
This balances read latency, write amplification, and storage cost.
I would use an asynchronous event pipeline for fanout, so post creation does not block on updating many feeds.
For reads, the feed service would fetch precomputed entries, pull additional candidates, apply privacy filtering, rank results using personalization signals, and assemble the final feed.
The main trade-offs are freshness, relevance, latency, storage cost, and privacy correctness.
Feed delivery can be eventually consistent, but delete, block, mute, and privacy rules need stronger correctness.
Ultimately, the goal is to deliver a fresh, relevant, and safe feed with low latency at large scale.
⭐ Final Insight
News Feed design is about combining social graph, ranking, caching, and privacy into one low-latency personalized delivery system.