d&d-t System Design Deep Dive

🎯 Design News Feed

1️⃣ Core Framework

When discussing News Feed design, I frame it as:

Core user flows: create post, follow/friend, read feed
Data model: users, posts, relationships, feed entries
Feed generation strategy: push vs pull vs hybrid
Candidate generation from multiple sources
Ranking, personalization, and filtering
Caching and scaling patterns
Trade-offs: freshness vs relevance vs latency
Failure handling, privacy, and consistency

2️⃣ Core Requirements

Functional Requirements

Users can create posts
Users can follow or friend other users
Users can view a personalized news feed
User can like, comment, share, hide, or report posts
Support media posts
Support ranking and recommendations
Support privacy rules

Non-functional Requirements

Low-latency feed reads
High availability
High write throughput
Near-real-time freshness
Personalized ranking
Strong privacy enforcement
Eventually consistent feed delivery is acceptable

👉 Interview Answer

A news feed system needs to support post creation, relationship management, and personalized feed reading.

The feed read path is latency-sensitive, but the ranking logic can be complex.

The key challenge is balancing freshness, relevance, latency, storage cost, and privacy enforcement.

3️⃣ Main APIs

Create Post

POST /api/posts

Request:

{
  "userId": "u123",
  "content": "My new post",
  "mediaIds": ["m1"],
  "visibility": "friends"
}

Response:

{
  "postId": "p789",
  "createdAt": "2026-05-02T10:00:00Z"
}

Follow / Friend User

POST /api/relationships

Request:

{
  "sourceUserId": "u123",
  "targetUserId": "u456",
  "type": "follow"
}

Get News Feed

GET /api/feed?userId=u123&cursor=xxx&limit=50
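The `cursor` parameter is left opaque here. One way it could work, as a minimal sketch: encode the sort key of the last item returned and resume strictly after it. This assumes the feed is ordered by `(created_at, post_id)` descending, with timestamps in one fixed ISO-8601 format so they compare as strings. The helper names `encode_cursor`, `decode_cursor`, and `page` are hypothetical.

```python
import base64
import json
from typing import List, Optional, Tuple

Entry = Tuple[str, str]  # (created_at, post_id); assumed sort key


def encode_cursor(created_at: str, post_id: str) -> str:
    """Pack the sort key of the last returned item into an opaque token."""
    raw = json.dumps({"created_at": created_at, "post_id": post_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: Optional[str]) -> Optional[Entry]:
    """Return (created_at, post_id), or None when requesting the first page."""
    if not cursor:
        return None
    data = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return data["created_at"], data["post_id"]


def page(entries: List[Entry], cursor: Optional[str], limit: int):
    """entries must already be sorted newest first."""
    position = decode_cursor(cursor)
    if position is not None:
        # Keep only entries strictly older than the cursor position.
        entries = [e for e in entries if e < position]
    items = entries[:limit]
    # A full page may have more behind it; a short page is the last one.
    next_cursor = encode_cursor(*items[-1]) if len(items) == limit else None
    return items, next_cursor
```

A key-based cursor like this stays stable when new posts arrive at the top of the feed, which an offset-based page number would not.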

Interact With Post

POST /api/posts/{postId}/actions

Request:

{
  "userId": "u123",
  "action": "like"
}

👉 Interview Answer

I would separate post creation, relationship updates, feed reads, and engagement actions into different APIs.

Feed reads are the critical path, while likes, comments, and analytics can often be processed asynchronously.

4️⃣ Data Model

User Table

user (
  user_id VARCHAR PRIMARY KEY,
  username VARCHAR,
  created_at TIMESTAMP
)

Post Table

post (
  post_id VARCHAR PRIMARY KEY,
  author_id VARCHAR,
  content TEXT,
  media_ids ARRAY,
  visibility VARCHAR,
  created_at TIMESTAMP,
  status VARCHAR
)

Relationship Table

relationship (
  source_user_id VARCHAR,
  target_user_id VARCHAR,
  type VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (source_user_id, target_user_id)
)

Reverse Relationship Table

reverse_relationship (
  target_user_id VARCHAR,
  source_user_id VARCHAR,
  type VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (target_user_id, source_user_id)
)

Feed Entry Table

feed_entry (
  user_id VARCHAR,
  post_id VARCHAR,
  author_id VARCHAR,
  created_at TIMESTAMP,
  score DOUBLE,
  source VARCHAR,
  PRIMARY KEY (user_id, created_at, post_id)
)

Engagement Table

post_engagement (
  post_id VARCHAR,
  user_id VARCHAR,
  action VARCHAR,
  created_at TIMESTAMP,
  PRIMARY KEY (post_id, user_id, action)
)

Why Separate Post and Feed Entry?

Post table is the source of truth
Feed entry table is a read-optimized view
Feed can contain posts from multiple sources
Feed entries can be rebuilt if needed

👉 Interview Answer

I would store posts as the source of truth and store feed entries separately as a denormalized, read-optimized view.

This allows the feed service to read quickly without joining posts, relationships, ranking signals, and engagement data on every request.
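To make the split concrete, here is a small in-memory sketch. Feed entries carry only IDs and ordering data, and the read path hydrates them from the post store. The dictionaries stand in for real stores, and the `"active"` status value is an assumption, since the schema above does not list status values.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Post:
    post_id: str
    author_id: str
    content: str
    status: str = "active"  # assumed value; the schema only says VARCHAR


@dataclass
class FeedEntry:
    user_id: str
    post_id: str
    created_at: str  # ISO-8601 string, so string order equals time order


def read_feed(
    user_id: str,
    feed_store: Dict[str, List[FeedEntry]],
    post_store: Dict[str, Post],
    limit: int = 50,
) -> List[Post]:
    """Read the denormalized view, then hydrate from the source of truth."""
    entries = sorted(
        feed_store.get(user_id, []),
        key=lambda e: (e.created_at, e.post_id),
        reverse=True,
    )
    posts: List[Post] = []
    for entry in entries:
        post = post_store.get(entry.post_id)
        # A feed entry can outlive its post; the post store decides.
        if post is None or post.status != "active":
            continue
        posts.append(post)
        if len(posts) == limit:
            break
    return posts
```

Because the feed entry is only a pointer, a deleted or missing post drops out at read time without rewriting every feed that referenced it.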

5️⃣ Feed Generation Strategy

Option 1: Push Model / Fanout-on-Write

When a user creates a post:

author creates post
→ find followers/friends
→ insert post into each follower's feed

Pros

Feed reads are very fast
Good for normal users
Simple read path

Cons

High write amplification
Expensive for users with many followers
Privacy changes are harder to apply retroactively

👉 Interview Answer

In the push model, the system precomputes feed entries when a post is created.

This makes feed reads very fast, but it creates write amplification, especially for users with a large number of followers.
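A minimal in-memory sketch of fanout-on-write follows. The `PushFeed` class and its method names are hypothetical, and plain dictionaries stand in for the reverse relationship table and the feed store.

```python
from collections import defaultdict
from typing import Dict, List, Set


class PushFeed:
    def __init__(self) -> None:
        # author_id -> follower ids (the reverse relationship lookup)
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        # user_id -> post ids, newest first
        self.feeds: Dict[str, List[str]] = defaultdict(list)

    def follow(self, source_user_id: str, target_user_id: str) -> None:
        self.followers[target_user_id].add(source_user_id)

    def publish(self, author_id: str, post_id: str) -> int:
        """Insert the post into every follower's feed; return the write count."""
        writes = 0
        for follower_id in self.followers[author_id]:
            self.feeds[follower_id].insert(0, post_id)
            writes += 1
        return writes

    def read(self, user_id: str, limit: int = 50) -> List[str]:
        """The read path is a single lookup, with no merging or ranking."""
        return self.feeds.get(user_id, [])[:limit]
```

The return value of `publish` makes the write amplification visible: one post costs one write per follower.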

Option 2: Pull Model / Fanout-on-Read

When a user opens the feed:

user opens feed
→ get followed users / friends
→ fetch recent posts
→ merge, filter, rank

Pros

Lower write cost
Easier to apply latest privacy and ranking rules
Better for celebrity or high-fanout accounts

Cons

Feed reads are slower
Expensive for users following many people
Requires merging and ranking at read time

👉 Interview Answer

In the pull model, the system builds the feed when the user requests it.

This avoids large fanout writes, but makes feed reads more expensive because we need to fetch, merge, filter, and rank many posts.
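The merge step can be sketched with a k-way merge over per-author timelines. This assumes each author's posts are already stored newest first as `(created_at, post_id)` pairs; `pull_feed` and the integer timestamps are illustrative only, and filtering and ranking are left out.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

# author_id -> that author's posts as (created_at, post_id), newest first
Timeline = List[Tuple[int, str]]


def pull_feed(followees: List[str], timelines: Dict[str, Timeline], limit: int) -> List[str]:
    """Merge the followees' timelines lazily and stop after `limit` posts."""
    per_author = [timelines.get(author_id, []) for author_id in followees]
    # heapq.merge with reverse=True expects each input sorted largest first.
    merged = heapq.merge(*per_author, key=lambda item: item[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

Even with a lazy merge, the read still touches one timeline per followee, which is why this path gets expensive for users who follow many accounts.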

Option 3: Hybrid Model

Recommended approach:

Normal users → push model
High-follower users / recommended content → pull model

Why Hybrid?

Most users have relatively small audiences
A few users create huge fanout pressure
Feed can include both social posts and recommendations
Hybrid balances read latency and write amplification

👉 Interview Answer

In production, I would use a hybrid model.

For normal users, I would push posts into followers’ feeds. For high-follower users and recommendation sources, I would pull content at read time and merge it into the feed.

This balances low-latency reads with manageable write cost.
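A rough sketch of the hybrid decision is below. The follower-count threshold that separates "normal" from "high-follower" authors is an assumption; the notes above do not give a number. `HybridFeed` and its fields are hypothetical names.

```python
from collections import defaultdict
from typing import List

CELEBRITY_THRESHOLD = 10_000  # assumed cutoff, to be tuned per system


class HybridFeed:
    def __init__(self, threshold: int = CELEBRITY_THRESHOLD) -> None:
        self.threshold = threshold
        self.followers = defaultdict(set)      # author -> followers
        self.following = defaultdict(set)      # user -> authors
        self.pushed = defaultdict(list)        # user -> [(ts, post_id)]
        self.author_posts = defaultdict(list)  # author -> [(ts, post_id)]

    def follow(self, user_id: str, author_id: str) -> None:
        self.followers[author_id].add(user_id)
        self.following[user_id].add(author_id)

    def _is_high_fanout(self, author_id: str) -> bool:
        return len(self.followers[author_id]) >= self.threshold

    def publish(self, author_id: str, ts: int, post_id: str) -> None:
        self.author_posts[author_id].append((ts, post_id))
        if not self._is_high_fanout(author_id):
            # Push path: precompute entries for each follower.
            for follower_id in self.followers[author_id]:
                self.pushed[follower_id].append((ts, post_id))

    def read(self, user_id: str, limit: int = 50) -> List[str]:
        # A set removes duplicates if an author crossed the threshold
        # after some of their posts were already pushed.
        candidates = set(self.pushed[user_id])
        for author_id in self.following[user_id]:
            if self._is_high_fanout(author_id):
                # Pull path: fetch high-fanout authors at read time.
                candidates.update(self.author_posts[author_id])
        return [post_id for _, post_id in sorted(candidates, reverse=True)[:limit]]
```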

Core Insight

News Feed design is about deciding what to precompute, what to fetch live, and what to rank dynamically.

6️⃣ Candidate Generation

Candidate Sources

A modern news feed draws on more than posts from followees.

It may include:

Posts from friends / followees
Reposts or shared posts
Popular posts in the user’s network
Group/community posts
Recommended content
Ads
Trending topics

Candidate Pipeline

Social Graph Candidates
→ Recommendation Candidates
→ Ads Candidates
→ Filtering
→ Ranking
→ Feed Assembly

Why Candidate Generation Matters

Ranking cannot rank what retrieval did not fetch
Candidate generation controls recall
Different sources have different freshness and quality

👉 Interview Answer

Candidate generation collects possible feed items from multiple sources such as friends, followees, groups, recommendations, and ads.

The goal is high recall, while later ranking stages decide which items should be shown first.
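As a sketch of this stage, the function below unions post IDs from several sources, removes duplicates, and records which source produced each candidate (matching the `source` column on the feed entry). The source callables are placeholders for real retrieval services, and skipping a failed source is an assumption about the desired behavior.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, str]  # {"post_id": ..., "source": ...}


def generate_candidates(
    user_id: str,
    sources: Dict[str, Callable[[str], List[str]]],
) -> List[Candidate]:
    """Collect candidates from every source; first source to return a post wins."""
    seen = set()
    candidates: List[Candidate] = []
    for source_name, fetch in sources.items():
        try:
            post_ids = fetch(user_id)
        except Exception:
            # A failing source lowers recall but should not fail the whole feed.
            continue
        for post_id in post_ids:
            if post_id not in seen:
                seen.add(post_id)
                candidates.append({"post_id": post_id, "source": source_name})
    return candidates
```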

7️⃣ Ranking and Personalization

Why Ranking?

Chronological feeds are simple, but they may not show the most relevant content.

Ranking optimizes:

Relevance
Engagement
Freshness
Diversity
Safety
Business goals

Ranking Signals

Recency
Relationship strength
User interests
Engagement probability
Content type preference
Negative feedback
Author quality
Post quality
Content safety
Diversity constraints

Ranking Pipeline

Candidate Generation
→ Eligibility Filtering
→ Lightweight Ranking
→ ML Ranking
→ Re-ranking
→ Feed Assembly

Re-ranking Goals

Avoid too many posts from the same author
Avoid too many ads
Mix fresh and high-quality posts
Apply safety and policy rules
Improve diversity

👉 Interview Answer

Ranking is usually a multi-stage pipeline.

First, the system generates candidates. Then it filters out ineligible content, applies lightweight ranking, runs more expensive ML models, and finally re-ranks results for diversity, safety, and freshness.

This allows the system to balance relevance and performance.
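The stages can be sketched as a funnel: a cheap score trims the candidate set, a more expensive score orders the survivors, and a re-rank pass enforces a per-author cap. Both scoring formulas are placeholders rather than real models, and the field names and weights are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    post_id: str
    author_id: str
    age_hours: float
    affinity: float  # assumed relationship strength in [0, 1]
    eligible: bool = True


def light_score(c: Candidate) -> float:
    """Cheap heuristic used to trim the candidate set."""
    return c.affinity / (1.0 + c.age_hours)


def heavy_score(c: Candidate) -> float:
    """Stand-in for an ML model; the weights are arbitrary."""
    return 0.7 * c.affinity + 0.3 / (1.0 + c.age_hours)


def rank(
    candidates: List[Candidate],
    light_keep: int = 100,
    max_per_author: int = 2,
    limit: int = 20,
) -> List[Candidate]:
    eligible = [c for c in candidates if c.eligible]                          # eligibility filtering
    shortlist = sorted(eligible, key=light_score, reverse=True)[:light_keep]  # lightweight ranking
    scored = sorted(shortlist, key=heavy_score, reverse=True)                 # expensive ranking
    result: List[Candidate] = []
    per_author: Dict[str, int] = {}
    for c in scored:                                                          # re-ranking for diversity
        if per_author.get(c.author_id, 0) >= max_per_author:
            continue
        per_author[c.author_id] = per_author.get(c.author_id, 0) + 1
        result.append(c)
        if len(result) == limit:
            break
    return result
```

The expensive scorer only ever sees `light_keep` items, which is what keeps the read path within a latency budget as the candidate pool grows.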

8️⃣ Privacy and Filtering

Why Privacy Matters

News Feed must enforce visibility rules correctly.

Examples:

Public post
Friends-only post
Group-only post
Blocked users
Muted users
Deleted posts
Age or region restrictions

Filtering Layers

At write time: fan out only to eligible users
At read time: re-check privacy rules
At ranking time: remove unsafe or blocked content

Recommended Strategy

Even if fanout happens at write time, still check privacy at read time.

Why?

Relationship may change
User may block someone
Post visibility may be updated
Content may be deleted

👉 Interview Answer

Privacy filtering must be enforced carefully.

Even if we precompute feed entries, I would still perform read-time privacy checks before returning posts.

This prevents stale feed entries from exposing deleted, blocked, or visibility-restricted content.
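A read-time check could look like the sketch below. The `"deleted"` status value and the in-memory shapes of `friends` and `blocked` are assumptions for illustration; group, mute, age, and region rules are omitted. Unknown visibility values fail closed.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def can_view(
    viewer_id: str,
    post: Dict[str, str],
    friends: Set[FrozenSet[str]],    # unordered friend pairs
    blocked: Set[Tuple[str, str]],   # (blocker_id, blocked_id)
) -> bool:
    author_id = post["author_id"]
    if post.get("status") == "deleted":
        return False
    # A block in either direction hides the post.
    if (viewer_id, author_id) in blocked or (author_id, viewer_id) in blocked:
        return False
    if author_id == viewer_id:
        return True
    visibility = post["visibility"]
    if visibility == "public":
        return True
    if visibility == "friends":
        return frozenset((viewer_id, author_id)) in friends
    return False  # unknown visibility: deny by default


def filter_feed(viewer_id, posts: List[Dict[str, str]], friends, blocked):
    """Re-check every precomputed entry against the current rules."""
    return [p for p in posts if can_view(viewer_id, p, friends, blocked)]
```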

9️⃣ Caching Strategy

What to Cache?

Feed entries
Post objects
User metadata
Relationship graph
Ranking features
Hot posts
Media assets through CDN

Cache Layers

Local service cache
Redis / Memcached
CDN for media
Edge cache for public content

Cache Challenges

Feed freshness
Post deletion
Privacy changes
Follow / unfollow changes
Ranking feature updates

👉 Interview Answer

Caching is essential because feed reads are frequent and latency-sensitive.

I would cache feed entries, post objects, user metadata, relationship data, and ranking features.

However, the system must carefully handle cache invalidation for deleted posts, privacy updates, and relationship changes.
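One common way to handle this is cache-aside with a TTL plus explicit invalidation on delete or privacy change. The sketch below is in-process only; `PostCache` is a hypothetical class, and a real deployment would put the same logic in front of Redis or Memcached.

```python
import time
from typing import Any, Callable, Dict, Tuple


class PostCache:
    def __init__(
        self,
        load: Callable[[str], Any],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load          # reads from the source of truth
        self._ttl = ttl_seconds    # bounds how long an entry can stay stale
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, post_id: str) -> Any:
        now = self._clock()
        hit = self._entries.get(post_id)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = self._load(post_id)  # miss or expired: go to the store
        self._entries[post_id] = (now + self._ttl, value)
        return value

    def invalidate(self, post_id: str) -> None:
        """Call on delete or visibility change so readers do not wait for the TTL."""
        self._entries.pop(post_id, None)
```

The TTL is only a backstop. Without the explicit `invalidate` call, a deleted post could stay visible for up to `ttl_seconds`.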

🔟 Write Path: Creating a Post

Flow

User creates post
Post service validates request
Store post as source of truth
Publish post-created event
Fanout workers consume event
Find eligible followers/friends
Create feed entries
Update caches asynchronously

Event Pipeline

Post Service
→ Kafka / Queue
→ Fanout Workers
→ Feed Store
→ Cache Update

👉 Interview Answer

I would decouple post creation from feed fanout using an async event pipeline.

The post service stores the post as the source of truth and publishes a post-created event.

Fanout workers then update feed stores asynchronously, which keeps post creation fast and resilient.
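The decoupling can be sketched with an in-process queue standing in for Kafka. `create_post` returns as soon as the post is stored and the event is enqueued, and a separate worker performs the fanout. The function names and event shape are assumptions.

```python
import queue
import threading
from collections import defaultdict
from typing import Dict, List


def create_post(post_store: Dict[str, dict], events: queue.Queue, post_id: str, author_id: str) -> str:
    """Write path: persist first, then publish; never wait for fanout."""
    post_store[post_id] = {"author_id": author_id}
    events.put({"type": "post-created", "post_id": post_id, "author_id": author_id})
    return post_id


def fanout_worker(events: queue.Queue, followers: Dict[str, List[str]], feed_store) -> None:
    """Consume post-created events and write feed entries until a None sentinel arrives."""
    while True:
        event = events.get()
        try:
            if event is None:
                return
            for follower_id in followers.get(event["author_id"], ()):
                feed_store[follower_id].append(event["post_id"])
        finally:
            events.task_done()


def run_demo() -> Dict[str, List[str]]:
    post_store: Dict[str, dict] = {}
    events: queue.Queue = queue.Queue()
    feed_store = defaultdict(list)
    followers = {"a": ["u1", "u2"]}
    worker = threading.Thread(
        target=fanout_worker, args=(events, followers, feed_store), daemon=True
    )
    worker.start()
    create_post(post_store, events, "p1", "a")
    events.join()      # wait for the fanout only so the demo can inspect it
    events.put(None)   # stop the worker
    worker.join(timeout=5)
    return dict(feed_store)
```

In a real pipeline the queue is durable, so a crashed worker can resume from the last committed event instead of losing fanout work.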

1️⃣1️⃣ Trade-offs

Push vs Pull

Strategy   Pros                Cons
Push       Fast reads          Write amplification
Pull       Lower write cost    Slower reads
Hybrid     Balanced            More complexity

Freshness vs Latency

Real-time fanout improves freshness
Async fanout may delay feed updates
Cache improves latency but may return a stale feed

Relevance vs Explainability

ML ranking improves engagement
But may be harder to debug and explain

Storage Cost vs Read Performance

Precomputed feeds require more storage
But improve feed read latency

Privacy Correctness vs Performance

Read-time filtering improves correctness
But adds latency

👉 Interview Answer

The main trade-offs are freshness, relevance, latency, storage cost, and privacy correctness.

Push-based feed improves read latency but increases write cost. Pull-based feed reduces write cost but makes reads heavier.

In practice, hybrid feed generation is usually the best approach.

1️⃣2️⃣ Scaling Patterns

Pattern 1: Hybrid Feed Generation

Push social posts from normal users
Pull celebrity/recommended content at read time

Pattern 2: Async Fanout

Use queue-based processing:

post-created event → queue → fanout workers

Pattern 3: Separate Source of Truth and Feed View

Post store = source of truth
Feed store = read-optimized view

Pattern 4: Feature Store for Ranking

Store precomputed ranking features:

User interests
Author affinity
Post engagement
Content quality scores

Pattern 5: Shard Feed Store by User ID

Why?

Feed reads are user-centric
Keeps feed entries for one user localized
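A simple way to pick the shard is a stable hash of the user ID modulo the shard count, sketched below. `shard_for` is a hypothetical helper. Plain modulo remaps most keys when the shard count changes, so consistent hashing is a common alternative when resharding is expected.

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user to a shard with a hash that is stable across processes."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Python's built-in hash() is salted per process for strings, so use SHA-256.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Every feed entry for one user lands on the same shard, so a feed read is a single-shard query.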

Pattern 6: Backpressure for High-Fanout Authors

For high-follower authors:

Do not fan out to everyone immediately
Pull at read time
Apply rate limiting to fanout jobs
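Rate limiting for fanout jobs could be done with a token bucket, sketched below. A worker asks for tokens before writing a batch and defers the batch if the bucket is empty. The `TokenBucket` class and its parameters are illustrative, and the clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(
        self,
        rate_per_sec: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_sec    # sustained fanout writes per second
        self.capacity = capacity    # largest burst allowed
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self, n: float = 1) -> bool:
        """Take n tokens if available; otherwise the caller should defer the batch."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```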

Pattern 7: Feed Assembly Service

The feed assembly service combines:

Organic posts
Recommended posts
Ads

It applies safety filtering and uses the ranking output to order the final response.

👉 Interview Answer

To scale News Feed, I would separate the post store from the feed store, use async fanout workers, shard feeds by user ID, cache hot data, and use a feature store for ranking.

For high-fanout authors and recommendation content, I would use pull-based retrieval at read time.

1️⃣3️⃣ Failure Handling

Common Failures

Fanout worker failure
Queue backlog
Feed store unavailable
Ranking service down
Stale cache
Deleted post still visible
Privacy rule update delay
Recommendation service unavailable

Strategies

Retry fanout jobs
Dead-letter queue
Rebuild feed from post store
Serve cached feed during degradation
Fall back to chronological feed
Apply read-time privacy filtering
Skip recommendations if recommendation service fails

👉 Interview Answer

The system should degrade gracefully.

If ranking fails, we can fall back to chronological ordering. If recommendations fail, we can still show social posts. If fanout is delayed, users may see a slightly stale feed, which is usually acceptable.

The post store remains the source of truth, so feed entries can be rebuilt when needed.
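The fallbacks described above can be sketched as a feed builder that treats recommendations and ranking as optional dependencies. The callables are placeholders for real services, and the post dictionaries with a `created_at` field are an assumption.

```python
from typing import Callable, Dict, List

PostDict = Dict[str, str]


def build_feed(
    user_id: str,
    social: Callable[[str], List[PostDict]],
    recommend: Callable[[str], List[PostDict]],
    rank: Callable[[str, List[PostDict]], List[PostDict]],
) -> List[PostDict]:
    candidates = list(social(user_id))
    try:
        candidates += recommend(user_id)
    except Exception:
        pass  # recommendations are optional: serve social posts only
    try:
        return rank(user_id, candidates)
    except Exception:
        # Ranking is down: fall back to reverse-chronological order.
        return sorted(candidates, key=lambda p: p["created_at"], reverse=True)
```

Privacy filtering is deliberately not part of the optional set. If the privacy check cannot run, the safer behavior is to fail the request rather than skip the filter.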

1️⃣4️⃣ Consistency Model

Stronger Consistency Needed For

Post creation durability
Delete post
Block / mute
Privacy settings
Permission checks

Eventual Consistency Acceptable For

Feed delivery
Like/comment counters
Recommendation results
Analytics
Ranking features
Fanout delay

👉 Interview Answer

News Feed does not usually require strong consistency for delivery.

It is acceptable if a new post appears a few seconds later.

However, delete, block, mute, and privacy changes require stronger correctness, because stale data can create privacy or safety issues.

1️⃣5️⃣ End-to-End Flow

Create Post Flow

User creates post
→ Post Service stores post
→ Publish post-created event
→ Fanout workers create feed entries
→ Cache updated asynchronously

Read Feed Flow

User opens feed
→ Feed Service checks cache
→ Fetch precomputed feed entries
→ Pull recommendations / high-fanout posts
→ Apply privacy filtering
→ Rank candidates
→ Assemble final feed
→ Return response

Key Insight

News Feed is a personalized content delivery system, not just a list of recent posts.

🧠 Staff-Level Answer (Final)

👉 Interview Answer (Full Version)

When designing a News Feed, I think of it as a personalized content delivery pipeline.

The system has two main flows: creating posts and reading feeds.

Posts are stored as the source of truth, while feed entries are denormalized views optimized for low-latency reads.

For feed generation, I would use a hybrid push-pull model. Normal social posts can be pushed into followers’ feeds, while high-follower authors, recommendations, and ads can be pulled at read time.

This balances read latency, write amplification, and storage cost.

I would use an asynchronous event pipeline for fanout, so post creation does not block on updating many feeds.

For reads, the feed service would fetch precomputed entries, pull additional candidates, apply privacy filtering, rank results using personalization signals, and assemble the final feed.

The main trade-offs are freshness, relevance, latency, storage cost, and privacy correctness.

Feed delivery can be eventually consistent, but delete, block, mute, and privacy rules need stronger correctness.

Ultimately, the goal is to deliver a fresh, relevant, and safe feed with low latency at large scale.

⭐ Final Insight

News Feed design is about combining social graph, ranking, caching, and privacy into one low-latency personalized delivery system.
