🎯 Design Email System
1️⃣ Core Framework
When discussing Email System design, I frame it as:
- Email composition and submission
- Message storage and metadata
- Sending pipeline and SMTP delivery
- Queue, retry, bounce, and failure handling
- Inbox, search, and folder management
- Templates, bulk email, and notification email
- Spam, abuse, reputation, and rate limiting
- Trade-offs: deliverability vs latency vs reliability
2️⃣ Core Requirements
Functional Requirements
- User can compose and send email
- User can receive email
- Support attachments
- Support inbox, sent, drafts, trash, spam
- Support email search
- Support threading / conversations
- Support read / unread / star / labels
- Support delivery status
- Support bounce handling
- Support templates and bulk notifications
Non-functional Requirements
- Reliable delivery
- High availability
- Durable message storage
- Scalable search
- Good deliverability
- Spam and abuse protection
- Retry failed deliveries
- Eventually consistent mailbox updates are acceptable
👉 Interview Answer
An email system stores, sends, receives, indexes, and organizes messages.
The core challenges are reliable delivery, durable message storage, scalable mailbox search, spam prevention, bounce handling, and maintaining sender reputation.
3️⃣ Core Concepts
Email Message
An email contains:
- Sender
- Recipients
- Subject
- Body
- Attachments
- Headers
- Timestamp
- Message ID
Mailbox
A mailbox stores messages for a user.
Common folders:
Inbox
Sent
Drafts
Trash
Spam
Archive
SMTP
SMTP is used to submit outgoing email and to relay it between mail servers.
IMAP / POP3
IMAP and POP3 are used by email clients to retrieve email from the mailbox server.
👉 Interview Answer
I would separate message storage from mailbox views.
The message body and attachments are stored durably, while mailbox metadata tracks folders, labels, read state, and thread relationships.
4️⃣ Main APIs
Send Email
POST /api/emails/send
Request:
{
    "from": "alice@example.com",
    "to": ["bob@example.com"],
    "cc": [],
    "bcc": [],
    "subject": "Hello",
    "body": "Hi Bob",
    "attachments": ["file_123"]
}
Save Draft
POST /api/emails/drafts
Get Inbox
GET /api/mailbox/inbox?limit=50&cursor=xxx
Get Email
GET /api/emails/{emailId}
Search Email
GET /api/emails/search?q=invoice
Update Mailbox State
POST /api/emails/{emailId}/labels
👉 Interview Answer
I would expose APIs for sending email, saving drafts, reading mailbox folders, fetching individual messages, searching email, and updating labels or read state.
Sending should be asynchronous because SMTP delivery can be slow or fail.
5️⃣ Data Model
Message Table
email_message (
    message_id VARCHAR PRIMARY KEY,
    sender_id VARCHAR,
    from_address VARCHAR,
    subject TEXT,
    body_location TEXT,
    headers JSON,
    status VARCHAR,
    created_at TIMESTAMP
)
Recipient Table
email_recipient (
    message_id VARCHAR,
    recipient_address VARCHAR,
    recipient_type VARCHAR, -- to, cc, bcc
    delivery_status VARCHAR,
    provider_response JSON,
    updated_at TIMESTAMP,
    PRIMARY KEY (message_id, recipient_address)
)
Mailbox Item Table
mailbox_item (
    user_id VARCHAR,
    mailbox_item_id VARCHAR,
    message_id VARCHAR,
    folder VARCHAR,
    labels ARRAY,
    read BOOLEAN,
    starred BOOLEAN,
    thread_id VARCHAR,
    received_at TIMESTAMP,
    PRIMARY KEY (user_id, mailbox_item_id)
)
Attachment Table
attachment (
    attachment_id VARCHAR PRIMARY KEY,
    message_id VARCHAR,
    file_name VARCHAR,
    content_type VARCHAR,
    size_bytes BIGINT,
    storage_location TEXT,
    checksum VARCHAR
)
Delivery Event Table
email_delivery_event (
    event_id VARCHAR PRIMARY KEY,
    message_id VARCHAR,
    recipient_address VARCHAR,
    event_type VARCHAR, -- sent, delivered, bounced, complained, opened, clicked
    created_at TIMESTAMP,
    metadata JSON
)
👉 Interview Answer
I would store message content, recipients, mailbox metadata, attachments, and delivery events separately.
This allows one physical message to appear in multiple user mailboxes with different read states, labels, and folders.
6️⃣ High-Level Architecture
Client
→ Email API
→ Message Service
→ Attachment Storage
→ Send Queue
→ Mail Delivery Workers
→ SMTP / Email Provider
→ Bounce / Complaint Handler
Incoming Mail Server
→ Inbound Processor
→ Spam Filter
→ Mailbox Service
→ Search Index
Main Components
Email API
- Validates requests
- Saves drafts
- Creates outbound messages
Message Service
- Stores message metadata
- Stores body and attachment references
Send Queue
- Buffers outgoing email
- Supports retry
- Decouples API from delivery
Delivery Workers
- Send email through SMTP or provider API
- Handle retries and provider responses
Inbound Processor
- Receives incoming messages
- Runs spam checks
- Places messages into user mailbox
Search Index
- Indexes subject, body, sender, recipients, and attachment metadata
👉 Interview Answer
I would split the system into outbound and inbound pipelines.
Outbound email goes through message storage, send queue, delivery workers, and SMTP or provider APIs.
Inbound email goes through receiving servers, spam filtering, mailbox placement, and search indexing.
7️⃣ Send Email Flow
Basic Flow
User clicks send
→ Email API validates request
→ Store message and attachments
→ Create sent mailbox item
→ Enqueue send job
→ Delivery worker sends via SMTP/provider
→ Update delivery status
→ Record delivery events
Why Async?
SMTP delivery can:
- Be slow
- Fail temporarily
- Need retries
- Return a delayed bounce
👉 Interview Answer
Sending email should be asynchronous.
Once the message is durably stored and queued, the API can return success to the user; at that point success means accepted for delivery, not delivered.
Delivery workers then send the email, retry failures, and update delivery status.
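A minimal Python sketch of the store-then-enqueue step described above. The in-memory `message_store` and `send_queue` are hypothetical stand-ins for a durable database and queue, and the function and field names are assumptions for illustration only.

```python
import queue
import uuid

# Hypothetical in-memory stand-ins for the durable message store and send queue.
message_store: dict[str, dict] = {}
send_queue: "queue.Queue[str]" = queue.Queue()


def accept_send(request: dict) -> dict:
    """Validate, persist, and enqueue; actual delivery happens later in a worker."""
    if not request.get("to"):
        raise ValueError("at least one recipient is required")
    message_id = str(uuid.uuid4())
    # Persist first so a crash after this point cannot lose the message.
    message_store[message_id] = {**request, "status": "queued"}
    send_queue.put(message_id)
    # "accepted" means stored and queued, not delivered.
    return {"messageId": message_id, "status": "accepted"}


response = accept_send({
    "from": "alice@example.com",
    "to": ["bob@example.com"],
    "subject": "Hello",
    "body": "Hi Bob",
})
```

The API response carries only the message ID and an accepted status; delivery outcomes arrive later through the delivery status and event records.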
8️⃣ SMTP Delivery and Retry
Delivery Flow
Delivery worker
→ Resolve recipient domain MX record
→ Connect to recipient mail server
→ Send message via SMTP
→ Receive response
→ Mark sent / retry / bounced
Failure Types
Temporary Failure
Examples:
Mailbox temporarily unavailable
Server busy
Rate limited
Network timeout
Action:
retry with backoff
Permanent Failure
Examples:
Invalid recipient
Domain does not exist
Mailbox does not exist
Action:
mark bounced
👉 Interview Answer
SMTP delivery can fail temporarily (4xx reply codes) or permanently (5xx reply codes).
Temporary failures should be retried with exponential backoff.
Permanent failures should mark the recipient as bounced and stop retrying.
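The retry decision can be sketched as follows. SMTP reply codes in the 4xx range signal transient failures and 5xx codes signal permanent ones; the base delay, the cap, and the use of full jitter are assumptions of this sketch rather than values from the design above.

```python
import random


def classify_smtp_reply(code: int) -> str:
    """Map a final SMTP reply code to an action: 2xx sent, 4xx retry, 5xx bounced."""
    if 200 <= code < 300:
        return "sent"
    if 400 <= code < 500:
        return "retry"
    if 500 <= code < 600:
        return "bounced"
    raise ValueError(f"unexpected final SMTP reply code: {code}")


def retry_delay_seconds(attempt: int, base: float = 60.0, cap: float = 6 * 3600.0) -> float:
    """Exponential backoff with full jitter; attempt numbering starts at 1."""
    ceiling = min(cap, base * 2 ** (attempt - 1))
    # Jitter spreads retries out so many failed jobs do not retry in lockstep.
    return random.uniform(0, ceiling)
```

A delivery worker would call `classify_smtp_reply` on the server response and, for a retry, requeue the job with a delay from `retry_delay_seconds`.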
9️⃣ Bounce and Complaint Handling
Bounce Types
- Hard bounce (permanent): invalid address, domain failure
- Soft bounce (temporary): mailbox full, server temporarily unavailable
Complaint
A recipient marks email as spam.
Handling Flow
Provider sends bounce/complaint event
→ Verify event
→ Store delivery event
→ Update recipient delivery status
→ Suppress future sends if needed
→ Update sender reputation
👉 Interview Answer
Bounce and complaint handling are critical for deliverability.
Hard bounces and spam complaints should update suppression lists so the system avoids repeatedly sending to bad or risky addresses.
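A small sketch of suppression-list handling, assuming events arrive as dictionaries with hypothetical `type` and `recipient` fields. Lowercasing the whole address is a simplification made for the example.

```python
# Hypothetical in-memory suppression list; a real system would persist this.
suppression_list: set[str] = set()

# Event types that should block future sends (an assumed policy).
SUPPRESSING_EVENTS = {"hard_bounce", "complaint"}


def handle_provider_event(event: dict) -> None:
    """Add the recipient to the suppression list for hard bounces and complaints."""
    if event["type"] in SUPPRESSING_EVENTS:
        suppression_list.add(event["recipient"].lower())


def can_send_to(address: str) -> bool:
    """Check the suppression list before enqueueing a send."""
    return address.lower() not in suppression_list


handle_provider_event({"type": "hard_bounce", "recipient": "Gone@example.com"})
handle_provider_event({"type": "soft_bounce", "recipient": "full@example.com"})
```

Soft bounces are left out of the suppression set here because they are retried instead.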
🔟 Inbound Email Flow
Flow
External sender sends email
→ Our MX server receives message
→ Validate domain and recipient
→ Run spam / virus scanning
→ Store message
→ Create inbox mailbox item
→ Index message for search
→ Notify user
Important Checks
- Recipient exists
- Domain is valid
- Message size limit
- Attachment safety
- Spam score
- Virus scan
- DKIM / SPF / DMARC validation
👉 Interview Answer
For inbound email, the system receives messages through MX servers, validates the recipient, runs spam and virus checks, stores the message, creates mailbox entries, indexes the message, and notifies the user.
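The ordering of inbound checks can be sketched as a short pipeline: cheap rejection checks run first, and the spam score then decides the folder. The size limit, the score threshold, the recipient directory, and the message dictionary shape are all assumptions for illustration.

```python
from typing import Callable, Optional

MAX_MESSAGE_BYTES = 25 * 1024 * 1024      # assumed size limit
SPAM_THRESHOLD = 5.0                      # assumed score cutoff
KNOWN_RECIPIENTS = {"bob@example.com"}    # hypothetical user directory

# Each check returns a rejection reason, or None if the message passes.
Check = Callable[[dict], Optional[str]]


def check_recipient(msg: dict) -> Optional[str]:
    return None if msg["rcpt_to"].lower() in KNOWN_RECIPIENTS else "unknown recipient"


def check_size(msg: dict) -> Optional[str]:
    return None if len(msg["raw"]) <= MAX_MESSAGE_BYTES else "message too large"


INBOUND_CHECKS: list[Check] = [check_recipient, check_size]


def process_inbound(msg: dict) -> dict:
    """Run rejection checks in order, then choose a folder from the spam score."""
    for check in INBOUND_CHECKS:
        reason = check(msg)
        if reason is not None:
            return {"accepted": False, "reason": reason}
    folder = "spam" if msg.get("spam_score", 0.0) >= SPAM_THRESHOLD else "inbox"
    return {"accepted": True, "folder": folder}
```

Virus scanning and DKIM / SPF / DMARC validation would slot into the same list of checks; they are omitted here to keep the sketch short.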
1️⃣1️⃣ Mailbox and Folder Management
Mailbox Operations
- Mark read / unread
- Move to trash
- Archive
- Apply label
- Star
- Delete
- Restore
- Mark spam
Design
Mailbox state is per user.
Example:
same message_id
→ Alice: folder=sent
→ Bob: folder=inbox, read=false
👉 Interview Answer
Mailbox state should be separate from message content.
The same email message may appear in multiple mailboxes, but each user has their own folder, labels, read state, and starred state.
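A minimal sketch of this separation using Python dataclasses. The field names follow the tables in the data model section; the `blob://` location is a made-up placeholder.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """Immutable message content, stored once."""
    message_id: str
    subject: str
    body_location: str


@dataclass
class MailboxItem:
    """Per-user view of a message: folder, labels, and read state."""
    user_id: str
    message_id: str
    folder: str
    read: bool = False
    starred: bool = False
    labels: set[str] = field(default_factory=set)


message = Message("m1", "Hello", "blob://bodies/m1")  # hypothetical location
alice_item = MailboxItem("alice", message.message_id, folder="sent", read=True)
bob_item = MailboxItem("bob", message.message_id, folder="inbox")

# Bob reads and labels the message; Alice's view is unaffected.
bob_item.read = True
bob_item.labels.add("work")
```

Both items point at the same `message_id`, so the body is stored once while each user's state changes independently.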
1️⃣2️⃣ Threading / Conversation View
Threading Signals
- Message-ID header
- In-Reply-To header
- References header
- Subject normalization
- Participants
Thread Table
email_thread (
    thread_id VARCHAR PRIMARY KEY,
    normalized_subject VARCHAR,
    latest_message_at TIMESTAMP,
    participant_addresses ARRAY
)
Why Threading Matters
- Better user experience
- Group related replies
- Reduce inbox clutter
👉 Interview Answer
Threading groups related emails into conversations.
I would use email headers such as Message-ID, In-Reply-To, and References, with subject normalization as a fallback.
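One way to sketch thread assignment: look up the parent via In-Reply-To and References first, and fall back to the normalized subject. The in-memory maps and the prefix regex are assumptions of this example, and subject-only matching is a heuristic that can group unrelated messages.

```python
import re
import uuid
from typing import Optional, Sequence

# Hypothetical lookup tables; a real system would keep these in the database.
thread_by_message_id: dict[str, str] = {}
thread_by_subject: dict[str, str] = {}

# Strips any run of leading "Re:", "Fw:", or "Fwd:" prefixes.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    return _PREFIX.sub("", subject).strip().lower()


def assign_thread(message_id: str, subject: str,
                  in_reply_to: Optional[str] = None,
                  references: Sequence[str] = ()) -> str:
    # Prefer header links, checking the direct parent and then newer ancestors first.
    for parent in [in_reply_to, *reversed(references)]:
        if parent and parent in thread_by_message_id:
            thread_id = thread_by_message_id[parent]
            break
    else:
        # Fallback: join a thread with the same normalized subject, or start one.
        key = normalize_subject(subject)
        thread_id = thread_by_subject.get(key) or str(uuid.uuid4())
        thread_by_subject[key] = thread_id
    thread_by_message_id[message_id] = thread_id
    return thread_id
```

A production version would likely also compare participants and time windows before trusting the subject fallback.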
1️⃣3️⃣ Search System
Search Fields
- Subject
- Body
- Sender
- Recipients
- Attachment metadata
- Date
- Labels
- Folder
Search Architecture
Message stored
→ Indexing event emitted
→ Search indexer parses content
→ Search engine indexes fields
→ User queries search service
Search Store
Use:
Elasticsearch / OpenSearch / custom inverted index
👉 Interview Answer
Email search should use an inverted index.
The message storage system emits indexing events, and a search indexer indexes subject, body, sender, recipients, date, labels, and attachment metadata.
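To illustrate the idea, here is a toy inverted index in Python. Keying postings by `(user_id, term)` is an assumption of this sketch, chosen so one user's search can never return another user's mail; engines such as Elasticsearch or OpenSearch do the same job at scale with ranking and analyzers.

```python
import re
from collections import defaultdict

# (user_id, term) -> IDs of messages containing that term.
index: dict[tuple[str, str], set[str]] = defaultdict(set)


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def index_message(user_id: str, message_id: str, subject: str, body: str) -> None:
    for term in tokenize(subject + " " + body):
        index[(user_id, term)].add(message_id)


def search(user_id: str, query: str) -> set[str]:
    """Return messages in the user's mailbox that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get((user_id, term), set()) for term in terms]
    return set.intersection(*postings)
```

Each query touches only the postings for its terms, which is why this scales better than scanning mailbox rows.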
1️⃣4️⃣ Attachments
Storage
Attachments should be stored in object storage.
attachment_id → object storage path
Flow
Upload attachment
→ Virus scan
→ Store in object storage
→ Attach reference to email
Requirements
- Size limits
- Virus scanning
- Content-type validation
- Deduplication by checksum
- Access control
- Download URL expiration
👉 Interview Answer
Attachments should not be stored directly in the message database.
I would store them in object storage, scan for viruses, enforce size limits, and reference them from message metadata.
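Deduplication by checksum can be sketched with a content-addressed key. The dictionary stands in for object storage, and the size limit and key layout are assumptions; virus scanning is left out of the example.

```python
import hashlib

MAX_ATTACHMENT_BYTES = 25 * 1024 * 1024  # assumed size limit
object_store: dict[str, bytes] = {}      # hypothetical stand-in for object storage


def store_attachment(data: bytes) -> str:
    """Store attachment bytes under a SHA-256 key and return the storage location."""
    if len(data) > MAX_ATTACHMENT_BYTES:
        raise ValueError("attachment exceeds size limit")
    checksum = hashlib.sha256(data).hexdigest()
    key = f"attachments/{checksum}"
    # Identical content maps to the same key, so it is stored only once.
    object_store.setdefault(key, data)
    return key
```

The returned key is what the attachment row would hold in `storage_location`, with the digest in `checksum`.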
1️⃣5️⃣ Templates and Notification Emails
Use Cases
- Password reset
- Order confirmation
- Receipt
- Marketing email
- System alert
Template Data
{
    "templateId": "order_receipt",
    "variables": {
        "userName": "Alice",
        "orderId": "o123"
    }
}
Flow
Service requests template email
→ Template service renders content
→ Personalization applied
→ Email queued
→ Delivery pipeline sends email
👉 Interview Answer
For transactional and notification emails, I would use a template service.
Business services send template ID and variables, and the email system renders the message, queues it, and sends it through the normal delivery pipeline.
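A small rendering sketch using Python's standard `string.Template`. The template text and the `TEMPLATES` registry are made up for illustration; a real template service would likely use a richer engine with HTML escaping.

```python
from string import Template

# Hypothetical template registry keyed by template ID.
TEMPLATES = {
    "order_receipt": {
        "subject": Template("Receipt for order $orderId"),
        "body": Template("Hi $userName, thanks for your order $orderId."),
    }
}


def render(template_id: str, variables: dict) -> dict:
    """Render subject and body; substitute() raises KeyError on a missing variable."""
    parts = TEMPLATES[template_id]
    return {name: tpl.substitute(variables) for name, tpl in parts.items()}


rendered = render("order_receipt", {"userName": "Alice", "orderId": "o123"})
```

Failing loudly on a missing variable is a deliberate choice here, since sending a half-rendered email is usually worse than rejecting the request.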
1️⃣6️⃣ Bulk Email and Rate Limiting
Bulk Email Challenges
- High volume
- Recipient personalization
- Unsubscribe handling
- Rate limits
- Reputation risk
- Spam complaints
Strategies
- Batch sending
- Per-domain throttling
- Suppression lists
- Unsubscribe enforcement
- Campaign pacing
- Complaint monitoring
👉 Interview Answer
Bulk email must be carefully controlled.
I would use batching, domain-based throttling, suppression lists, unsubscribe enforcement, and campaign pacing to protect sender reputation.
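Domain-based throttling could be sketched as one token bucket per recipient domain. The class name, rates, and the injectable clock are assumptions of this example, not part of the design above.

```python
import time
from collections import defaultdict
from typing import Callable


class DomainThrottle:
    """One token bucket per recipient domain."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        # Each domain starts with a full bucket.
        self.tokens: dict[str, float] = defaultdict(lambda: float(burst))
        self.last_seen: dict[str, float] = {}

    def try_acquire(self, recipient: str) -> bool:
        """Return True if one more message may be sent to this recipient's domain now."""
        domain = recipient.rsplit("@", 1)[-1].lower()
        now = self.clock()
        elapsed = now - self.last_seen.get(domain, now)
        self.last_seen[domain] = now
        # Refill in proportion to elapsed time, never above the burst size.
        self.tokens[domain] = min(self.burst, self.tokens[domain] + elapsed * self.rate)
        if self.tokens[domain] >= 1:
            self.tokens[domain] -= 1
            return True
        return False
```

A worker that gets `False` would requeue the job with a short delay, which paces a campaign per domain without slowing sends to other domains.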
1️⃣7️⃣ Spam, Abuse, and Reputation
Spam Signals
- Sender reputation
- Domain reputation
- IP reputation
- Content patterns
- Link reputation
- Attachment risk
- Complaint rate
- Bounce rate
Abuse Prevention
- Rate limit senders
- Verify domains
- Enforce SPF / DKIM / DMARC
- Detect compromised accounts
- Block spam campaigns
- Maintain suppression lists
👉 Interview Answer
Email systems must protect against spam and abuse.
I would use rate limits, sender reputation, domain verification, SPF/DKIM/DMARC, complaint monitoring, and spam scoring to protect deliverability.
1️⃣8️⃣ Tracking
Tracking Events
- Delivered
- Opened
- Clicked
- Bounced
- Complained
- Unsubscribed
How Tracking Works
Open Tracking
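Embed a tiny (typically 1×1) tracking pixel in the HTML body; the open is recorded when the recipient's client loads the image.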
Click Tracking
Rewrite links to redirect through a tracking URL, which records the click and then forwards to the original destination.
Privacy Concern
Tracking should respect user privacy and consent.
👉 Interview Answer
Email tracking can record delivery, opens, clicks, bounces, and complaints.
However, open and click tracking have privacy implications, so tracking should respect consent, user settings, and legal requirements.
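Click tracking can be sketched as link rewriting with the standard `urllib.parse` module. The tracking host and the `m` / `u` parameter names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

TRACKING_BASE = "https://track.example.com/c"  # hypothetical tracking endpoint


def tracking_url(message_id: str, destination: str) -> str:
    """Wrap a destination link so the click passes through the tracking endpoint."""
    return f"{TRACKING_BASE}?{urlencode({'m': message_id, 'u': destination})}"


def resolve_click(url: str) -> tuple[str, str]:
    """Recover (message_id, destination) on the tracking server before redirecting."""
    params = parse_qs(urlparse(url).query)
    return params["m"][0], params["u"][0]
```

A real deployment would probably sign or otherwise validate the destination so the endpoint cannot be abused as an open redirect, and would skip rewriting for recipients who have not consented to tracking.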
1️⃣9️⃣ Consistency Model
Stronger Consistency Needed For
- Message storage before send
- Suppression list enforcement
- Unsubscribe status
- Access control
- Delete / retention requests
- Sender authentication
Eventual Consistency Acceptable For
- Search indexing
- Delivery status updates
- Open / click tracking
- Analytics
- Mailbox unread count
- Notifications
👉 Interview Answer
Email systems use mixed consistency.
Message storage, access control, unsubscribe enforcement, and suppression lists require stronger correctness.
Search indexing, analytics, delivery tracking, and unread counts can be eventually consistent.
2️⃣0️⃣ Scaling Patterns
Pattern 1: Queue-based Sending
Decouple user request from SMTP delivery.
Pattern 2: Separate Message Store and Mailbox Metadata
Avoid duplicating message content.
Pattern 3: Object Storage for Large Bodies / Attachments
Reduce database pressure.
Pattern 4: Search Index for Query
Do not scan mailbox tables for full-text search.
Pattern 5: Domain-based Delivery Throttling
Protect sender reputation.
👉 Interview Answer
To scale an email system, I would use queue-based sending, separate message content from mailbox metadata, store attachments in object storage, index messages for search, and throttle delivery by domain and sender reputation.
2️⃣1️⃣ Failure Handling
Common Failures
- SMTP timeout
- Recipient server unavailable
- Provider rate limit
- Attachment upload failure
- Search indexing delay
- Duplicate send request
- Bounce event delayed
- Spam complaint received later
Strategies
- Idempotency keys
- Durable send queue
- Retry with backoff
- Dead-letter queue
- Suppression lists
- Bounce processing
- Outbox pattern
- Reconciliation jobs
👉 Interview Answer
Email delivery should assume partial failures.
Messages should be durably stored before sending.
Delivery jobs should retry temporary failures, stop on permanent failures, and record delivery events for auditing and reconciliation.
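Two of the strategies above, idempotency keys and a dead-letter queue, can be sketched together. The attempt limit, the job dictionary shape, and the in-memory containers are assumptions for illustration.

```python
MAX_ATTEMPTS = 5  # assumed retry budget

# Hypothetical in-memory stand-ins for the job table and dead-letter queue.
jobs_by_key: dict[str, dict] = {}
dead_letter_queue: list[dict] = []


def enqueue_once(idempotency_key: str, message_id: str) -> dict:
    """Return the existing job for a repeated key instead of creating a duplicate send."""
    new_job = {"message_id": message_id, "attempts": 0, "state": "queued"}
    return jobs_by_key.setdefault(idempotency_key, new_job)


def record_attempt(job: dict, outcome: str) -> None:
    """Apply one delivery outcome: 'sent', 'retry', or 'bounced'."""
    job["attempts"] += 1
    if outcome == "retry" and job["attempts"] >= MAX_ATTEMPTS:
        # Retry budget exhausted: park the job for inspection or reconciliation.
        job["state"] = "dead_letter"
        dead_letter_queue.append(job)
    elif outcome == "retry":
        job["state"] = "queued"
    else:
        job["state"] = outcome
```

A client that retries the same send request with the same key gets the original job back, so a network timeout on the API call does not produce two emails.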
2️⃣2️⃣ Observability
Key Metrics
- Send request QPS
- Send queue depth
- Delivery success rate
- Bounce rate
- Complaint rate
- Open / click rate
- SMTP latency
- Provider error rate
- Spam classification rate
- Search indexing lag
- Inbox write latency
- Attachment scan failures
👉 Interview Answer
I would monitor send queue depth, delivery success rate, bounce rate, complaint rate, SMTP latency, provider errors, spam classification, search indexing lag, and attachment scan failures.
These metrics directly affect reliability and deliverability.
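As a rough sketch, bounce and complaint rates can be derived from delivery events. The event type names match the `event_type` values in the delivery event table; using the count of `sent` events as the denominator is an assumption of this example.

```python
from collections import Counter


def deliverability_rates(events: list[dict]) -> dict[str, float]:
    """Compute bounce and complaint rates relative to the number of sent events."""
    counts = Counter(event["event_type"] for event in events)
    sent = counts["sent"]
    if sent == 0:
        return {"bounce_rate": 0.0, "complaint_rate": 0.0}
    return {
        "bounce_rate": counts["bounced"] / sent,
        "complaint_rate": counts["complained"] / sent,
    }
```

In practice these would be computed per sender, per domain, and per time window so that an alert can point at the campaign or account responsible.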
2️⃣3️⃣ End-to-End Flow
Outbound Send Flow
User sends email
→ Validate request
→ Store message and attachments
→ Create sent mailbox item
→ Enqueue send job
→ Delivery worker sends via SMTP/provider
→ Update recipient delivery status
→ Record delivery event
Inbound Receive Flow
External sender sends email
→ MX server receives message
→ Validate recipient
→ Spam and virus scan
→ Store message
→ Create inbox mailbox item
→ Index message
→ Notify user
Bounce Flow
Recipient server rejects email
→ Bounce event received
→ Update recipient status
→ Store delivery event
→ Add to suppression list if hard bounce
→ Update sender reputation
Key Insight
An email system is not just message sending; it is a durable messaging, delivery, search, spam-control, and reputation system.
🧠 Staff-Level Answer (Final)
👉 Interview Answer (Full Version)
When designing an email system, I think of it as a durable messaging and delivery platform.
The system must support composing, sending, receiving, storing, searching, and organizing email messages.
I would separate message content from mailbox metadata. Message content and attachments are stored durably, while mailbox items store user-specific state such as folder, labels, read status, and thread ID.
For outbound email, the API stores the message first, then enqueues a send job. Delivery workers send the message through SMTP or an email provider, retry temporary failures, and mark permanent failures as bounces.
For inbound email, MX servers receive messages, validate recipients, run spam and virus checks, store messages, create inbox entries, index messages, and notify users.
Attachments should be stored in object storage, scanned for viruses, and referenced from message metadata.
Search should use an inverted index, because full-text search over mailbox tables does not scale.
Deliverability is a major concern. I would use rate limiting, suppression lists, bounce handling, complaint handling, sender reputation, and SPF/DKIM/DMARC validation.
Email systems require mixed consistency. Message storage, access control, unsubscribe enforcement, and suppression lists need stronger correctness. Search, analytics, delivery status, and unread counts can be eventually consistent.
The main trade-offs are delivery latency, reliability, storage cost, search freshness, spam prevention, and sender reputation.
Ultimately, the goal is to reliably store and deliver messages, protect users from spam and abuse, and provide fast mailbox access and search.
⭐ Final Insight
The core of an email system is not simply sending mail; it is a large-scale messaging system that combines durable message storage, SMTP delivery, mailbox indexing, spam control, and sender reputation.