🎯 Core Decision Framework
When evaluating Database vs Object Storage (S3) for large blob or image storage, I typically assess the system across five dimensions:
- Data Size & Storage Cost
- Access Pattern & Throughput
- Transaction & Consistency Requirements
- Scalability & Operational Complexity
- Large File Upload Strategy (Chunking)
1️⃣ Data Size & Storage Cost
Database Storage (e.g., PostgreSQL, MySQL, Cassandra)
Strengths:
- Easy integration with application data
- Simple data management within one system
- Transactionally consistent with metadata
Limitations:
- Storage cost is significantly higher
- Large blobs increase database size rapidly
- Backup and replication become expensive
Best fit:
- Small files (typically < 1MB)
- Low-volume blob storage
- Data tightly coupled with transactional records
Storing blobs directly inside a database simplifies application design because metadata and file content live in the same system. However, databases are optimized for structured data rather than large binary objects. Large blobs dramatically increase storage costs and slow down backups and replication. Therefore, storing blobs in databases is generally recommended only for small files or tightly coupled transactional data.
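As a minimal sketch of this tightly coupled case, the snippet below stores a small attachment and its metadata in a single ACID transaction, with SQLite standing in for the database (the `attachments` table and column names are illustrative, not from any particular schema):

```python
import sqlite3

# Illustrative schema: a small blob stored alongside its metadata.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attachments ("
    "  id INTEGER PRIMARY KEY,"
    "  filename TEXT NOT NULL,"
    "  content_type TEXT NOT NULL,"
    "  data BLOB NOT NULL)"
)

payload = b"tiny-thumbnail-bytes"  # a small (< 1MB) binary payload
with conn:  # one transaction: commits on success, rolls back on exception
    conn.execute(
        "INSERT INTO attachments (filename, content_type, data)"
        " VALUES (?, ?, ?)",
        ("thumb.png", "image/png", payload),
    )

row = conn.execute(
    "SELECT filename, data FROM attachments WHERE filename = ?",
    ("thumb.png",),
).fetchone()
```

Because the blob rides in the same transaction as its metadata, a failed insert leaves neither behind, which is exactly the property that is given up once files move to object storage.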
Object Storage (e.g., Amazon S3, Google Cloud Storage)
Strengths:
- Extremely low storage cost
- Designed for massive blob storage
- Built-in durability (S3, for example, is designed for 11 nines, i.e. 99.999999999%)
Trade-offs:
- Requires separate metadata storage
- Additional network hop during access
Best fit:
- Large images
- Videos
- Documents
- User-generated content
Object storage systems like Amazon S3 are specifically designed for storing large amounts of binary data. They provide extremely high durability and very low storage cost compared to databases. Because of this, most large-scale systems store only metadata in the database and place the actual blob data in object storage.
2️⃣ Access Pattern & Throughput
Database Access Pattern
- Direct query access
- Transaction-aware retrieval
- Suitable for low-latency small payloads
Limitations:
- Large blobs increase query latency
- Heavy disk I/O pressure
- Impacts overall database performance
Databases are optimized for structured queries rather than large object delivery. When blobs become large or frequent, they introduce heavy disk I/O and can degrade overall query performance. This makes database blob storage unsuitable for high-throughput media workloads.
Object Storage Access Pattern
- HTTP-based access
- CDN-friendly
- High parallel download capability
Advantages:
- Massive horizontal scalability
- Direct client access via signed URLs
- Works well with CDN caching
Object storage integrates naturally with CDN and HTTP-based delivery. Applications often generate pre-signed URLs allowing clients to download files directly from object storage, bypassing the application server and significantly improving scalability.
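A real pre-signed URL is produced by the provider's SDK (for S3, a SigV4 signature). As a simplified sketch of the underlying idea only, the snippet below signs an object key and expiry with an HMAC so the storage layer can verify a time-limited request without consulting the application server; the host name and secret are made up for illustration:

```python
import hashlib
import hmac

SECRET = b"demo-signing-key"  # stand-in for the storage provider's credentials

def presign(bucket, key, now, expires_in=3600):
    """Return (url, expiry, signature) granting time-limited access to key."""
    expiry = now + expires_in
    message = f"{bucket}/{key}:{expiry}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    url = f"https://{bucket}.storage.example.com/{key}?expires={expiry}&sig={signature}"
    return url, expiry, signature

def verify(bucket, key, expiry, signature, now):
    """Storage-side check: signature must match and the URL must not be expired."""
    if now > expiry:
        return False
    message = f"{bucket}/{key}:{expiry}".encode()
    expected = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

url, expiry, sig = presign("media", "images/cat.jpg", now=1_700_000_000)
ok = verify("media", "images/cat.jpg", expiry, sig, now=1_700_000_100)       # within window
expired = verify("media", "images/cat.jpg", expiry, sig, now=1_700_010_000)  # past expiry
```

The key property is that the storage endpoint can validate the request on its own, so download (and upload) traffic never has to pass through the application tier.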
3️⃣ Transaction & Consistency Requirements
Database — Strong Consistency
Advantages:
- Blob and metadata updated together
- Atomic transactions
- Easy rollback
Best fit:
- Financial documents
- Small attachments
- Strong consistency requirements
Databases support strong ACID transactions, allowing file data and metadata to be written atomically. This is useful when the file content must be strictly consistent with relational records.
Object Storage — Eventually Consistent Workflow
Typical architecture:
- Upload file to object storage
- Store file metadata + object key in database
Trade-offs:
- Two-step operation
- Requires failure handling
Example pattern:
client → upload file → S3
application → save metadata → database
When using object storage, systems usually store only the file reference (object key or URL) in the database. This introduces a two-step workflow and requires careful handling of partial failures, but it scales far better for large systems.
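One common way to handle the partial-failure case is a compensating delete: if the metadata write fails after the blob was uploaded, remove the blob so no orphan is left behind. The classes below are in-memory stand-ins for S3 and the database, purely to show the control flow:

```python
class FakeObjectStore:
    """In-memory stand-in for object storage (S3)."""
    def __init__(self):
        self.blobs = {}
    def put(self, key, data):
        self.blobs[key] = data
    def delete(self, key):
        self.blobs.pop(key, None)

class FakeMetadataDB:
    """In-memory stand-in for the metadata database; can simulate an outage."""
    def __init__(self, fail=False):
        self.rows = {}
        self.fail = fail
    def insert(self, key, meta):
        if self.fail:
            raise RuntimeError("database unavailable")
        self.rows[key] = meta

def store_file(store, db, key, data, meta):
    store.put(key, data)        # step 1: blob to object storage
    try:
        db.insert(key, meta)    # step 2: metadata row holding the object key
    except Exception:
        store.delete(key)       # compensate: no metadata row, no orphaned blob
        raise

# Happy path: both writes succeed.
store, db = FakeObjectStore(), FakeMetadataDB()
store_file(store, db, "img/1.png", b"...", {"size": 3})

# Failure path: metadata write fails, the uploaded blob is cleaned up.
store2, db2 = FakeObjectStore(), FakeMetadataDB(fail=True)
try:
    store_file(store2, db2, "img/2.png", b"...", {"size": 3})
except RuntimeError:
    pass
```

Production systems often invert this instead: write a "pending" metadata row first, upload, then mark the row complete, with a background job reclaiming stale pending rows. Either way, the point is that the two-step write needs an explicit cleanup path.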
4️⃣ Scalability Strategy
Database Blob Storage
Challenges:
- Database size grows rapidly
- Backup time increases significantly
- Replication overhead grows
Operational risk:
- Slower failover
- Larger snapshots
- Higher infrastructure cost
Object Storage
Advantages:
- Virtually unlimited scalability
- Managed infrastructure
- Independent scaling from database
Common architecture pattern
Client
│
▼
Application Server
│
├── Metadata → Database
│
└── File Upload → Object Storage (S3)
Object storage separates large binary data from transactional databases, allowing both systems to scale independently. This architecture is widely used in large-scale systems such as social media platforms, media services, and cloud storage applications.
5️⃣ Large File Upload Strategy (Chunking / Multipart Upload)
When files become large (e.g., images, videos, archives), uploading them in a single request becomes unreliable and inefficient. Most large-scale systems therefore use chunked uploads or multipart uploads.
Problem with Large File Upload
Uploading a large file in a single request has several issues:
- Network interruptions cause full upload failure
- Large retry cost (must re-upload entire file)
- Memory pressure on application servers
- Poor mobile network reliability
Example problem:
Client → Upload 500MB video → Server
Network drop at 90%
→ entire upload fails
This results in extremely poor user experience.
Chunked Upload / Multipart Upload
The common solution is breaking a large file into smaller chunks.
Example workflow:
Client
│
├── upload chunk 1 → S3
├── upload chunk 2 → S3
├── upload chunk 3 → S3
│
└── complete upload request
Each chunk is uploaded independently.
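The splitting step itself is straightforward. A sketch, using a small chunk size so the example stays readable (S3's real minimum part size is 5MB for every part except the last):

```python
import hashlib

def split_into_chunks(data, chunk_size):
    """Break a payload into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def chunk_digest(chunk):
    """Per-chunk checksum so each part can be verified (and retried) on its own."""
    return hashlib.md5(chunk).hexdigest()

payload = bytes(range(256)) * 40              # 10,240-byte stand-in for a large file
chunks = split_into_chunks(payload, 4096)     # 4KB chunks for illustration
digests = [chunk_digest(c) for c in chunks]   # one checksum per chunk
```

S3 serves the same purpose with the ETag it returns for each uploaded part; the client echoes those ETags back in the completion request so the service can verify it is assembling the parts the client actually sent.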
Example Architecture
Client
│
│ request upload session
▼
Application Server
│
│ generate upload id
▼
Object Storage (S3 Multipart Upload)
Client uploads chunks directly to S3
chunk1
chunk2
chunk3
chunk4
After all chunks are uploaded:
Client → complete upload → S3 assembles the file
Advantages
Reliability
- Failed chunk can retry independently
- Upload resumes from last successful chunk
Performance
- Parallel chunk upload
- Faster throughput
Scalability
- Application server is bypassed
- Client uploads directly to object storage
Typical Implementation (S3 Multipart Upload)
Typical flow:
- Client requests upload session
- Server returns uploadId + presigned URLs
- Client uploads chunks directly to S3
- Client sends CompleteMultipartUpload
Example API flow:
POST /upload/init
→ return uploadId
PUT /upload/chunk
→ upload chunk
POST /upload/complete
→ finalize file
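To make the flow concrete, here is an in-memory model of the three endpoints above, loosely following S3's multipart semantics (parts may arrive out of order, retrying a part simply overwrites it, and completion assembles parts by part number). This is a teaching sketch, not a real S3 client:

```python
import uuid

class MultipartUploadService:
    """In-memory model of init -> upload parts -> complete."""
    def __init__(self):
        self.sessions = {}  # upload_id -> (key, {part_number: bytes})
        self.objects = {}   # key -> assembled object

    def init_upload(self, key):
        """POST /upload/init: open a session and hand back an upload id."""
        upload_id = uuid.uuid4().hex
        self.sessions[upload_id] = (key, {})
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        """PUT /upload/chunk: retry-safe, re-uploading a part overwrites it."""
        key, parts = self.sessions[upload_id]
        parts[part_number] = data

    def complete(self, upload_id):
        """POST /upload/complete: assemble parts in order and close the session."""
        key, parts = self.sessions.pop(upload_id)
        self.objects[key] = b"".join(parts[n] for n in sorted(parts))
        return key

svc = MultipartUploadService()
upload_id = svc.init_upload("videos/demo.mp4")
svc.upload_part(upload_id, 2, b"BB")  # parts can arrive out of order
svc.upload_part(upload_id, 1, b"AA")
svc.upload_part(upload_id, 3, b"xx")
svc.upload_part(upload_id, 3, b"CC")  # a failed or corrupt part is simply re-sent
svc.complete(upload_id)
```

The per-part overwrite is what makes resumable upload cheap: the client only re-sends the chunks that failed, never the whole file.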
When to Use Chunked Upload
Use chunked upload when:
- Files are large (AWS recommends multipart upload for objects over roughly 100MB; S3 requires each part except the last to be at least 5MB)
- Mobile networks are common
- Video / media platforms
- Cloud storage systems
Examples:
- YouTube
- Dropbox
- Google Drive
🧠 Senior / Staff-Level Summary Answer
When deciding between database storage and object storage for large blobs, I usually consider file size, access patterns, transaction requirements, and long-term scalability.
Databases work well for small binary data tightly coupled with relational records. However, for large-scale systems storing images, videos, or documents, object storage such as S3 is generally the better choice because it provides significantly lower cost, better scalability, and integration with CDN delivery.
In practice, most production systems adopt a hybrid model: storing metadata in a database while placing the actual blob content in object storage.
⭐ Staff-Level Insight (Bonus)
The real architectural goal is decoupling structured data from large binary objects. Databases should manage metadata and transactional logic, while object storage handles scalable binary storage and delivery.
Condensed Recap (translated from the original Chinese)
🎯 Core Decision Framework
When evaluating database storage vs object storage (S3), I usually consider five dimensions:
- Data size & storage cost
- Access pattern & throughput
- Transactional consistency requirements
- Scalability & operational complexity
- Large file upload strategy (chunked upload)
1️⃣ Data Size & Storage Cost
Database storage (e.g., PostgreSQL, MySQL)
Strengths:
- Simple integration with business data
- Files and metadata managed together
- Supports transactional consistency
Limitations:
- High storage cost
- Large files rapidly inflate database size
- Higher backup and replication cost
Best fit:
- Small files (typically < 1MB)
- A small number of attachments
- Data tightly coupled with transactional records
Object storage (e.g., Amazon S3)
Strengths:
- Low storage cost
- Purpose-built for large-scale file storage
- Extremely high durability
Best fit:
- Images
- Videos
- Documents
- User-generated content
2️⃣ Access Pattern & Throughput
Databases suit structured queries, not high-throughput media delivery.
Object storage:
- HTTP access
- CDN-friendly
- Supports massive concurrent downloads
3️⃣ Transactional Consistency Requirements
Database:
- ACID transactions
- Metadata and file written atomically
Object storage:
- File and metadata written separately
- Better suited to large-scale systems
4️⃣ Scalability Strategy
Database:
- Data volume grows quickly
- High backup cost
Object storage:
- Virtually unlimited scaling
- Scales independently of the database
5️⃣ Large File Upload Strategy (Chunked Upload)
Large files are typically handled with chunked / multipart upload.
Advantages:
- Resumable uploads
- Parallel uploads
- Clients upload directly to S3
🧠 Staff-Level Summary Answer
Large-scale systems usually adopt a hybrid architecture:
Database → stores metadata
S3 → stores the actual files
This architecture preserves transactional capability while providing strong scalability.
⭐ Staff Insight
The real architectural goal is decoupling structured data from binary data.