Database vs Object Storage (S3) for Large Blob Storage

Post by ailswan Mar. 9


🎯 Core Decision Framework

When evaluating Database vs Object Storage (S3) for large blob or image storage, I typically assess the system across five dimensions:

  1. Data Size & Storage Cost
  2. Access Pattern & Throughput
  3. Transaction & Consistency Requirements
  4. Scalability & Operational Complexity
  5. Large File Upload Strategy (Chunking)

1️⃣ Data Size & Storage Cost

Database Storage (e.g., PostgreSQL, MySQL, Cassandra)

Strengths:

  - Metadata and file content live in the same system, which simplifies application design
  - Blob and metadata can be written together in a single transaction

Limitations:

  - Databases are optimized for structured data, not large binary objects
  - Large blobs dramatically increase storage cost and slow down backups and replication

Best fit:

  - Small files, or binary data tightly coupled to transactional records


Object Storage (e.g., Amazon S3, Google Cloud Storage)

Strengths:

  - Purpose-built for storing large volumes of binary data
  - Extremely high durability and very low storage cost compared to databases

Trade-offs:

  - No transactional coupling between blob content and relational metadata
  - Keeping objects and database records in sync requires a separate workflow

Best fit:

  - Large files such as images, videos, and documents at scale

Because of this, most large-scale systems store only metadata in the database and place the actual blob data in object storage.
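To make the cost gap concrete, a rough back-of-envelope comparison helps. The per-GB prices and the replica count below are illustrative assumptions, not quoted rates:

```python
# Back-of-envelope monthly storage cost for 10 TB of blobs.
S3_PER_GB = 0.023   # illustrative object-storage price, USD per GB-month (assumed)
DB_PER_GB = 0.115   # illustrative managed-database SSD price (assumed)
DB_COPIES = 3       # assume primary + 2 read replicas all hold the blob bytes

def monthly_cost(gb, price_per_gb, copies=1):
    return gb * price_per_gb * copies

blobs_gb = 10_000  # 10 TB
print(f"object storage: ${monthly_cost(blobs_gb, S3_PER_GB):,.0f}/month")
print(f"database:       ${monthly_cost(blobs_gb, DB_PER_GB, DB_COPIES):,.0f}/month")
```

Even with generous assumptions, the database path is roughly an order of magnitude more expensive, because every replica and backup also carries the blob bytes.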


2️⃣ Access Pattern & Throughput

Database Access Pattern

Limitations:

  - Optimized for structured queries, not large object delivery
  - Large or frequent blob reads cause heavy disk I/O and degrade overall query performance
  - Unsuitable for high-throughput media workloads


Object Storage Access Pattern

Advantages:

  - Integrates naturally with CDN and HTTP-based delivery
  - Pre-signed URLs let clients download files directly from object storage, bypassing the application server
  - Offloading delivery from the application tier significantly improves scalability
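The pre-signed URL idea can be sketched with plain HMAC signing. This is a conceptual illustration only, not AWS Signature V4 (with real S3 you would call boto3's `generate_presigned_url`); the secret, hostname, and query parameter names are all hypothetical:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

# The application signs the object key plus an expiry with a secret;
# the storage service recomputes and checks the same signature before
# serving the bytes, so the app server never proxies the download.
SECRET = b"server-side-signing-key"  # hypothetical shared secret

def presign(bucket, key, expires_in, now=None):
    expires = (now if now is not None else int(time.time())) + expires_in
    payload = f"GET\n{bucket}\n{key}\n{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"https://{bucket}.storage.example.com/{key}?" + urlencode(
        {"expires": expires, "signature": sig}
    )

def verify(bucket, key, expires, signature, now):
    if now > expires:
        return False  # link has expired
    payload = f"GET\n{bucket}\n{key}\n{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Because the signature covers the key and expiry, a leaked URL grants access to exactly one object for a bounded time window.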


3️⃣ Transaction & Consistency Requirements

Database — Strong Consistency

Advantages:

  - Strong ACID transactions: file data and metadata are written atomically
  - No risk of content existing without its metadata, or vice versa

Best fit:

  - Cases where file content must be strictly consistent with relational records
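The atomic write of content plus metadata can be sketched with an embedded database. This is a minimal sketch using SQLite as a stand-in for any relational store:

```python
import sqlite3

# Blob content and its metadata commit in one transaction, so a crash
# can never leave one without the other.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " content BLOB NOT NULL,"
    " size INTEGER NOT NULL)"
)

def store_file(name, content):
    # `with conn` opens a transaction: commit on success, rollback on error
    with conn:
        conn.execute(
            "INSERT INTO files (name, content, size) VALUES (?, ?, ?)",
            (name, content, len(content)),
        )

store_file("avatar.png", b"\x89PNG-tiny-image-bytes")
```

This works well precisely because the blob here is tiny; the same pattern with multi-megabyte blobs is what inflates backups and replication.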


Object Storage — Eventually Consistent Workflow

Typical architecture:

  1. Upload file to object storage
  2. Store file metadata + object key in database

Trade-offs:

  - The upload and the metadata write are not atomic
  - Partial failures (e.g., the object is uploaded but the metadata write fails) require cleanup or reconciliation

Example pattern:


client → upload file → S3
application → save metadata → database

When using object storage, systems usually store only the file reference (object key or URL) in the database. This introduces a two-step workflow and requires careful handling of partial failures, but it scales far better for large systems.
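One common way to handle the partial-failure case is a compensating delete. The sketch below uses in-memory dicts as stand-ins for S3 and the database, and the failure flag exists only to make the sketch testable:

```python
import uuid

# Two-step workflow: upload the blob, then record metadata. If step 2
# fails, delete the orphaned object so storage and database stay in sync.
object_store = {}   # object key -> bytes   (stand-in for S3)
metadata_db = {}    # file id -> metadata   (stand-in for the database)

def save_file(name, content, fail_metadata_write=False):
    key = f"uploads/{uuid.uuid4()}/{name}"
    object_store[key] = content                # step 1: upload blob
    try:
        if fail_metadata_write:
            raise RuntimeError("simulated metadata write failure")
        file_id = len(metadata_db) + 1         # step 2: record metadata
        metadata_db[file_id] = {"name": name, "key": key, "size": len(content)}
        return file_id
    except Exception:
        # compensate: remove the orphaned object; real systems often also
        # run a periodic sweep for orphans the compensation itself missed
        object_store.pop(key, None)
        raise
```

The inverse failure (metadata written, object missing) is typically prevented by always uploading first, as above.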


4️⃣ Scalability Strategy

Database Blob Storage

Challenges:

  - Storage volume and cost grow quickly as blobs accumulate
  - Backups, replication, and failover slow down as blob data grows
  - Sharding blob-heavy tables is operationally complex

Operational risk:

  - A single blob-heavy table can dominate disk I/O and degrade the entire database


Object Storage

Advantages:

  - Scales horizontally with effectively unlimited capacity
  - Decoupled from the transactional database, so each system scales independently

Common architecture pattern:


Client
│
▼
Application Server
│
├── Metadata → Database
│
└── File Upload → Object Storage (S3)

Object storage separates large binary data from transactional databases, allowing both systems to scale independently. This architecture is widely used in large-scale systems such as social media platforms, media services, and cloud storage applications.


5️⃣ Large File Upload Strategy (Chunking / Multipart Upload)

When files become large (e.g., images, videos, archives), uploading them in a single request becomes unreliable and inefficient. Most large-scale systems therefore use chunked uploads or multipart uploads.


Problem with Large File Upload

Uploading a large file in a single request has several issues:

  - A single network failure forces the entire upload to restart from zero
  - Long-running requests are prone to timeouts at proxies, load balancers, and servers
  - The transfer cannot be parallelized or resumed

Example problem:


Client → Upload 500MB video → Server
Network drop at 90%
→ entire upload fails

This results in extremely poor user experience.


Chunked Upload / Multipart Upload

The common solution is to break a large file into smaller chunks.

Example workflow:


Client
│
├── upload chunk 1 → S3
├── upload chunk 2 → S3
├── upload chunk 3 → S3
│
└── complete upload request

Each chunk is uploaded independently.
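On the client side, chunking starts with planning fixed-size byte ranges. A minimal sketch (S3 multipart parts must be at least 5 MB except the last one, and part numbers start at 1):

```python
# Plan fixed-size chunks for a multipart upload.
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, the S3 minimum part size

def plan_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return (part_number, start_offset, length) for each chunk."""
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts

print(plan_chunks(12 * 1024 * 1024))  # a 12 MB file -> 3 parts
```

Because each planned range is independent, a failed part can be retried alone, and several parts can be uploaded in parallel.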


Example Architecture


Client
│
│ request upload session
▼
Application Server
│
│ generate upload id
▼
Object Storage (S3 Multipart Upload)

Client uploads chunks directly to S3

chunk1
chunk2
chunk3
chunk4

After all chunks uploaded

Client → complete upload → S3 assemble file


Advantages

Reliability

  - A failed chunk is retried individually; earlier progress is never lost

Performance

  - Chunks can be uploaded in parallel, making better use of available bandwidth

Scalability

  - The application server only coordinates the session; the bytes flow directly between client and object storage


Typical Implementation (S3 Multipart Upload)

Typical flow:

  1. Client requests upload session
  2. Server returns uploadId + presigned URLs
  3. Client uploads chunks directly to S3
  4. Client sends CompleteMultipartUpload

Example API flow:


POST /upload/init
→ return uploadId

PUT /upload/chunk
→ upload chunk

POST /upload/complete
→ finalize file
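The three endpoints above can be sketched as an in-memory coordinator. The session store and the checksum scheme are stand-ins, not a real S3 integration:

```python
import hashlib
import uuid

# upload_id -> {part_number: chunk bytes}
sessions = {}

def init_upload():                                 # POST /upload/init
    upload_id = uuid.uuid4().hex
    sessions[upload_id] = {}
    return upload_id

def upload_chunk(upload_id, part_number, data):    # PUT /upload/chunk
    sessions[upload_id][part_number] = data
    # return an ETag-like checksum the client echoes back on completion
    return hashlib.md5(data).hexdigest()

def complete_upload(upload_id):                    # POST /upload/complete
    parts = sessions.pop(upload_id)
    # assemble in part-number order, regardless of upload order
    return b"".join(parts[n] for n in sorted(parts))
```

In the real S3 flow the server only issues the uploadId and pre-signed part URLs; the chunk bytes and the final assembly are handled by S3 itself.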


When to Use Chunked Upload

Use chunked upload when:

  - Files are large (AWS recommends multipart upload above roughly 100 MB, and requires it above 5 GB)
  - Clients are on unreliable or mobile networks
  - Uploads need to be resumable or parallel

Examples:

  - Video uploads, photo batches, document archives, and backups


🧠 Senior / Staff-Level Summary Answer

When deciding between database storage and object storage for large blobs, I usually consider file size, access patterns, transaction requirements, and long-term scalability.
Databases work well for small binary data tightly coupled with relational records. However, for large-scale systems storing images, videos, or documents, object storage such as S3 is generally the better choice because it provides significantly lower cost, better scalability, and integration with CDN delivery.
In practice, most production systems adopt a hybrid model: storing metadata in a database while placing the actual blob content in object storage.


⭐ Staff-Level Insight (Bonus)

The real architectural goal is decoupling structured data from large binary objects. Databases should manage metadata and transactional logic, while object storage handles scalable binary storage and delivery.




Condensed Summary

🎯 Core Decision Framework

When evaluating database storage vs object storage (S3), I usually consider five dimensions:

  1. Data Size & Storage Cost
  2. Access Pattern & Throughput
  3. Transaction & Consistency Requirements
  4. Scalability & Operational Complexity
  5. Large File Upload Strategy (Chunked Upload)

1️⃣ Data Size & Storage Cost

Database storage (e.g., PostgreSQL, MySQL): simple for small, transactional binary data, but large blobs inflate cost and slow down backups and replication.

Object storage (e.g., Amazon S3): very low cost and extremely high durability; the best fit for large files at scale.


2️⃣ Access Pattern & Throughput

Databases suit structured queries, not high-throughput media delivery.

Object storage: integrates with CDNs; pre-signed URLs let clients download directly.


3️⃣ Transaction & Consistency Requirements

Database: strong ACID transactions; blob and metadata can be written atomically.

Object storage: two-step workflow (upload the object, then record metadata); partial failures must be handled.


4️⃣ Scalability Strategy

Database: blob growth slows backups, replication, and failover.

Object storage: scales independently of the transactional database.


5️⃣ Large File Upload Strategy (Chunked Upload)

Large files are typically uploaded with chunked / multipart upload.

Advantages: per-chunk retry, parallel upload, resumable transfers.


🧠 Staff-Level Summary Answer

Large-scale systems usually adopt a hybrid architecture:

Database → stores metadata
S3 → stores the actual files

This architecture preserves transactional capability while scaling extremely well.


⭐ Staff-Level Insight

The real architectural goal:

Decouple structured data from binary data.

