1. Requirements
Functional Requirements
- Users can:
- Create paste (text content)
- Get unique URL for paste
- Retrieve paste via URL
- Optional:
- Expiration time (TTL)
- Public/private pastes
- Edit/delete pastes
Non-Functional Requirements
- Read-heavy system
- Latency: < 100ms for retrieval
- Scalability: Billions of pastes
- Availability: High (reads should not fail)
- Durability: Pastes must not be lost
2. Estimations
Traffic
- Writes: ~10K/sec
- Reads: ~500K–1M/sec
Storage
Assume:
- Avg paste size = 5KB
- 1B pastes × 5KB → ~5TB of raw content (before replication)
- At ~10K writes/sec: ~864M new pastes/day → ~4.3TB/day
👉 Requires distributed storage
Read/Write Pattern
- Read-heavy → heavy caching needed
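The arithmetic behind these estimates can be checked with a short script. This is only a back-of-envelope sketch; it assumes decimal units (1KB = 1000 bytes) and ignores replication and metadata overhead.

```python
# Back-of-envelope check of the traffic and storage estimates above.
# Assumption: decimal units (1 KB = 1000 bytes), no replication overhead.
WRITES_PER_SEC = 10_000
READS_PER_SEC_LOW = 500_000
READS_PER_SEC_HIGH = 1_000_000
AVG_PASTE_BYTES = 5_000
SECONDS_PER_DAY = 86_400


def storage_tb(num_pastes: int, avg_bytes: int = AVG_PASTE_BYTES) -> float:
    """Raw content size in terabytes for a given number of pastes."""
    return num_pastes * avg_bytes / 1e12


pastes_per_day = WRITES_PER_SEC * SECONDS_PER_DAY
ratio_low = READS_PER_SEC_LOW // WRITES_PER_SEC
ratio_high = READS_PER_SEC_HIGH // WRITES_PER_SEC

print(f"1B pastes: {storage_tb(1_000_000_000):.1f} TB")
print(f"new pastes per day: {pastes_per_day:,}")
print(f"new content per day: {storage_tb(pastes_per_day):.2f} TB")
print(f"read:write ratio: {ratio_low}:1 to {ratio_high}:1")
```

At the stated write rate, the 1B-paste mark is reached in a little over a day, which is why the storage layer has to grow horizontally rather than be sized once.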
3. API Design
3.1 Create Paste
POST /paste
Request:
{
"content": "text...",
"expiry": "optional",
"visibility": "public/private"
}
Response:
{
"url": "https://pastebin.com/abc123"
}
3.2 Get Paste
GET /{paste_id}
3.3 Delete Paste
DELETE /{paste_id}
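The three endpoints can be sketched as plain functions over an in-memory dictionary. This is a minimal illustration, not a server: the status codes (201, 204, 400, 404), the default visibility, and the random placeholder ID are assumptions that the design above does not specify.

```python
import secrets
from typing import Dict, Tuple

BASE_URL = "https://pastebin.com"   # host from the example response above
_pastes: Dict[str, dict] = {}       # in-memory stand-in for the real stores

Response = Tuple[int, dict]         # (HTTP status, JSON body); codes are assumptions


def create_paste(body: dict) -> Response:
    """POST /paste"""
    content = body.get("content")
    if not isinstance(content, str) or not content:
        return 400, {"error": "content is required"}
    visibility = body.get("visibility", "public")  # default is an assumption
    if visibility not in ("public", "private"):
        return 400, {"error": "visibility must be public or private"}
    paste_id = secrets.token_hex(3)  # placeholder; see 6.1 for the ID scheme
    _pastes[paste_id] = {
        "content": content,
        "expiry": body.get("expiry"),
        "visibility": visibility,
    }
    return 201, {"url": f"{BASE_URL}/{paste_id}"}


def get_paste(paste_id: str) -> Response:
    """GET /{paste_id}"""
    paste = _pastes.get(paste_id)
    if paste is None:
        return 404, {"error": "not found"}
    return 200, {"content": paste["content"]}


def delete_paste(paste_id: str) -> Response:
    """DELETE /{paste_id}"""
    if _pastes.pop(paste_id, None) is None:
        return 404, {"error": "not found"}
    return 204, {}
```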
4. Data Storage & Design
4.1 Metadata Store
Use:
- Cassandra / DynamoDB
Schema:
paste_id (PK)
created_at
expiry
visibility
user_id
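The schema lists field names only. One possible typed reading is sketched below; the field types, the use of None for "never expires" and for anonymous users, and the helper `new_metadata` are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class PasteMetadata:
    paste_id: str                # primary (partition) key
    created_at: datetime
    expiry: Optional[datetime]   # None = never expires (assumption)
    visibility: str              # "public" or "private"
    user_id: Optional[str]       # None = anonymous paste (assumption)


def new_metadata(
    paste_id: str,
    now: datetime,
    ttl: Optional[timedelta] = None,
    visibility: str = "public",
    user_id: Optional[str] = None,
) -> PasteMetadata:
    """Hypothetical helper that builds a row at creation time."""
    if visibility not in ("public", "private"):
        raise ValueError("visibility must be 'public' or 'private'")
    expiry = now + ttl if ttl is not None else None
    return PasteMetadata(paste_id, now, expiry, visibility, user_id)
```

Because every lookup is by paste_id, a single-column partition key fits the key-value access pattern of Cassandra or DynamoDB.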
4.2 Content Storage (Important Separation)
Use:
- Object storage (e.g., Amazon S3)
Key:
paste_id → content blob
👉 Reason:
- Large blobs are a poor fit for the metadata DB (e.g., DynamoDB caps items at 400KB)
- Cheap, scalable storage
4.3 Cache
Use:
- Redis
Key:
paste_id → content
5. High-Level Architecture (HLD)
System consists of:
- Entry Layer
- Read Path
- Write Path
5.1 Entry Layer
- CDN + Load Balancer
Responsibilities:
- Handle read spikes
- Serve cached pastes
- Protect backend
5.2 Read Path
- Client requests paste
- CDN:
- Hit → return content
- Miss → forward
- Load Balancer → App Server
- App Server:
- Check Redis
- If miss → fetch metadata from DB + content from object storage, then populate Redis
- Return paste
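The app-server part of the read path is a cache-aside lookup. The sketch below uses plain dictionaries as stand-ins for Redis, the metadata DB, and object storage; the names `cache`, `metadata_db`, `object_store`, and `read_paste` are hypothetical.

```python
from typing import Dict, Optional

cache: Dict[str, str] = {}          # stand-in for Redis
metadata_db: Dict[str, dict] = {}   # stand-in for Cassandra / DynamoDB
object_store: Dict[str, str] = {}   # stand-in for object storage


def read_paste(paste_id: str) -> Optional[str]:
    """Cache-aside read: try the cache first, fall back to the stores."""
    cached = cache.get(paste_id)
    if cached is not None:
        return cached                       # cache hit: no DB or blob access

    meta = metadata_db.get(paste_id)
    if meta is None:
        return None                         # unknown paste
    content = object_store.get(paste_id)
    if content is None:
        return None                         # metadata exists but blob is missing

    cache[paste_id] = content               # populate cache for later reads
    return content
```

A real handler would also check expiry and visibility from the metadata before returning content; that is left out here to keep the lookup order visible.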
5.3 Write Path
- Client submits paste
- App server:
- Generate paste_id
- Store metadata in DB
- Store content in object storage
- Cache metadata/content
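The write steps above can be sketched in the same order they are listed. Dictionaries stand in for the three stores, and `generate_paste_id` is a placeholder counter; all names here are hypothetical.

```python
import itertools
import time
from typing import Dict, Optional

metadata_db: Dict[str, dict] = {}   # stand-in for Cassandra / DynamoDB
object_store: Dict[str, str] = {}   # stand-in for object storage
cache: Dict[str, str] = {}          # stand-in for Redis

_counter = itertools.count(1)


def generate_paste_id() -> str:
    """Placeholder ID source; 6.1 describes the intended scheme."""
    return f"p{next(_counter)}"


def write_paste(
    content: str,
    visibility: str = "public",
    ttl_seconds: Optional[int] = None,
    user_id: Optional[str] = None,
    now: Optional[float] = None,
) -> str:
    """Follow the listed order: ID, metadata, content, cache."""
    now = time.time() if now is None else now
    paste_id = generate_paste_id()
    metadata_db[paste_id] = {
        "created_at": now,
        "expiry": now + ttl_seconds if ttl_seconds is not None else None,
        "visibility": visibility,
        "user_id": user_id,
    }
    object_store[paste_id] = content
    cache[paste_id] = content           # warm the cache for the first reads
    return paste_id
```

With this ordering, a failed content write leaves a metadata row pointing at nothing. Writing the content first, or marking the row as pending until the blob is stored, are two ways a real system might close that gap.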
6. Detailed Breakdown
6.1 Paste ID Generation
Approach: Distributed ID + Base62
Use Snowflake-style:
[timestamp | machine_id | sequence]
Convert to Base62 → short ID
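A minimal sketch of this scheme follows. The bit widths (10 machine bits, 12 sequence bits), the custom epoch, and the alphabet order are assumptions; the design above only fixes the field order.

```python
import threading
import time

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
EPOCH_MS = 1_577_836_800_000   # 2020-01-01T00:00:00Z, an arbitrary custom epoch
MACHINE_BITS = 10              # assumption: up to 1024 generators
SEQUENCE_BITS = 12             # assumption: up to 4096 IDs per ms per generator
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class SnowflakeGenerator:
    """Builds IDs laid out as [timestamp | machine_id | sequence]."""

    def __init__(self, machine_id: int):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self.machine_id = machine_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                now = self._last_ms          # clock stepped back: reuse last ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:      # sequence exhausted in this ms
                    while now <= self._last_ms:
                        now = int(time.time() * 1000)
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self._sequence
            )


gen = SnowflakeGenerator(machine_id=1)
paste_id = base62_encode(gen.next_id())
```

A 63-bit ID encodes to at most 11 Base62 characters, so IDs from this layout are longer than the six-character example in 3.1. A shorter ID would need fewer bits or a different scheme.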
6.2 Read Optimization
- CDN caches popular pastes
- Redis caches hot data
- Most reads avoid DB
6.3 Write Optimization
- Async writes to object storage; to meet the durability requirement, confirm the paste only after the content write succeeds
- Metadata stored separately
6.4 Stateless Scaling
- App servers stateless
- Scale horizontally
6.5 Hot Key Handling
Problem:
- Viral paste → heavy reads
Solution:
- CDN absorbs traffic
- Redis replication (read replicas spread load for the hot key)
- Local cache on app servers
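The local cache on app servers can be as simple as an in-process dictionary with a short TTL, so a viral paste is served from memory without a Redis round trip. The class below is a hypothetical sketch; the TTL value is a tuning choice, and a production version would also bound the number of entries.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LocalTTLCache:
    """Small in-process cache; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock                      # injectable for testing
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]               # drop stale entry on access
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
```

A short TTL (a few seconds) bounds how long a deleted or edited paste can still be served from a server's local memory.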
7. Additional Considerations
7.1 Large Paste Handling
- Store large pastes in chunks (optional)
- Stream content instead of loading fully
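Chunking and streaming can be sketched as follows. The 64KB chunk size and the `paste_id/index` key layout are assumptions, and a dictionary stands in for object storage.

```python
from typing import Dict, Iterator, List

CHUNK_SIZE = 64 * 1024  # assumption: 64KB chunks


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split content into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_key(paste_id: str, index: int) -> str:
    """Hypothetical object key for one chunk of a paste."""
    return f"{paste_id}/{index:06d}"


def store_chunks(store: Dict[str, bytes], paste_id: str, data: bytes) -> int:
    """Write all chunks and return the count (to be kept in metadata)."""
    chunks = split_into_chunks(data)
    for i, chunk in enumerate(chunks):
        store[chunk_key(paste_id, i)] = chunk
    return len(chunks)


def stream_paste(store: Dict[str, bytes], paste_id: str, num_chunks: int) -> Iterator[bytes]:
    """Yield chunks one at a time instead of loading the whole paste."""
    for i in range(num_chunks):
        yield store[chunk_key(paste_id, i)]
```

The generator lets a handler forward each chunk to the client as it arrives, so memory use per request stays near one chunk regardless of paste size.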
7.2 Expiration & TTL
- Store expiry in DB
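- Object storage lifecycle rules auto-delete content; these rules typically match by object age, prefix, or tag, so arbitrary per-paste TTLs may need TTL-class prefixes/tags or a cleanup job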
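Since background deletion is not instantaneous, a read handler can also check the stored expiry itself, and cache entries can be given a TTL that never outlives the paste. The helpers below are a hypothetical sketch; the one-hour default cache TTL is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(expiry: Optional[datetime], now: datetime) -> bool:
    """A paste with no expiry never expires."""
    return expiry is not None and now >= expiry


def cache_ttl_seconds(
    expiry: Optional[datetime],
    now: datetime,
    default_ttl: int = 3600,   # assumption: one-hour default cache TTL
) -> int:
    """TTL for a cache entry, capped at the paste's remaining lifetime."""
    if expiry is None:
        return default_ttl
    remaining = int((expiry - now).total_seconds())
    return max(0, min(default_ttl, remaining))
```

A result of 0 means the paste has already expired and should not be cached at all.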
7.3 Cache Eviction
- Redis evicts keys under memory pressure using LRU/LFU (maxmemory-policy, e.g., allkeys-lru or allkeys-lfu)
7.4 Degraded Mode
Redis Down:
- Fetch from DB + object storage
DB Down:
- Serve cached pastes (CDN/Redis); uncached reads and new writes fail until the DB recovers
Object Storage Down:
- Serve cached data if available
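The read-side fallbacks above can be sketched as one function. `StoreUnavailable`, `cache_get`, and `backend_get` are hypothetical names; `backend_get` stands for the combined DB + object storage lookup.

```python
from typing import Callable, Optional


class StoreUnavailable(Exception):
    """Raised by a store client when its backend cannot be reached."""


def read_with_fallback(
    paste_id: str,
    cache_get: Callable[[str], Optional[str]],
    backend_get: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Serve from cache when possible; tolerate a cache outage."""
    try:
        cached = cache_get(paste_id)
        if cached is not None:
            return cached            # DB / object storage are never touched
    except StoreUnavailable:
        pass                         # Redis down: fall through to the backend
    # May raise StoreUnavailable; the caller would map that to an error response.
    return backend_get(paste_id)
```

When the cache hits, an outage of the DB or object storage is invisible to the reader. Only a cache miss during such an outage surfaces as an error.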
7.5 Security
- Rate limiting (prevent spam)
- Content moderation
- Private pastes require auth
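Rate limiting is often done with a token bucket. The sketch below is one common approach and is not prescribed by the design above; the per-client keying and the rate and capacity values are left to the caller.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock                 # injectable for testing
        self._tokens = float(capacity)      # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False to reject the request."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

In practice one bucket would be kept per client key (for example per user_id or IP address), and the create-paste handler would reject the request when `allow()` returns False.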
🏁 Final Summary
- Separation of metadata & content improves scalability
- CDN + Redis caching ensures low latency
- Object storage handles large data efficiently
- Stateless services ensure horizontal scaling
- TTL + lifecycle policies prevent storage bloat