Requirements
Functional Requirements:
- Users can create and store a text/code snippet (paste)
- Each paste has:
  - A unique short URL (/abc123)
  - Optional expiry time (e.g., 10 mins, 1 hour, 1 day, never)
  - Visibility settings: public, unlisted, or private
- Users can retrieve a paste by URL
- Users receive appropriate error messages for expired/missing/unauthorized pastes
- Optional: Support for syntax highlighting
- Optional: Authenticated users can delete or list their pastes
Non-Functional Requirements:
- Availability: 99.9%+
- Low latency:
  - Paste retrieval: <100ms P95
  - Paste creation: <150ms
- High scalability: Up to 10M pastes/day and 100M views/day
- Security & privacy: Support for visibility control and spam protection
- Data expiry: Efficient TTL-based deletion
- Resilience: Multi-AZ, auto-recovery
API Design
POST /paste
Creates a new paste.
Request (expires_in is in seconds)
{
  "content": "print('Hello')",
  "expires_in": 3600,
  "syntax": "python",
  "visibility": "unlisted"
}
Response
{
  "paste_id": "abc123",
  "url": "https://paste.io/abc123",
  "expires_at": "2025-11-30T10:00:00Z"
}
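Server-side, the create handler has to validate the body and turn the relative expires_in into the absolute expires_at returned in the response. The sketch below assumes the allowed expiry values are exactly the ones listed in the requirements (600, 3600, 86400 seconds, or None for "never") and that visibility defaults to "public" when omitted; both the allow-lists and the function name compute_expires_at are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical allow-lists mirroring the requirements: 10 mins, 1 hour, 1 day,
# or None for "never"; visibility is one of the three documented settings.
ALLOWED_EXPIRES_IN = {600, 3600, 86400, None}
ALLOWED_VISIBILITY = {"public", "unlisted", "private"}


def compute_expires_at(request: dict, now: datetime) -> Optional[str]:
    """Validate a POST /paste body; return expires_at as an ISO-8601 UTC string or None."""
    if not request.get("content"):
        raise ValueError("content is required")
    # Assumption: visibility defaults to "public" when omitted.
    if request.get("visibility", "public") not in ALLOWED_VISIBILITY:
        raise ValueError("invalid visibility")
    expires_in = request.get("expires_in")  # seconds; missing/None means never
    if expires_in not in ALLOWED_EXPIRES_IN:
        raise ValueError("unsupported expires_in")
    if expires_in is None:
        return None
    expires_at = now.astimezone(timezone.utc) + timedelta(seconds=expires_in)
    return expires_at.strftime("%Y-%m-%dT%H:%M:%SZ")
```

With now set to 2025-11-30 09:00 UTC, the sample request above yields "2025-11-30T10:00:00Z", matching the sample response.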
GET /paste/{paste_id}
Retrieves a paste by ID.
Error Responses:
- 404 Not Found (invalid or expired ID)
- 403 Forbidden (private paste, not owner)
DELETE /paste/{paste_id} (authenticated)
Deletes a paste owned by the authenticated user.
Optional Future APIs
- GET /user/pastes – fetch pastes by authenticated user
- GET /paste/search?q=... – indexed search over public pastes
High-Level Architecture
[Client]
    ↓
[API Gateway / Load Balancer]
    ↓
[Paste Service (Create/Retrieve/Delete)]
    ↓                    ↘
[Redis Cache]          [Database Store]
    ↓                    ↓
[TTL Processor]        [Backup Storage]
- CDN can optionally cache public pastes
Data Model
Table: pastes
- paste_id (PK)
- content (TEXT)
- syntax (STRING)
- visibility (ENUM)
- owner_id (NULLABLE)
- created_at (TIMESTAMP)
- expires_at (TIMESTAMP, NULL = never expires)
- access_count (INT)
Indexes:
- paste_id (primary)
- expires_at (for TTL cleanup)
- owner_id + created_at (for listing user pastes)
Alternatives:
- DynamoDB with partition key paste_id
  - TTL on expires_at (DynamoDB TTL requires a Number attribute holding epoch seconds)
  - Global secondary index on owner_id
Capacity Estimation
Assumptions:
- 10M pastes/day
- Avg paste size: 1 KB
- Expiry: 7-day default
- Views: 20× per paste → 200M reads/day (twice the 100M views/day in the requirements, so the estimate errs high)
- Cache hit ratio: 90%
Storage:
10M pastes/day × 1 KB = 10 GB/day
7-day retention = 70 GB live data
3× replication → 210 GB
Traffic:
- Writes: 10M/day ≈ 115 QPS
- Reads: 200M/day ≈ 2,300 QPS
- 90% served via Redis → DB load = 230 QPS
- Cache size: top 500K hot pastes × 1 KB = ~500 MB
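The estimates above can be reproduced with a few lines of arithmetic. This sketch uses only the stated assumptions and decimal units (1 GB = 10^6 KB); it makes visible that the quoted QPS figures are rounded (10M/86,400 is about 115.7).

```python
SECONDS_PER_DAY = 86_400

# Assumptions taken from the capacity section above.
pastes_per_day = 10_000_000
avg_paste_kb = 1
retention_days = 7
replication_factor = 3
views_per_paste = 20
cache_hit_ratio = 0.90
hot_pastes = 500_000

daily_storage_gb = pastes_per_day * avg_paste_kb / 1_000_000   # 10 GB/day
live_data_gb = daily_storage_gb * retention_days              # 70 GB
replicated_gb = live_data_gb * replication_factor             # 210 GB

write_qps = pastes_per_day / SECONDS_PER_DAY                  # about 115.7
read_qps = pastes_per_day * views_per_paste / SECONDS_PER_DAY # about 2,314.8
db_read_qps = read_qps * (1 - cache_hit_ratio)                # about 231.5
cache_mb = hot_pastes * avg_paste_kb / 1_000                  # 500 MB

print(f"storage: {daily_storage_gb:.0f} GB/day, {live_data_gb:.0f} GB live, "
      f"{replicated_gb:.0f} GB replicated")
print(f"writes ~{write_qps:.1f} QPS, reads ~{read_qps:.1f} QPS, "
      f"DB reads ~{db_read_qps:.1f} QPS, cache ~{cache_mb:.0f} MB")
```

These are daily averages; peak traffic will be higher, so provisioning usually adds a peak-to-average multiplier on top.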
Scalability and Performance
Application Layer:
- Stateless microservices for create/read/delete
- Scales horizontally behind load balancer (ECS / Kubernetes / Lambda)
Caching:
- Redis Cluster for fast read access
- TTL-based expiry + LRU eviction when memory is full (e.g., maxmemory-policy volatile-lru or allkeys-lru)
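Redis implements these semantics server-side, so the service never writes this logic itself. The in-process model below only illustrates how per-key TTLs and LRU eviction interact: expired entries are dropped lazily on read, and when the cache is full the least recently used entry goes first. The class name TtlLruCache and the injectable clock are hypothetical, and real Redis LRU is approximate (sampled) rather than exact.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class TtlLruCache:
    """Illustrative cache with per-key TTLs and exact LRU eviction at capacity."""

    def __init__(self, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.clock = clock
        # key -> (value, absolute expiry time or None for no TTL)
        self._data: OrderedDict = OrderedDict()

    def set(self, key: str, value: str, ttl_seconds: Optional[float] = None) -> None:
        expires = self.clock() + ttl_seconds if ttl_seconds is not None else None
        self._data[key] = (value, expires)
        self._data.move_to_end(key)         # mark as most recently used
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and self.clock() >= expires:
            del self._data[key]             # lazily drop an expired entry
            return None
        self._data.move_to_end(key)         # a hit refreshes recency
        return value
```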
Database:
- PostgreSQL with read replicas, or
- DynamoDB with on-demand scaling + Global Tables
CDN:
- Serve public pastes (static content) via CloudFront/Cloudflare
Expiry Management & TTL Design
Paste creation:
- Set expires_at timestamp in DB
- Add to Redis with matching TTL (seconds)
Expiry handling:
- Redis automatically expires keys when their TTL elapses
- Scheduled background job:
  - Scans DB for expires_at < NOW()
  - Hard deletes or archives expired rows
  - Batches deletions for efficiency
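A batched cleanup pass might look like the sketch below. It uses the standard-library sqlite3 module so it runs anywhere, and it assumes expires_at is stored as epoch seconds (the data model uses TIMESTAMP; the comparison is the same idea). Each iteration deletes at most batch_size rows through the expires_at index and commits, which keeps transactions and locks short; the function name purge_expired is hypothetical. The DELETE ... WHERE paste_id IN (SELECT ... LIMIT n) pattern also works in PostgreSQL.

```python
import sqlite3


def purge_expired(conn: sqlite3.Connection, now_epoch: int, batch_size: int = 1000) -> int:
    """Hard-delete expired pastes in bounded batches; return the number removed."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM pastes WHERE paste_id IN ("
            " SELECT paste_id FROM pastes"
            " WHERE expires_at IS NOT NULL AND expires_at < ?"  # NULL = never expires
            " LIMIT ?)",
            (now_epoch, batch_size),
        )
        conn.commit()  # commit per batch to keep each transaction small
        total += cur.rowcount
        if cur.rowcount < batch_size:  # a short batch means nothing is left
            return total
```

An archive variant would copy each batch to cold storage (e.g. the Backup Storage tier) before deleting it.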
Detailed Component Design
Paste ID Generation
- Random Base62 string or NanoID, 6–8 chars (a full Base62-encoded 128-bit UUID would be about 22 chars)
- Non-sequential to prevent enumeration
- Check for collisions if using a short random ID
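A short random ID with a collision check can be sketched with Python's secrets module, which draws from a cryptographically secure source so IDs are not guessable or sequential. The exists callback stands in for a primary-key lookup and is hypothetical, as are the defaults (7 chars, 5 attempts). With 7 chars there are 62^7 ≈ 3.5 × 10^12 possible IDs; at 70M live pastes (7-day retention) a fresh ID collides with probability of roughly 2 × 10^-5, so retries are rare.

```python
import secrets
import string
from typing import Callable

BASE62 = string.digits + string.ascii_letters  # 0-9, a-z, A-Z: 62 symbols


def new_paste_id(exists: Callable[[str], bool], length: int = 7, max_attempts: int = 5) -> str:
    """Generate a random, non-sequential Base62 ID, retrying if it is already taken."""
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(BASE62) for _ in range(length))
        if not exists(candidate):
            return candidate
    raise RuntimeError("no free ID found; consider a longer ID length")
```

A check-then-insert is still racy under concurrency, so the final guard should be the database itself: a primary-key unique constraint (PostgreSQL INSERT ... ON CONFLICT DO NOTHING) or a DynamoDB conditional write with attribute_not_exists(paste_id), retrying on conflict.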
Retrieval Flow
- API receives GET /paste/abc123
- Check Redis for paste_id
- If miss → Query DB → Store in Redis
- Enforce visibility rules (private, unlisted)
- Return response or error
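The retrieval steps map onto a cache-aside lookup followed by expiry and visibility checks. In this sketch plain dicts stand in for Redis and the database, expires_at is epoch seconds, and the Paste dataclass and get_paste function are hypothetical. Unlisted pastes are readable by anyone holding the URL (they are only hidden from listings), so only private pastes trigger the ownership check.

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Paste:
    paste_id: str
    content: str
    visibility: str              # "public" | "unlisted" | "private"
    owner_id: Optional[str]
    expires_at: Optional[float]  # epoch seconds; None = never expires


def get_paste(paste_id: str, requester_id: Optional[str], cache: dict, db: dict,
              now: Optional[float] = None) -> Tuple[int, Optional[Paste]]:
    """Return (HTTP status, paste) following the cache-aside retrieval flow."""
    now = time.time() if now is None else now
    paste = cache.get(paste_id)      # 1. check the cache
    if paste is None:
        paste = db.get(paste_id)     # 2. on a miss, query the DB
        if paste is not None:
            cache[paste_id] = paste  # 3. populate the cache
    if paste is None:
        return 404, None
    if paste.expires_at is not None and paste.expires_at <= now:
        cache.pop(paste_id, None)    # expired: drop any cached copy
        return 404, None
    if paste.visibility == "private" and paste.owner_id != requester_id:
        return 403, None
    return 200, paste
```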
Deletion Flow
- Auth required
- Validate ownership
- Remove from Redis and DB
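The deletion flow reduces to an authentication check, an ownership check, and removal from both stores. As before, dicts stand in for the database and Redis, and the function name delete_paste is hypothetical; the 401 and 204 status codes are assumptions, since the API section only lists 403 and 404. Deleting from the database before invalidating the cache narrows (but does not fully close) the window in which a concurrent cache miss could repopulate a deleted paste.

```python
from typing import Optional


def delete_paste(paste_id: str, requester_id: Optional[str], cache: dict, db: dict) -> int:
    """Delete a paste owned by the requester; return an HTTP status code."""
    if requester_id is None:
        return 401                  # authentication required (assumed status)
    row = db.get(paste_id)
    if row is None:
        return 404
    if row["owner_id"] != requester_id:
        return 403                  # authenticated, but not the owner
    del db[paste_id]                # remove from the source of truth first
    cache.pop(paste_id, None)       # then invalidate the cached copy, if any
    return 204                      # no content (assumed status)
```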
Failure Scenarios & Recovery
| Failure | Recovery Strategy |
| DB outage | Use read replicas, failover DB endpoint |
| Redis crash | Use AOF persistence + fallback to DB |
| PasteService failure | Auto-restart + stateless deployment |
| Surge in traffic (DoS) | API Gateway rate-limiting + CAPTCHA |
| Cache invalidation issue | TTL-based purge + on-demand DB lookup |
| Data loss | Restore from Backup Storage |
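Managed API gateways provide rate limiting as configuration, so this is normally not hand-written. To show what such a limit does, here is a sketch of a per-client token bucket, one common rate-limiting algorithm (the document does not name a specific one). The class name, the rate and capacity values, and keying by client ID or IP are all illustrative assumptions; the injectable clock exists only to make the behaviour testable.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: refills `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # client -> (tokens, last_ts)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        # New clients start with a full bucket.
        tokens, last = self._buckets.get(client_id, (self.capacity, now))
        # Refill for the elapsed time, capped at the bucket capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[client_id] = (tokens - 1, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False
```

Requests that are rejected here would receive HTTP 429; a CAPTCHA challenge can then be reserved for clients that keep hitting the limit.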
Trade-offs & Justifications
| Decision | Trade-off / Justification |
| Redis for caching | Fast reads vs memory usage; TTL reduces footprint |
| Base62 ID | Short + user-friendly vs potential for collisions |
| PostgreSQL vs DynamoDB | SQL query flexibility vs auto-scaling |
| User auth optional | Simpler MVP vs limited access control / management |
| Public pastes in CDN | Great performance vs cache invalidation complexity |
Future Improvements
- Full-text search for public pastes (e.g., Elasticsearch)
- User account integration and paste history
- Analytics: View count per paste, last accessed
- Encrypted pastes (client-side)
- Spam detection: ML model to flag abusive pastes
- Geographic replication for latency-sensitive traffic
- Tagging system and public paste feed
Engagement Scenario: Authenticated User Access
If we introduce user accounts, how would users retrieve their pastes efficiently?
Solution:
- Store owner_id with each paste
- Add composite index: (owner_id, created_at DESC)
- API: GET /user/pastes with pagination
- Use cursor-based pagination for scalability
- Protect access with session tokens or OAuth2
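Cursor-based (keyset) pagination over the (owner_id, created_at DESC) index can be sketched as follows, again with sqlite3 so it runs standalone. Assumptions: created_at is stored as epoch seconds, paste_id is added as a tie-breaker so pastes created in the same second are neither skipped nor repeated, and the opaque cursor is a URL-safe Base64 JSON pair; the function names are hypothetical. Unlike OFFSET, each page is an index range scan that starts where the previous page ended, so cost does not grow with page depth.

```python
import base64
import json
import sqlite3
from typing import List, Optional, Tuple


def encode_cursor(created_at: int, paste_id: str) -> str:
    return base64.urlsafe_b64encode(json.dumps([created_at, paste_id]).encode()).decode()


def decode_cursor(cursor: str) -> Tuple[int, str]:
    created_at, paste_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return created_at, paste_id


def list_user_pastes(conn: sqlite3.Connection, owner_id: str, limit: int = 20,
                     cursor: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return one page of paste IDs (newest first) and the cursor for the next page."""
    order = " ORDER BY created_at DESC, paste_id DESC LIMIT ?"
    if cursor is None:
        rows = conn.execute(
            "SELECT paste_id, created_at FROM pastes WHERE owner_id = ?" + order,
            (owner_id, limit),
        ).fetchall()
    else:
        created_at, last_id = decode_cursor(cursor)
        # Keyset condition: strictly "after" the last row of the previous page.
        rows = conn.execute(
            "SELECT paste_id, created_at FROM pastes WHERE owner_id = ?"
            " AND (created_at < ? OR (created_at = ? AND paste_id < ?))" + order,
            (owner_id, created_at, created_at, last_id, limit),
        ).fetchall()
    next_cursor = encode_cursor(rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return [r[0] for r in rows], next_cursor
```

Extending the index to (owner_id, created_at DESC, paste_id DESC) lets the tie-breaker be served from the index as well.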