Requirements
Functional Requirements:
Create a new paste (text/code) with optional metadata:
- visibility: public, unlisted, private
- expires_in: 10 mins, 1 hour, 1 day, never
Retrieve a paste by short URL
- Public/unlisted: accessible without login
- Private: accessible only to authenticated owner
Delete a paste (authenticated only)
Handle invalid/expired links with clear user feedback
Non-Functional Requirements:
- Availability: 99.9%+
- Low latency:
- Paste retrieval: <100ms P95
- Paste creation: <150ms
- High scalability: Up to 10M pastes/day and 100M views/day
- Security & privacy: Support for visibility control and spam protection
- Data expiry: Efficient TTL-based deletion
- Resilience: Multi-AZ, auto-recovery
API Design
POST /paste
Creates a new paste.
Request
{
"content": "print('Hello')",
"expires_in": 3600,
"syntax": "python",
"visibility": "unlisted"
}
Response
{
"paste_id": "abc123",
"url": "https://paste.io/abc123",
"expires_at": "2025-11-30T10:00:00Z"
}
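A minimal sketch of how the service might validate this request and build the response above. The function name `build_paste_response`, the default visibility of "public", and the byte limit (borrowed from the <1MB rule under Edge Cases) are assumptions made for illustration, not part of the stated API.

```python
from datetime import datetime, timedelta, timezone

ALLOWED_VISIBILITY = ("public", "unlisted", "private")
# expires_in options from the requirements, in seconds; None means "never".
ALLOWED_EXPIRES_IN = (600, 3600, 86400, None)
MAX_CONTENT_BYTES = 1_000_000  # assumed cap, based on the <1MB edge-case rule


def build_paste_response(request: dict, paste_id: str, now: datetime) -> dict:
    """Validate a POST /paste body and return the response payload."""
    content = request.get("content")
    if not isinstance(content, str) or not content:
        raise ValueError("content is required")
    if len(content.encode("utf-8")) >= MAX_CONTENT_BYTES:
        raise ValueError("content too large")

    visibility = request.get("visibility", "public")  # assumed default
    if visibility not in ALLOWED_VISIBILITY:
        raise ValueError("invalid visibility")

    expires_in = request.get("expires_in")
    if expires_in not in ALLOWED_EXPIRES_IN:
        raise ValueError("invalid expires_in")

    expires_at = None
    if expires_in is not None:
        expires_at = (now + timedelta(seconds=expires_in)).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "paste_id": paste_id,
        "url": f"https://paste.io/{paste_id}",
        "expires_at": expires_at,
    }
```

Passing `now` in explicitly keeps the function deterministic, which makes the expiry arithmetic easy to test.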
GET /paste/{paste_id}
Retrieve a paste by ID.
Error Responses:
- 404 Not Found (invalid or expired ID)
- 403 Forbidden (private paste, not owner)
DELETE /paste/{paste_id} (Authenticated)
Deletes a paste owned by the authenticated user.
Optional Future APIs
- GET /user/pastes – fetch pastes by authenticated user
- GET /paste/search?q=... – enable indexed search over public pastes
Authentication & User Accounts
While the MVP allows anonymous paste creation, user account integration enables:
- Managing private pastes
- Listing past pastes (GET /user/pastes)
- Deleting/editing owned pastes
- Auth via OAuth2 or JWT
- owner_id stored in DB, indexed for user lookups
Request Flow
Create Flow:
- Client sends POST /paste
- API Gateway → PasteService
- Service:
  - Validates & rate-limits
  - Generates paste_id
  - Stores in DB
  - Adds to Redis (with TTL)
  - Returns short URL
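The create steps can be sketched end to end as follows. This is an illustration under stated assumptions: plain dictionaries stand in for the database and Redis, rate limiting is omitted, and `secrets.token_urlsafe` is swapped in as a simple ID source in place of NanoID. The name `create_paste` is hypothetical.

```python
import secrets
import time
from typing import Optional

db: dict = {}     # stand-in for the database table
cache: dict = {}  # stand-in for Redis: paste_id -> (record, expiry epoch or None)


def create_paste(content: str, expires_in: Optional[int], visibility: str = "unlisted") -> str:
    """Validate, generate an ID, store in the DB, cache with a TTL, return the short URL."""
    if not content:
        raise ValueError("content is required")  # validation (rate limiting omitted)

    paste_id = secrets.token_urlsafe(6)  # 6 random bytes -> 8 URL-safe characters
    while paste_id in db:                # retry on the rare collision
        paste_id = secrets.token_urlsafe(6)

    now = time.time()
    record = {
        "content": content,
        "visibility": visibility,
        "expires_at": None if expires_in is None else now + expires_in,
    }
    db[paste_id] = record
    cache[paste_id] = (record, record["expires_at"])  # cache expiry matches paste expiry
    return f"https://paste.io/{paste_id}"
```

In a real deployment the collision check would more likely rely on the primary-key constraint of the insert than on a separate lookup.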
Retrieve Flow:
- Client requests GET /paste/{id}
- PasteService checks Redis
- On miss → query DB
- Validates expiration and access
- Returns content or 404/403
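A minimal sketch of the retrieve steps, again with dictionaries standing in for Redis and the database. The function name `get_paste` and the record layout are assumptions for illustration; expired pastes are reported as 404 and private pastes requested by a non-owner as 403, matching the API error responses.

```python
import time
from typing import Optional, Tuple

db = {
    "abc123": {"content": "print('Hello')", "visibility": "unlisted",
               "owner_id": None, "expires_at": None},
}
cache: dict = {}  # stand-in for Redis


def get_paste(paste_id: str, requester_id: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Return (status, content), checking the cache first and the DB on a miss."""
    record = cache.get(paste_id)
    if record is None:                # cache miss -> query the DB
        record = db.get(paste_id)
        if record is None:
            return 404, None          # unknown ID
        cache[paste_id] = record      # populate the cache for later reads

    expires_at = record["expires_at"]
    if expires_at is not None and expires_at <= time.time():
        cache.pop(paste_id, None)     # do not keep serving an expired paste
        return 404, None

    if record["visibility"] == "private" and (
        requester_id is None or record["owner_id"] != requester_id
    ):
        return 403, None              # private paste, requester is not the owner
    return 200, record["content"]
```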
High-Level Architecture
[Client]
↓
[API Gateway / Load Balancer]
↓
[Paste Service (Create/Retrieve/Delete)]
↓ ↘
[Redis Cache] [Database Store]
↓ ↓
[TTL Processor] [Backup Storage]
- CDN serves public pastes
- Background workers manage expiry
- Optional auth service added for login flows
Data Model
Relational (PostgreSQL)
Table: pastes
- paste_id (PK)
- content (TEXT)
- syntax (STRING)
- visibility (ENUM)
- owner_id (NULLABLE)
- created_at (TIMESTAMP)
- expires_at (TIMESTAMP)
- access_count (INT)
Indexes:
- paste_id (primary)
- expires_at (for TTL cleanup)
- owner_id + created_at (for listing user pastes)
Alternatives:
- DynamoDB with partition key paste_id
- TTL on expires_at
- Secondary index on owner_id
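The relational table and its indexes could be declared roughly as below. SQLite (via Python's built-in `sqlite3`) is used only so the sketch runs anywhere; the column types, the CHECK constraint standing in for an ENUM, and the index names are assumptions, and a PostgreSQL schema would differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory SQLite as a stand-in for PostgreSQL
conn.executescript("""
CREATE TABLE pastes (
    paste_id     TEXT PRIMARY KEY,
    content      TEXT NOT NULL,
    syntax       TEXT,
    visibility   TEXT NOT NULL
                 CHECK (visibility IN ('public', 'unlisted', 'private')),
    owner_id     TEXT,                 -- NULL for anonymous pastes
    created_at   TIMESTAMP NOT NULL,
    expires_at   TIMESTAMP,            -- NULL means the paste never expires
    access_count INTEGER NOT NULL DEFAULT 0
);
-- Supports the TTL cleanup scan.
CREATE INDEX idx_pastes_expires_at ON pastes (expires_at);
-- Supports listing a user's pastes in creation order.
CREATE INDEX idx_pastes_owner_created ON pastes (owner_id, created_at);
""")
```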
Capacity Estimation
Assumptions:
- 10M pastes/day
- Avg paste size: 1 KB
- Expiry: 7-day default
- Views: 20× per paste → 200M reads/day (a conservative figure, twice the 100M views/day target in the requirements)
- Cache hit ratio: 90%
Storage:
10M × 1 KB/day = 10 GB/day
7-day retention = 70 GB live data
3x replication → 210 GB
Traffic:
- Writes: 10M/day ≈ 116 QPS
- Reads: 200M/day ≈ 2,300 QPS
- 90% served via Redis → DB load = 230 QPS
- Cache size: top 500K hot pastes × 1 KB = ~500 MB
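The arithmetic behind these estimates can be reproduced with a few lines. The variable names are illustrative, decimal units are assumed (1 GB = 1,000,000 KB), and the QPS figures are daily averages, so peak load would be higher.

```python
SECONDS_PER_DAY = 86_400

pastes_per_day = 10_000_000
avg_paste_kb = 1
retention_days = 7
views_per_paste = 20
cache_hit_ratio = 0.90
replication_factor = 3

# Storage
daily_storage_gb = pastes_per_day * avg_paste_kb / 1_000_000   # 10 GB/day
live_storage_gb = daily_storage_gb * retention_days            # 70 GB live
replicated_storage_gb = live_storage_gb * replication_factor   # 210 GB

# Traffic (averages over a day)
write_qps = pastes_per_day / SECONDS_PER_DAY                     # about 115.7
read_qps = pastes_per_day * views_per_paste / SECONDS_PER_DAY    # about 2,315
db_read_qps = read_qps * (1 - cache_hit_ratio)                   # about 231

# Cache footprint for the assumed 500K hot pastes
cache_size_mb = 500_000 * avg_paste_kb / 1_000                   # 500 MB
```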
Scalability
- PasteService is stateless → horizontally scalable (containers/functions)
- Redis Cluster for sharded in-memory TTL caching
- DynamoDB or PostgreSQL with read replicas
- CDN (e.g., CloudFront) for serving public pastes
- Auto-scaling based on CPU/RPS thresholds
Expiry Management & TTL Design
Paste creation:
- Set expires_at timestamp in DB
- Add to Redis with matching TTL (seconds)
Expiry handling:
- Redis auto-evicts on TTL
- Scheduled background job:
- Scans DB for expires_at < NOW()
- Hard deletes or archives
- Batches deletions for efficiency
Detailed Component Design
Paste ID Generation
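- Use NanoID or a Base62-encoded random value (a full UUIDv4 encodes to about 22 Base62 characters, so truncate it or draw fewer random bits)
- Generates short (~6–8 character), non-sequential IDs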
- Retry on collision (rare)
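A sketch of short Base62 ID generation with collision retry. It draws characters directly from a cryptographically secure source rather than encoding a UUID; the function names, the 7-character default, and the in-memory set standing in for the primary-key check are assumptions.

```python
import secrets
import string

BASE62 = string.digits + string.ascii_letters  # 0-9, a-z, A-Z: 62 characters


def new_paste_id(length: int = 7) -> str:
    """Return a random, non-sequential Base62 string of the given length."""
    return "".join(secrets.choice(BASE62) for _ in range(length))


def allocate_paste_id(existing_ids: set, max_attempts: int = 5) -> str:
    """Generate an ID, retrying if it collides with one already in use."""
    for _ in range(max_attempts):
        candidate = new_paste_id()
        if candidate not in existing_ids:
            existing_ids.add(candidate)
            return candidate
    raise RuntimeError("could not allocate a unique paste ID")
```

With 7 characters there are 62^7 (about 3.5 trillion) possible IDs, so collisions stay rare at 10M pastes/day, but the retry path is still needed.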
Syntax Highlighting
- Store syntax hint in DB
- Frontend uses Prism.js or highlight.js
- Backend does not parse content
Expiry Management
- Redis: EXPIRE per key
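- DB: background sweeper deletes expired rows in batches. PostgreSQL's DELETE has no LIMIT clause, so bound each batch with a subquery:
DELETE FROM pastes WHERE paste_id IN (SELECT paste_id FROM pastes WHERE expires_at < now() LIMIT 1000)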
- Scheduled job runs every N minutes
- Optional: Archive to S3
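A minimal sketch of the batched sweeper. It uses SQLite through Python's `sqlite3` so it can run standalone, stores `expires_at` as epoch seconds, and bounds each batch with a subquery; the function name and batch size are assumptions, and archiving to S3 is left out.

```python
import sqlite3


def sweep_expired(conn: sqlite3.Connection, now: float, batch_size: int = 1000) -> int:
    """Delete expired pastes in batches; return the total number of rows removed."""
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM pastes
            WHERE paste_id IN (
                SELECT paste_id FROM pastes
                WHERE expires_at IS NOT NULL AND expires_at < ?
                LIMIT ?
            )
            """,
            (now, batch_size),
        )
        conn.commit()  # commit per batch to keep transactions short
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

A scheduler would call `sweep_expired` every N minutes; rows with a NULL `expires_at` (never expire) are skipped.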
Caching Strategy
- Redis with TTL (hot pastes)
- LRU eviction + manual pre-warming if analytics available
- CDN for public, non-changing pastes
- Cold reads go directly to DB
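To make the TTL-plus-LRU behaviour concrete, here is a small in-process cache that combines both policies. It only illustrates the eviction logic that Redis would provide in production; the class name `TTLLRUCache` and its methods are hypothetical.

```python
import time
from collections import OrderedDict
from typing import Optional


class TTLLRUCache:
    """Bounded cache: entries expire after a TTL, and the least recently used
    entry is evicted when capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()  # key -> (value, expiry epoch)

    def set(self, key: str, value: str, ttl_seconds: float,
            now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._items[key] = (value, now + ttl_seconds)
        self._items.move_to_end(key)             # mark as most recently used
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)      # evict least recently used

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self._items.get(key)
        if entry is None:
            return None                          # cold read: caller goes to the DB
        value, expiry = entry
        if expiry <= now:
            del self._items[key]                 # TTL elapsed
            return None
        self._items.move_to_end(key)
        return value
```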
Failure Scenarios & Recovery
| Failure | Recovery Strategy |
| --- | --- |
| DB outage | Use read replicas, failover DB endpoint |
| Redis crash | Use AOF persistence + fallback to DB |
| PasteService failure | Auto-restart + stateless deployment |
| Surge in traffic (DoS) | API Gateway rate-limiting + CAPTCHA |
| Cache invalidation issue | TTL-based purge + on-demand DB lookup |
| Data loss |
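The table names API Gateway rate-limiting without specifying an algorithm. A token bucket is one common choice, sketched here as an assumption rather than the design's stated method; the class name and parameters are illustrative.

```python
class TokenBucket:
    """Allow bursts up to `burst` requests, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int, now: float = 0.0):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.updated = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller would respond with 429 or a CAPTCHA challenge
```

One bucket per client key (IP address or user ID) keeps a single abusive client from exhausting capacity for everyone else.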
Trade-offs & Justifications
| Decision | Trade-off / Justification |
| --- | --- |
| Redis for caching | Fast reads vs memory usage; TTL reduces footprint |
| Base62 ID | Short + user-friendly vs potential for collisions |
| PostgreSQL vs DynamoDB | SQL query flexibility vs auto-scaling |
| User auth optional | Simpler MVP vs limited access control / management |
| Public pastes in CDN | Great performance vs cache invalidation complexity |
Future Improvements
- Full-text search (via ElasticSearch or pg_trgm)
- Private paste sharing with tokenized links
- Analytics: view count, last viewed
- Encrypted pastes (optional user keys)
- Paste editing/versioning
- Machine learning for spam/abuse detection
Edge Cases & Data Integrity
- Special characters: Store all pastes as UTF-8 text
- Malformed input: Validate size (<1MB), strip invalid headers
- Paste ID collisions: Retry logic on insert
- Concurrent writes: random IDs make clashes unlikely; the primary-key constraint plus retry on insert handles any that occur
Engagement Answer: Authenticated Paste Access
To allow users to manage their pastes:
- Add owner_id to each paste on creation
- Require JWT on all GET/DELETE /paste/{id} if visibility is private
- GET /user/pastes
- Query by owner_id
- Use cursor-based pagination
- Redis stores visibility & owner metadata for fast auth checks
This keeps access checks secure and responses fast at scale.
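Cursor-based pagination for GET /user/pastes can be sketched as a keyset query over (created_at, paste_id), which matches the owner_id + created_at index. SQLite is used so the example runs standalone; the function name, the tuple cursor format, and the integer timestamps are assumptions.

```python
import sqlite3
from typing import List, Optional, Tuple

Cursor = Tuple[int, str]  # (created_at, paste_id) of the last row already returned


def list_user_pastes(conn: sqlite3.Connection, owner_id: str, limit: int = 20,
                     cursor: Optional[Cursor] = None) -> Tuple[List[str], Optional[Cursor]]:
    """Return (paste_ids, next_cursor), newest first."""
    if cursor is None:
        rows = conn.execute(
            "SELECT paste_id, created_at FROM pastes WHERE owner_id = ? "
            "ORDER BY created_at DESC, paste_id DESC LIMIT ?",
            (owner_id, limit),
        ).fetchall()
    else:
        # Resume strictly after the cursor row; paste_id breaks timestamp ties.
        rows = conn.execute(
            "SELECT paste_id, created_at FROM pastes WHERE owner_id = ? "
            "AND (created_at < ? OR (created_at = ? AND paste_id < ?)) "
            "ORDER BY created_at DESC, paste_id DESC LIMIT ?",
            (owner_id, cursor[0], cursor[0], cursor[1], limit),
        ).fetchall()
    # A full page suggests more rows may follow; a short page ends the listing.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return [r[0] for r in rows], next_cursor
```

Unlike OFFSET-based paging, the query cost stays flat as the user pages deeper, because each page starts from an indexed position.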