Requirements


Functional Requirements:

  • Users can create and store a text/code snippet (paste)
  • Each paste has:
    • A unique short URL (/abc123)
    • Optional expiry time (e.g., 10 mins, 1 hour, 1 day, never)
    • Visibility settings: public, unlisted, or private
  • Users can retrieve a paste by URL
  • Users receive appropriate error messages for expired/missing/unauthorized pastes
  • Optional: Support for syntax highlighting
  • Optional: Authenticated users can delete or list their pastes

Non-Functional Requirements:

  • Availability: 99.9%+
  • Low latency:
    • Paste retrieval: <100ms P95
    • Paste creation: <150ms
  • High scalability: Up to 10M pastes/day and 200M views/day (about 20 views per paste)
  • Security & privacy: Support for visibility control and spam protection
  • Data expiry: Efficient TTL-based deletion
  • Resilience: Multi-AZ, auto-recovery


API Design

POST /paste

Creates a new paste.

Request

{ "content": "print('Hello')", "expires_in": 3600, "syntax": "python", "visibility": "unlisted" }

Response

{ "paste_id": "abc123", "url": "https://paste.io/abc123", "expires_at": "2025-11-30T10:00:00Z" }
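A minimal sketch of how the service might validate this request and build the response, assuming a hypothetical `build_paste_response` helper. The default visibility and the validation rules are assumptions, not part of the design above.

```python
from datetime import datetime, timedelta, timezone

ALLOWED_VISIBILITY = {"public", "unlisted", "private"}
BASE_URL = "https://paste.io"  # host taken from the example response


def build_paste_response(request: dict, paste_id: str, now: datetime) -> dict:
    """Validate a POST /paste body and build the response (hypothetical helper).

    `now` is assumed to be a timezone-aware UTC datetime.
    """
    content = request.get("content")
    if not isinstance(content, str) or not content:
        raise ValueError("content must be a non-empty string")

    visibility = request.get("visibility", "public")  # default is an assumption
    if visibility not in ALLOWED_VISIBILITY:
        raise ValueError(f"unknown visibility: {visibility!r}")

    expires_in = request.get("expires_in")  # seconds; None means "never"
    if expires_in is not None and (not isinstance(expires_in, int) or expires_in <= 0):
        raise ValueError("expires_in must be a positive number of seconds")

    expires_at = None
    if expires_in is not None:
        expires_at = (now + timedelta(seconds=expires_in)).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "paste_id": paste_id,
        "url": f"{BASE_URL}/{paste_id}",
        "expires_at": expires_at,
    }
```

Called with the example request, the ID `abc123`, and a clock of 2025-11-30 09:00 UTC, this returns the example response shown above.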


GET /paste/{paste_id}

Retrieves a paste by ID.

Error Responses:

  • 404 Not Found (invalid or expired ID)
  • 403 Forbidden (private paste, not owner)


DELETE /paste/{paste_id} (Authenticated)

Deletes a paste owned by the authenticated user.


Optional Future APIs

  • GET /user/pastes – fetch pastes by authenticated user
  • GET /paste/search?q=... – enable indexed search over public pastes



High-Level Architecture

[Client]
    ↓
[API Gateway / Load Balancer]
    ↓
[Paste Service (Create/Retrieve/Delete)]
    ↓                ↘
[Redis Cache]      [Database Store]
    ↓                    ↓
[TTL Processor]    [Backup Storage]

  • CDN can optionally cache public pastes


Data Model

Table: pastes

  • paste_id (PK)
  • content (TEXT)
  • syntax (STRING)
  • visibility (ENUM)
  • owner_id (NULLABLE)
  • created_at (TIMESTAMP)
  • expires_at (TIMESTAMP)
  • access_count (INT)

Indexes:

  • paste_id (primary)
  • expires_at (for TTL cleanup)
  • owner_id + created_at (for listing user pastes)
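The table and indexes above could be sketched as follows, using SQLite from the Python standard library as a stand-in for the production database. The index names, the use of integer Unix timestamps, and the `open_store` helper are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pastes (
    paste_id     TEXT PRIMARY KEY,
    content      TEXT NOT NULL,
    syntax       TEXT,
    visibility   TEXT NOT NULL
                 CHECK (visibility IN ('public', 'unlisted', 'private')),
    owner_id     TEXT,                 -- NULL for anonymous pastes
    created_at   INTEGER NOT NULL,     -- Unix seconds (assumption)
    expires_at   INTEGER,              -- NULL means the paste never expires
    access_count INTEGER NOT NULL DEFAULT 0
);
-- Supports the TTL cleanup scan.
CREATE INDEX idx_pastes_expires_at ON pastes (expires_at);
-- Supports listing a user's pastes in creation order.
CREATE INDEX idx_pastes_owner_created ON pastes (owner_id, created_at);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the pastes table (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The CHECK constraint emulates the ENUM column, since SQLite has no native enum type.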

Alternatives:

  • DynamoDB with partition key paste_id
    • TTL on expires_at
    • Secondary index on owner_id


Capacity Estimation

Assumptions:

  • 10M pastes/day
  • Avg paste size: 1 KB
  • Expiry: 7-day default
  • Views: 20× per paste → 200M reads/day
  • Cache hit ratio: 90%

Storage:

  • 10M pastes/day × 1 KB = 10 GB/day
  • 7-day retention → 70 GB live data
  • 3× replication → 210 GB

Traffic:

  • Writes: 10M/day ≈ 115 QPS
  • Reads: 200M/day ≈ 2,300 QPS
    • 90% served via Redis → DB load = 230 QPS
  • Cache size: top 500K hot pastes × 1 KB = ~500 MB
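The arithmetic above can be reproduced with a short script, which makes it easy to re-run the estimate when an assumption changes. The function name and parameter names are illustrative; the default values are the assumptions listed in this section, and decimal units are used (1 GB = 10^6 KB).

```python
SECONDS_PER_DAY = 86_400


def estimate_capacity(
    pastes_per_day: int = 10_000_000,
    avg_paste_kb: float = 1.0,
    retention_days: int = 7,
    replication_factor: int = 3,
    views_per_paste: int = 20,
    cache_hit_ratio: float = 0.90,
    hot_pastes: int = 500_000,
) -> dict:
    """Recompute the storage and traffic figures from the stated assumptions."""
    daily_gb = pastes_per_day * avg_paste_kb / 1_000_000
    live_gb = daily_gb * retention_days
    read_qps = pastes_per_day * views_per_paste / SECONDS_PER_DAY
    return {
        "daily_gb": daily_gb,
        "live_gb": live_gb,
        "replicated_gb": live_gb * replication_factor,
        "write_qps": pastes_per_day / SECONDS_PER_DAY,
        "read_qps": read_qps,
        "db_read_qps": read_qps * (1 - cache_hit_ratio),  # cache misses only
        "cache_mb": hot_pastes * avg_paste_kb / 1_000,
    }


for name, value in estimate_capacity().items():
    print(f"{name}: {value:,.1f}")
```

The unrounded rates are about 116 writes/s and 2,315 reads/s, which round to the figures quoted above.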



Scalability and Performance

Application Layer:

  • Stateless microservices for create/read/delete
  • Scales horizontally behind a load balancer (ECS / Kubernetes / Lambda)

Caching:

  • Redis Cluster for fast read access
  • TTL-based eviction + LRU fallback

Database:

  • PostgreSQL with read replicas, or
  • DynamoDB with on-demand scaling + Global Tables

CDN:

  • Serve public pastes (static content) via CloudFront/Cloudflare



Expiry Management & TTL Design

Paste creation:

  • Set expires_at timestamp in DB
  • Add to Redis with matching TTL (seconds)

Expiry handling:

  • Redis expires cached entries automatically when their TTL elapses
  • Scheduled background job:
    • Scans DB for expires_at < NOW()
    • Hard deletes or archives
    • Batches deletions for efficiency
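The background job could look roughly like the sketch below, which hard-deletes expired rows in fixed-size batches so that no single transaction holds locks for long. It uses SQLite as a stand-in database; the `purge_expired` name, the batch size, and the integer-timestamp column are assumptions.

```python
import sqlite3


def purge_expired(conn: sqlite3.Connection, now: int, batch_size: int = 1000) -> int:
    """Delete pastes whose expires_at is in the past; return rows removed."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM pastes WHERE paste_id IN ("
            "  SELECT paste_id FROM pastes"
            "  WHERE expires_at IS NOT NULL AND expires_at < ?"
            "  LIMIT ?)",
            (now, batch_size),
        )
        conn.commit()  # one short transaction per batch
        total += cur.rowcount
        if cur.rowcount < batch_size:
            # A partial (or empty) batch means nothing expired is left.
            return total
```

Rows with a NULL `expires_at` (pastes that never expire) are skipped. An archiving variant would copy each batch elsewhere before deleting it.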




Detailed Component Design

Paste ID Generation

  • Random Base62 ID or NanoID (6–8 chars); a full Base62-encoded UUID would run to about 22 chars
  • Non-sequential to prevent enumeration
  • Check for collisions if using a short random ID
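A minimal sketch of short random ID generation with a collision check, assuming a caller-supplied `exists` callback that reports whether an ID is already taken. The function name, the 7-character default, and the retry limit are illustrative choices.

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters


def new_paste_id(exists, length: int = 7, max_attempts: int = 5) -> str:
    """Return a random Base62 ID that `exists` reports as unused."""
    for _ in range(max_attempts):
        # secrets draws from the OS CSPRNG, so IDs are not guessable in sequence.
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if not exists(candidate):
            return candidate
    raise RuntimeError("could not find a free paste ID")
```

With 7 characters there are 62^7 (about 3.5 × 10^12) possible IDs, so retries should be rare at the volumes discussed here. In a real store the check and the insert must happen atomically, for example by relying on the primary-key constraint and retrying on conflict.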

Retrieval Flow

  1. API receives GET /paste/abc123
  2. Check Redis for paste_id
  3. If miss → Query DB → Store in Redis
  4. Enforce visibility rules (private, unlisted)
  5. Return response or error
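The steps above can be sketched as a cache-aside lookup. Here `TTLCache` is a tiny in-memory stand-in for Redis, `load_from_db` is a hypothetical callback that returns a paste dict or `None`, and timestamps are Unix seconds; none of these names come from the design itself.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis: each entry expires after its own TTL."""

    def __init__(self, clock=time.time):
        self._items = {}
        self._clock = clock

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if deadline <= self._clock():
            del self._items[key]  # lazily drop the expired entry
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)


def fetch_paste(paste_id, requester_id, cache, load_from_db, now, default_ttl=3600):
    """Return (status_code, paste_or_None) for GET /paste/{paste_id}."""
    paste = cache.get(paste_id)                    # step 2: check the cache
    from_cache = paste is not None
    if not from_cache:
        paste = load_from_db(paste_id)             # step 3: fall back to the DB
    if paste is None:
        return 404, None
    expires_at = paste["expires_at"]
    if expires_at is not None and expires_at <= now:
        return 404, None                           # expired pastes look missing
    if not from_cache:
        # Never cache an entry for longer than the paste itself will live.
        ttl = default_ttl if expires_at is None else min(default_ttl, expires_at - now)
        cache.set(paste_id, paste, ttl)
    if paste["visibility"] == "private" and paste["owner_id"] != requester_id:
        return 403, None                           # step 4: visibility rules
    return 200, paste                              # step 5
```

Visibility is checked after the cache lookup on every request, so a cached private paste is never returned to a non-owner.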

Deletion Flow

  • Auth required
  • Validate ownership
  • Remove from Redis and DB
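A sketch of the deletion flow, using plain dicts as stand-ins for the database and the Redis cache. The `delete_paste` name and the specific status codes (401 for a missing identity, 204 on success) are assumptions; the design only specifies that auth and ownership are required.

```python
def delete_paste(paste_id, requester_id, cache: dict, db: dict) -> int:
    """Return an HTTP-style status code for DELETE /paste/{paste_id}."""
    if requester_id is None:
        return 401                      # auth required
    paste = db.get(paste_id)
    if paste is None:
        return 404
    if paste["owner_id"] != requester_id:
        return 403                      # caller does not own this paste
    # Delete from the source of truth first, then invalidate the cache, so a
    # concurrent read cannot re-populate the cache from a row about to vanish.
    del db[paste_id]
    cache.pop(paste_id, None)
    return 204
```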


Failure Scenarios & Recovery

Failure | Recovery Strategy
DB outage | Use read replicas, failover DB endpoint
Redis crash | Use AOF persistence + fallback to DB
PasteService failure | Auto-restart + stateless deployment
Surge in traffic (DoS) | API Gateway rate-limiting + CAPTCHA
Cache invalidation issue | TTL-based purge + on-demand DB lookup
Data loss |


Trade-offs & Justifications

Decision | Trade-off / Justification
Redis for caching | Fast reads vs memory usage; TTL reduces footprint
Base62 ID | Short + user-friendly vs potential for collisions
PostgreSQL vs DynamoDB | SQL query flexibility vs auto-scaling
User auth optional | Simpler MVP vs limited access control / management
Public pastes in CDN | Great performance vs cache invalidation complexity


Future Improvements

  • Full-text search for public pastes (e.g., Elasticsearch)
  • User account integration and paste history
  • Analytics: View count per paste, last accessed
  • Encrypted pastes (client-side)
  • Spam detection: ML model to flag abusive pastes
  • Geographic replication for latency-sensitive traffic
  • Tagging system and public paste feed


Engagement Scenario: Authenticated User Access

If we introduce user accounts, how would users retrieve their pastes efficiently?

Solution:

  • Store owner_id with each paste
  • Add composite index: (owner_id, created_at DESC)
  • API: GET /user/pastes with pagination
  • Use cursor-based pagination for scalability
  • Protect access with session tokens or OAuth2
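Cursor-based pagination over the composite index could be sketched as below, again with SQLite as a stand-in. The cursor here is the `(created_at, paste_id)` pair of the last row returned; that encoding, the `list_user_pastes` name, and the `paste_id` tie-breaker are assumptions.

```python
import sqlite3


def list_user_pastes(conn: sqlite3.Connection, owner_id: str, limit: int = 20, cursor=None):
    """Return (rows, next_cursor), newest first; rows are (paste_id, created_at)."""
    sql = "SELECT paste_id, created_at FROM pastes WHERE owner_id = ?"
    params = [owner_id]
    if cursor is not None:
        created_at, paste_id = cursor
        # Resume strictly after the last row seen; paste_id breaks timestamp ties.
        sql += " AND (created_at < ? OR (created_at = ? AND paste_id < ?))"
        params += [created_at, created_at, paste_id]
    sql += " ORDER BY created_at DESC, paste_id DESC LIMIT ?"
    params.append(limit)
    rows = conn.execute(sql, params).fetchall()
    # A full page suggests there may be more; a short page means we are done.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor
```

Unlike OFFSET-based paging, each page is a bounded index range scan, so the cost does not grow with how deep the user has paged.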