Requirements


Functional Requirements:

  • Users can create a paste containing plain text or code.
  • Each paste has:
    • A unique short URL (e.g., paste.io/abc123)
    • Optional expiration time (e.g., 10 mins, 1 hour, never)
    • Optional visibility: public, unlisted, private
  • Users can retrieve a paste via its short URL.
  • Optional: Syntax highlighting for code pastes.

Non-Functional Requirements:

  • High Availability (99.9% uptime)
  • Low Latency for paste retrieval (<100ms P95)
  • Scalability to support millions of pastes/day
  • Data Expiry support (auto-deletion via TTL)
  • Abuse Protection (spam, link flooding)
  • Security (unlisted/private visibility, DoS protection)


API Design


  • POST /paste
    • Create a new paste.

Request

{ "content": "print('Hello')", "expires_in": 3600, "syntax": "python", "visibility": "unlisted" }

Response

{ "paste_id": "abc123", "url": "https://paste.io/abc123", "expires_at": "2025-11-30T10:00:00Z" }
  • GET /paste/{paste_id}
    • Retrieve a paste by ID.

Response:

{ "content": "print('Hello')", "syntax": "python", "created_at": "2025-11-30T09:00:00Z", "expires_at": "2025-11-30T10:00:00Z" }
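As a rough illustration of the POST /paste contract above, the sketch below validates a request body and builds the response. The function name `build_create_response`, the default visibility of "public", and passing `paste_id` and `now` as arguments are assumptions made for this example, not part of the design.

```python
import json
from datetime import datetime, timedelta, timezone

ALLOWED_VISIBILITY = {"public", "unlisted", "private"}


def build_create_response(body: str, paste_id: str, now: datetime) -> dict:
    """Validate a POST /paste body and build the response payload.

    `now` must be a timezone-aware UTC datetime.
    """
    req = json.loads(body)
    if not req.get("content"):
        raise ValueError("content is required")

    # Assumption: visibility defaults to "public" when omitted.
    visibility = req.get("visibility", "public")
    if visibility not in ALLOWED_VISIBILITY:
        raise ValueError(f"invalid visibility: {visibility}")

    # expires_in is in seconds; a missing value means the paste never expires.
    expires_in = req.get("expires_in")
    expires_at = None
    if expires_in is not None:
        expires_at = (now + timedelta(seconds=expires_in)).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        )

    return {
        "paste_id": paste_id,
        "url": f"https://paste.io/{paste_id}",
        "expires_at": expires_at,
    }
```

With the sample request and a creation time of 2025-11-30T09:00:00Z, this yields the same `expires_at` as the sample response.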


High-Level Architecture

[Client]
    ↓
[API Gateway / Load Balancer]
    ↓
[Paste Service (Create / Retrieve)]
    ↓               ↘
[Redis Cache]     [Database]
                     ↑ ↓
      [TTL Cleaner Job / Expiry Processor]


Data Model

Table: pastes

  • paste_id (PK, string)
  • content (text)
  • created_at (timestamp)
  • expires_at (timestamp)
  • syntax (string)
  • visibility (enum: public, unlisted, private)
  • owner_id (nullable, for logged-in users)

Indexes:

  • paste_id (for lookup)
  • expires_at (for TTL cleanup)
  • owner_id (for user-based queries)
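The table and indexes above could be declared roughly as follows. This sketch uses an in-memory SQLite database purely so that it runs without a server; the design itself names PostgreSQL or DynamoDB. The index names, the `open_db` helper, and the nullable `expires_at` (for pastes that never expire) are assumptions.

```python
import sqlite3

# The primary key on paste_id already provides the lookup index.
SCHEMA = """
CREATE TABLE pastes (
    paste_id   TEXT PRIMARY KEY,
    content    TEXT NOT NULL,
    created_at TEXT NOT NULL,
    expires_at TEXT,              -- NULL means the paste never expires
    syntax     TEXT,
    visibility TEXT NOT NULL
               CHECK (visibility IN ('public', 'unlisted', 'private')),
    owner_id   TEXT               -- NULL for anonymous pastes
);
CREATE INDEX idx_pastes_expires_at ON pastes (expires_at);
CREATE INDEX idx_pastes_owner_id   ON pastes (owner_id);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the pastes schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```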





Detailed Component Design

Paste ID Generation

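  • Generate short random IDs (e.g., 6–8 Base62 characters) with NanoID or a similar random generator. A full UUID encoded in Base62 is about 22 characters, so it would have to be truncated to reach this length; in either case, check for collisions on insert.
  • Random IDs are non-sequential and hard to guess, unlike auto-increment keys.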
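A minimal sketch of random Base62 ID generation, using Python's standard `secrets` module rather than NanoID itself. The function names, the default length of 8, and the `exists` callback (a stand-in for a database lookup) are assumptions for illustration.

```python
import secrets
import string

# 0-9, a-z, A-Z: 62 URL-safe characters.
BASE62 = string.digits + string.ascii_letters


def new_paste_id(length: int = 8) -> str:
    """Return a random Base62 string from a cryptographically secure source."""
    return "".join(secrets.choice(BASE62) for _ in range(length))


def new_unique_paste_id(exists, length: int = 8, max_attempts: int = 5) -> str:
    """Generate an ID, retrying if `exists(candidate)` reports a collision."""
    for _ in range(max_attempts):
        candidate = new_paste_id(length)
        if not exists(candidate):
            return candidate
    raise RuntimeError("could not allocate a unique paste ID")
```

Eight Base62 characters give 62^8 (about 2.2 × 10^14) possible IDs, so collisions are rare but still worth checking for.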

Paste Creation Flow

  1. User submits POST /paste
  2. Service generates paste_id, normalizes TTL
  3. Store paste in database (PostgreSQL or DynamoDB)
  4. Add entry to Redis (TTL = expires_in; no TTL for pastes that never expire)
  5. Return short URL
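The five steps above can be sketched with plain dictionaries standing in for the database and Redis. The class name `PasteService`, the `normalize_ttl` helper, the 30-day cap, and the default visibility are all hypothetical; the design does not say what "normalizes TTL" involves, so clamping is only one plausible reading.

```python
import secrets
import string
import time

BASE62 = string.digits + string.ascii_letters
MAX_TTL_SECONDS = 30 * 24 * 3600  # hypothetical upper bound, not from the design


def normalize_ttl(expires_in):
    """Return a TTL in seconds, or None for a paste that never expires."""
    if expires_in is None:
        return None
    if expires_in <= 0:
        raise ValueError("expires_in must be positive")
    return min(int(expires_in), MAX_TTL_SECONDS)


class PasteService:
    def __init__(self):
        self.db = {}     # stand-in for PostgreSQL or DynamoDB
        self.cache = {}  # stand-in for Redis: paste_id -> (record, ttl_seconds)

    def _new_id(self):
        return "".join(secrets.choice(BASE62) for _ in range(8))

    def create(self, content, expires_in=None, syntax=None,
               visibility="public", now=None):
        now = time.time() if now is None else now
        ttl = normalize_ttl(expires_in)            # step 2: normalize TTL
        paste_id = self._new_id()                  # step 2: generate ID
        while paste_id in self.db:                 # retry on the rare collision
            paste_id = self._new_id()
        record = {
            "content": content,
            "syntax": syntax,
            "visibility": visibility,
            "created_at": now,
            "expires_at": None if ttl is None else now + ttl,
        }
        self.db[paste_id] = record                 # step 3: durable store
        self.cache[paste_id] = (record, ttl)       # step 4: cache with TTL
        return f"https://paste.io/{paste_id}"      # step 5: short URL
```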

Paste Retrieval Flow

  1. Client hits GET /paste/{paste_id}
  2. Check Redis cache
  3. If miss → read from DB → populate Redis
  4. Return paste content
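The retrieval steps follow the cache-aside pattern. The sketch below uses dictionaries in place of Redis and the database; `PasteReader`, the `db_reads` counter, and the explicit `expires_at` check on read are assumptions added for illustration (the check guards against rows that have expired but have not yet been cleaned up).

```python
import time


class PasteReader:
    def __init__(self, db):
        self.db = db        # stand-in for the database: paste_id -> record
        self.cache = {}     # stand-in for Redis
        self.db_reads = 0   # counts cache misses that reached the database

    def get(self, paste_id, now=None):
        now = time.time() if now is None else now

        record = self.cache.get(paste_id)          # step 2: check cache
        if record is None:
            self.db_reads += 1                     # step 3: miss, read from DB
            record = self.db.get(paste_id)
            if record is None:
                return None                        # unknown paste
            self.cache[paste_id] = record          # step 3: populate cache

        expires_at = record["expires_at"]
        if expires_at is not None and expires_at <= now:
            self.cache.pop(paste_id, None)         # never serve expired content
            return None
        return record                              # step 4: return paste
```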

Expiry Management

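  • Redis expires cached entries automatically when their TTL elapses. This is key expiration rather than memory-pressure eviction, and the database row still exists until the cleanup job removes it.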
  • Periodic background job runs to:
    • Query DB for expires_at < now()
    • Delete expired records
    • Optional: archive expired pastes to cold storage (S3)
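One way the background job might delete expired rows is in small batches, so that a large backlog does not hold a long-running transaction. The sketch uses SQLite for runnability; the function name `delete_expired`, the batch size, and storing `expires_at` as epoch seconds are assumptions. Archiving to cold storage is left out.

```python
import sqlite3


def delete_expired(conn: sqlite3.Connection, now: float,
                   batch_size: int = 1000) -> int:
    """Delete rows with expires_at < now in batches; return the total removed.

    Rows with a NULL expires_at never expire and are left untouched.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM pastes WHERE paste_id IN ("
            "  SELECT paste_id FROM pastes"
            "  WHERE expires_at IS NOT NULL AND expires_at < ?"
            "  LIMIT ?)",
            (now, batch_size),
        )
        conn.commit()
        total += cur.rowcount
        if cur.rowcount < batch_size:
            # A short batch means no expired rows remain.
            return total
```

The `expires_at` index keeps the inner SELECT cheap even when the table is large.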

Abuse Protection

  • Rate limit: max 10 pastes/min/IP (via API Gateway or Redis bucket)
  • Paste size limit: 1MB max
  • Visibility control: private/unlisted pastes not indexable
  • Optional: add CAPTCHA or email verification for anonymous users
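The per-IP rate limit could be enforced with a token bucket. The sketch below keeps buckets in process memory for illustration; a real deployment would hold this state in Redis or at the API gateway, as noted above. The class name and the choice of a token bucket over a fixed window are assumptions.

```python
import time


class TokenBucketLimiter:
    """Allow bursts of up to `capacity` requests, refilled at a steady rate."""

    def __init__(self, capacity: int = 10, refill_per_second: float = 10 / 60):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.buckets = {}  # ip -> (tokens_remaining, last_seen_timestamp)

    def allow(self, ip: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(ip, (float(self.capacity), now))

        # Credit tokens for the time elapsed since the last request.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)

        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

With the defaults, a client can create 10 pastes in a burst and then roughly one more every 6 seconds.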

Capacity Estimation

Assumptions:

  • 10M pastes/day
  • Avg paste size = 1 KB
  • Paste TTL = 7 days average

Storage:

  • 10M × 1 KB = ~10 GB/day
  • Retention = 7 days → 70 GB active data
  • Replication overhead (×3) → ~210 GB

Traffic:

  • Paste creation: 10M / 86,400 s ≈ 116 QPS (~120 QPS)
  • Paste access: ~1,200 QPS (assuming a 10:1 read-to-write ratio)
  • Redis hit ratio: 90%
  • DB reads = 10% of ~1,200 QPS ≈ 120 QPS (cache misses), DB writes = ~120 QPS
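The estimates above can be reproduced with a few lines of arithmetic. The 10:1 read-to-write ratio is an assumption inferred from the 1,200 vs 120 QPS figures, and 1 KB is taken as 1,000 bytes.

```python
PASTES_PER_DAY = 10_000_000
AVG_PASTE_BYTES = 1_000        # 1 KB, taken as 1,000 bytes
RETENTION_DAYS = 7
REPLICATION_FACTOR = 3
READ_WRITE_RATIO = 10          # assumption inferred from the traffic figures
CACHE_HIT_RATIO = 0.90
SECONDS_PER_DAY = 86_400


def estimate() -> dict:
    """Return storage (GB) and traffic (QPS) estimates from the assumptions."""
    daily_gb = PASTES_PER_DAY * AVG_PASTE_BYTES / 1e9
    active_gb = daily_gb * RETENTION_DAYS
    write_qps = PASTES_PER_DAY / SECONDS_PER_DAY
    read_qps = write_qps * READ_WRITE_RATIO
    return {
        "daily_gb": daily_gb,
        "active_gb": active_gb,
        "replicated_gb": active_gb * REPLICATION_FACTOR,
        "write_qps": write_qps,
        "read_qps": read_qps,
        "db_read_qps": read_qps * (1 - CACHE_HIT_RATIO),
    }
```

The exact write rate comes out near 116 QPS, which the figures above round up to ~120; these are average rates, so peak traffic would need additional headroom.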


Scalability and Fault Tolerance

  • Horizontal scaling of Paste API service behind a load balancer
  • Redis Cluster for sharded caching
  • PostgreSQL read replicas (or DynamoDB with Global Tables)
  • API stateless → scales via ECS/Lambda/K8s
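  • CDN in front of API to cache popular public pastes (private pastes should bypass the shared cache, and the CDN TTL should not outlive the paste's expiry)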
  • Health checks and failover for multi-AZ resilience



Trade-Offs and Alternatives

Decision                | Alternatives Considered
Use Base62 short ID     | vs sequential ID (less secure, guessable)
Redis TTL caching       | vs edge CDN (CDN better for static public pastes)
PostgreSQL for DB       | vs DynamoDB (less query flexibility)
Soft deletes            | vs hard deletes for expiry cleanup



Optional Features

  • Syntax highlighting (Prism.js on frontend)
  • User login / paste history
  • Paste search (if public)
  • Password-protected pastes
  • GDPR-compliant deletion / export