Requirements


Functional Requirements:

Create a new paste (text/code) with optional metadata:

  • visibility: public, unlisted, private
  • expires_in: 10 mins, 1 hour, 1 day, never

Retrieve a paste by short URL

  • Public/unlisted: accessible without login
  • Private: accessible only to authenticated owner

Delete a paste (authenticated only)

  • Handle invalid/expired links with clear user feedback


Non-Functional Requirements:

  • Availability: 99.9%+
  • Low latency:
  • Paste retrieval: <100ms P95
  • Paste creation: <150ms
  • High scalability: Up to 10M pastes/day and 100M views/day
  • Security & privacy: Support for visibility control and spam protection
  • Data expiry: Efficient TTL-based deletion
  • Resilience: Multi-AZ, auto-recovery


API Design

POST /paste

Creates a new paste.

Request

{ "content": "print('Hello')", "expires_in": 3600, "syntax": "python", "visibility": "unlisted" }

Response

{ "paste_id": "abc123", "url": "https://paste.io/abc123", "expires_at": "2025-11-30T10:00:00Z" }


GET /paste/{paste_id}

Retrive a paste by ID.

Error Responses:

  • 404 Not Found (invalid or expired ID)
  • 403 Forbidden (private paste, not owner)


DELETE /paste/{paste_id}(Authenticated)

Deletes a user-owned paste (if private or owned).


Optional Future APIs

  • GET /user/pastes – fetch pastes by authenticated user
  • Get /paste/search?q=... – enable indexed search over public pastes (future)



Authentication & User Accounts

While the MVP allows anonymous paste creation, user account integration enables:

  • Managing private pastes
  • Listing past pastes (GET /user/pastes)
  • Deleting/editing owned pastes
  • Auth via OAuth2 or JWT
  • owner_id stored in DB, indexed for user lookups


Request Flow

Create Flow:

  1. Client sends POST /paste
  2. API Gateway → PasteService
  3. Service:
    • Validates & rate-limits
    • Generates paste_id
    • Stores in DB
    • Adds to Redis (with TTL)
    • Returns short URL

Retrieve Flow:

  1. Client requests GET /paste/{id}
  2. PasteService checks Redis
  3. On miss → query DB
  4. Validates expiration and access
  5. Returns content or 404/403



High-Level Architecture

[Client] [API Gateway / Load Balancer] [Paste Service (Create/Retrieve/Delete)] ↓ ↘ [Redis Cache] [Database Store] ↓ ↓ [TTL Processor] [Backup Storage]
  • CDN serves public pastes
  • Background workers manage expiry
  • Optional auth service added for login flows


Data Model

Relational (PostgreSQL)

Table: pastes - paste_id (PK) - content (TEXT) - syntax (STRING) - visibility (ENUM) - owner_id (NULLABLE) - created_at (TIMESTAMP) - expires_at (TIMESTAMP) - access_count (INT)

Indexes:

  • paste_id (primary)
  • expires_at (for TTL cleanup)
  • owner_id + created_at (for listing user pastes)

Alternatives:

  • DynamoDB with partition key paste_id
    • TTL on expires_at
    • Secondary index on owner_id


Capacity Estimation

Assumptions:

  • 10M pastes/day
  • Avg paste size: 1 KB
  • Expiry: 7-day default
  • Views: 20× per paste → 200M reads/day
  • Cache hit ratio: 90%

Storage:

10M × 1 KB/day = 10 GB/day 7-day retention = 70 GB live data 3x replication → 210 GB

Traffic:

  • Writes: 10M/day ≈ 115 QPS
  • Reads: 200M/day ≈ 2,300 QPS
    • 90% served via Redis → DB load = 230 QPS
  • Cache size: top 500K hot pastes × 1 KB = ~500 MB



Scalability

  • PasteService is stateless → horizontally scalable (containers/functions)
  • Redis Cluster for sharded in-memory TTL caching
  • DynamoDB or PostgreSQL with read replicas
  • CDN (e.g., CloudFront) for serving public pastes
  • Auto-scaling based on CPU/RPS thresholds



Expiry Management & TTL Design

Paste creation:

  • Set expires_at timestamp in DB
  • Add to Redis with matching TTL (seconds)

Expiry handling:

  • Redis auto-evicts on TTL
  • Scheduled background job:
    • Scans DB for expires_at < NOW()
    • Hard deletes or archives
    • Batches deletions for efficiency




Detailed Component Design

Paste ID Generation

  • Use NanoID or Base62-encoded UUIDv4
  • Generates ~6–8 character short, non-sequential IDs
  • Retry on collision (rare)

Syntax Highlighting

  • Store syntax hint in DB
  • Frontend uses Prism.js or highlight.js
  • Backend does not parse content

Expiry Management

  • Redis: EXPIRE per key
  • DB:

Background sweeper: DELETE from pastes WHERE expires_at < now() LIMIT 1000

    • Scheduled job runs every N minutes
    • Optional: Archive to S3



Caching Strategy

  • Redis with TTL (hot pastes)
  • LRU eviction + manual pre-warming if analytics available
  • CDN for public, non-changing pastes
  • Cold reads go directly to DB


Failure Scenarios & Recovery

FailureRecovery Strategy
DB outageUse read replicas, failover DB endpoint
Redis crashUse AOF persistence + fallback to DB
PasteService failureAuto-restart + stateless deployment
Surge in traffic (DoS)API Gateway rate-limiting + CAPTCHA
Cache invalidation issueTTL-based purge + on-demand DB lookup
Data loss


Trade-offs & Justifications

DecisionTrade-off / Justification
Redis for cachingFast reads vs memory usage; TTL reduces footprint
Base62 IDShort + user-friendly vs potential for collisions
PostgreSQL vs DynamoDBSQL query flexibility vs auto-scaling
User auth optionalSimpler MVP vs limited access control / management
Public pastes in CDNGreat performance vs cache invalidation complexity


Future Improvements

  • Full-text search (via ElasticSearch or pg_trgm)
  • Private paste sharing with tokenized links
  • Analytics: view count, last viewed
  • Encrypted pastes (optional user keys)
  • Paste editing/versioning
  • Machine learning for spam/abuse detection


Edge Cases & Data Integrity

  • Special characters: Store all pastes as UTF-8 text
  • Malformed input: Validate size (<1MB), strip invalid headers
  • Paste ID collisions: Retry logic on insert
  • Concurrent writes: UUID-based ID avoids clashes


Engagement Answer: Authenticated Paste Access

To allow users to manage their pastes:

  • Add owner_id to each paste on creation
  • Require JWT on all GET/DELETE /paste/{id} if visibility is private
  • GET /user/pastes
    • Query by owner_id
    • Use cursor-based pagination
  • Redis stores visibility & owner metadata for fast auth checks
  • This ensures secure yet responsive UX under scale.