Design a Model Routing and Fallback System

Last updated: May 21, 2025

Quick Overview

Design a system that routes LLM requests to different models based on task type, latency requirements, and cost. Handle model failures with automatic fallback and maintain consistent quality across routing decisions.

Cursor | Software Engineer | Onsite - System Design | Medium


Cursor uses multiple models for different features. Tab completion needs a fast, small model. Chat needs a larger, more capable model. Some tasks benefit from frontier models while others are fine with cheaper alternatives. This question tests your ability to design an intelligent routing layer that optimizes cost, latency, and quality.

What the Interviewer Expects
  • Design a routing layer that selects models based on task characteristics
  • Implement fallback chains for model failures or timeouts
  • Handle cost optimization across different model tiers
  • Maintain quality monitoring per route
  • Support A/B testing of different routing configurations
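The first and third expectations above (task-aware selection and cost optimization) can be sketched as a single selection function. This is a minimal illustration, not a reference design: the `ModelSpec` fields, the model names, and every number in `CATALOG` are hypothetical placeholders. The idea is to filter models by a quality floor and a latency budget, then pick the cheapest survivor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelSpec:
    name: str                  # hypothetical model identifier
    p95_latency_ms: int        # assumed measured p95 latency
    cost_per_1k_tokens: float  # assumed blended price
    quality_tier: int          # assumed scale: 1 = small/fast, 3 = frontier


# Illustrative catalog; names and numbers are invented for this sketch.
CATALOG: List[ModelSpec] = [
    ModelSpec("small-fast", 150, 0.10, 1),
    ModelSpec("mid-general", 900, 1.00, 2),
    ModelSpec("frontier", 3000, 10.00, 3),
]


def select_model(
    min_quality_tier: int,
    latency_budget_ms: int,
    catalog: List[ModelSpec] = CATALOG,
) -> Optional[ModelSpec]:
    """Return the cheapest model that meets the quality floor and latency budget."""
    candidates = [
        m for m in catalog
        if m.quality_tier >= min_quality_tier
        and m.p95_latency_ms <= latency_budget_ms
    ]
    if not candidates:
        return None  # caller decides: relax a constraint or use a degraded mode
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)
```

Returning `None` rather than silently relaxing a constraint keeps the trade-off visible to the caller, which is a point worth stating out loud in the interview.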
Key Topics to Cover
  • Model routing and load balancing
  • Fallback and resilience patterns
  • Cost optimization in ML systems
  • Quality monitoring
  • A/B testing infrastructure
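For the A/B testing topic, one common approach is deterministic hash bucketing, so that a given user always sees the same routing configuration. The sketch below assumes that approach; the function name `assign_variant`, the `experiment` key, and the percentage-based split are illustrative choices, not something the question prescribes.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_percent: int) -> str:
    """Deterministically map a user to 'treatment' or 'control'.

    Hashing the experiment name together with the user id keeps
    assignments independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in [0, 100)
    return "treatment" if bucket < treatment_percent else "control"
```

Because the assignment is a pure function of its inputs, no assignment table has to be stored, and quality metrics can be joined back to the variant offline.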
How to Approach This
  1. Start by clarifying functional and non-functional requirements with the interviewer.
  2. Estimate the scale: QPS, storage, bandwidth. This drives your design decisions.
  3. Draw a high-level architecture first, then deep dive into 1-2 critical components.
  4. Discuss trade-offs explicitly (e.g., consistency vs availability, SQL vs NoSQL).
  5. Address failure scenarios, monitoring, and how the system handles 10x traffic spikes.
Possible Follow-up Questions
  • How would you handle a scenario where all models in a fallback chain are down?
  • How would you route requests for a brand new feature with no quality data?
  • How would you handle model deprecation and migration?
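For the first follow-up, one pattern candidates often reach for is a per-model circuit breaker, so the router stops sending traffic to a model that keeps failing and probes it again after a cooldown. The sketch below is one possible shape under that assumption; the class name, thresholds, and injected `clock` parameter are hypothetical.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Tracks consecutive failures for one model and opens after a threshold."""

    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a request may be sent to this model."""
        if self.opened_at is None:
            return True
        # After the cooldown, let a probe request through (half-open state).
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

If every breaker in a chain is open, the router can skip straight to a degraded response instead of waiting on timeouts for models it already knows are unhealthy.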
Sample Answer
Routing Architecture

The router sits between the application layer and model providers. Each request includes metadata: task type (completion, chat, edit, agent), latency ...
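The sample answer is cut off here, so the following is only a sketch of how request metadata might map to a route. The four task types come from the text above; the `latency_budget_ms` field, the `Route` shape, and all model names are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class TaskType(Enum):
    COMPLETION = "completion"
    CHAT = "chat"
    EDIT = "edit"
    AGENT = "agent"


@dataclass(frozen=True)
class RouteRequest:
    task_type: TaskType
    # Assumed field: carried along so downstream calls can set their timeouts.
    latency_budget_ms: int


@dataclass(frozen=True)
class Route:
    primary: str                # hypothetical model name
    fallbacks: Tuple[str, ...]  # tried in order if the primary fails


# Illustrative routing table; a real one would likely live in hot-reloadable config.
ROUTES: Dict[TaskType, Route] = {
    TaskType.COMPLETION: Route("small-fast", ("small-fast-replica",)),
    TaskType.CHAT: Route("large-chat", ("mid-general",)),
    TaskType.EDIT: Route("mid-general", ("small-fast",)),
    TaskType.AGENT: Route("frontier", ("large-chat", "mid-general")),
}
DEFAULT_ROUTE = Route("mid-general", ("small-fast",))


def resolve_route(request: RouteRequest) -> Route:
    """Look up the route for a request, falling back to a default route."""
    return ROUTES.get(request.task_type, DEFAULT_ROUTE)
```

Keeping the table as plain data separates routing policy from routing code, which makes it easier to version, audit, and experiment with.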

Fallback and Resilience

Each route defines a fallback chain: primary model, secondary model, and degraded mode. If the primary model times out or returns an error, the reques...
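The text is truncated at this point, so the sketch below only illustrates the part that is stated: try the primary, then the secondary, then fall back to a degraded mode on timeout or error. The function name `call_with_fallback`, the callable-based model interface, and the shared thread pool are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Sequence

# Shared pool so a timed-out call does not block the caller on shutdown.
_EXECUTOR = ThreadPoolExecutor(max_workers=8)


def call_with_fallback(
    prompt: str,
    chain: Sequence[Callable[[str], str]],  # primary first, then secondaries
    timeout_s: float,
    degraded: Callable[[str], str],         # last resort, assumed not to fail
) -> str:
    """Try each model in order; use the degraded handler if all fail."""
    for call_model in chain:
        future = _EXECUTOR.submit(call_model, prompt)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            # Best effort only: a thread that is already running is not interrupted.
            future.cancel()
        except Exception:
            # Any model error moves on to the next entry in the chain.
            pass
    return degraded(prompt)
```

A production version would more likely use async I/O with real request cancellation, and would record which hop served each request so that quality can be monitored per route.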

