Dealing with a High Number of Real-Time Calls to a Partner API in Rails
Introduction
When a Rails app makes many real-time calls to a partner API, latency and rate limits become product reliability issues. The solution is not one trick but a layered strategy: cache where possible, queue non-critical calls, protect the upstream API with outbound limits, and degrade gracefully during partner outages. This guide focuses on practical patterns you can deploy incrementally.
Classify Calls by Urgency
Start by separating calls into two groups.
- user-blocking calls that must finish before the response is sent
- asynchronous calls that can run after the response
Only truly blocking calls should remain in the request path.
Everything else should move to background jobs so web threads stay available.
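The split can be made explicit at the call site. The sketch below assumes each outbound call is tagged with an urgency; `PartnerCall` and `PartnerCallRouter` are hypothetical names, and a plain `Queue` stands in for Sidekiq or Active Job so the example runs without Rails.

```ruby
# Hypothetical value object describing one outbound partner call.
PartnerCall = Struct.new(:name, :urgency, :work, keyword_init: true)

# Hypothetical router: :blocking calls run inline because the response
# depends on them; :async calls are queued to run after the response.
# A plain Queue stands in for Sidekiq / Active Job in this sketch.
class PartnerCallRouter
  attr_reader :deferred

  def initialize
    @deferred = Queue.new # drained later by a background worker
  end

  def dispatch(call)
    case call.urgency
    when :blocking then call.work.call # stays in the request path
    when :async
      @deferred << call                # moved out of the request path
      :enqueued
    else
      raise ArgumentError, "unknown urgency: #{call.urgency.inspect}"
    end
  end
end

router = PartnerCallRouter.new
quote = router.dispatch(PartnerCall.new(name: "price_quote", urgency: :blocking, work: -> { 42 }))
sync  = router.dispatch(PartnerCall.new(name: "crm_sync", urgency: :async, work: -> { :synced }))
```

Making urgency an explicit, required attribute means every new partner call forces a decision about whether it belongs in the request path.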
Add Caching and Request Coalescing
Repeated requests for the same partner data should hit the cache first. Even short-TTL caches significantly reduce partner load when the same data is read repeatedly.
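In a Rails app this is usually `Rails.cache.fetch(key, expires_in: 30.seconds) { ... }`, with the TTL chosen per endpoint (30 seconds here is only illustrative). The stdlib class below is a hypothetical `TtlCache` that shows the same read-through mechanics, with an injectable clock so expiry can be demonstrated without sleeping.

```ruby
# Minimal thread-safe read-through TTL cache (illustrative stand-in for Rails.cache).
class TtlCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @store = {}
    @mutex = Mutex.new
  end

  # Return the cached value while it is fresh; otherwise run the block
  # (the partner call), store the result, and return it. The block runs
  # outside the lock, so concurrent misses on one key may each call the
  # partner; request coalescing addresses that case.
  def fetch(key)
    @mutex.synchronize do
      entry = @store[key]
      return entry.value if entry && entry.expires_at > @clock.call
    end
    value = yield
    @mutex.synchronize { @store[key] = Entry.new(value, @clock.call + @ttl) }
    value
  end
end

now = 0.0
cache = TtlCache.new(ttl: 30, clock: -> { now })
partner_calls = 0
lookup = -> { cache.fetch("fx:USD-EUR") { partner_calls += 1; 0.92 } }

lookup.call # miss: calls the partner
lookup.call # hit: served from cache
now = 31.0  # TTL elapsed
lookup.call # miss again
```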
For burst traffic, add request coalescing so that many identical in-flight calls collapse into a single upstream call.
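A minimal per-process sketch of coalescing (often called "single flight"), assuming threads in one Puma worker share the object. `RequestCoalescer` is a hypothetical name; coalescing across processes would additionally need a shared lock, for example in Redis.

```ruby
# Concurrent callers asking for the same key share one upstream call.
class RequestCoalescer
  Flight = Struct.new(:lock, :cond, :done, :value, :error)

  def initialize
    @mutex = Mutex.new
    @inflight = {} # key => Flight currently being fetched
  end

  def fetch(key)
    leader = false
    flight = @mutex.synchronize do
      @inflight[key] ||= begin
        leader = true
        Flight.new(Mutex.new, ConditionVariable.new, false, nil, nil)
      end
    end

    if leader
      begin
        flight.value = yield    # the single upstream call
      rescue StandardError => e
        flight.error = e        # followers see the same failure
      ensure
        @mutex.synchronize { @inflight.delete(key) }
        flight.lock.synchronize do
          flight.done = true
          flight.cond.broadcast # wake every waiting follower
        end
      end
    else
      flight.lock.synchronize do
        flight.cond.wait(flight.lock) until flight.done
      end
    end

    raise flight.error if flight.error
    flight.value
  end
end

coalescer = RequestCoalescer.new
calls = 0
threads = Array.new(10) do
  Thread.new do
    coalescer.fetch("profile:42") { sleep 0.3; calls += 1; { id: 42 } }
  end
end
results = threads.map(&:value) # ten results, one upstream call
```

Combining this with a short-TTL cache covers both the steady state (cache hits) and the burst right after expiry (one refresh instead of many).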
Use Background Jobs for Non-Critical Calls
Sidekiq or Active Job can process partner updates outside the user response cycle. In the controller, enqueue the job and respond immediately instead of calling the partner inline.
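In a Rails controller this typically means calling something like `PartnerSyncJob.perform_later(user.id)` and then `head :accepted`, where `PartnerSyncJob` is a hypothetical Active Job class. To keep the sketch runnable without Rails, the code below substitutes a stdlib queue and worker thread for the job backend; `update_profile` is a hypothetical controller action.

```ruby
# Stdlib stand-in for a job backend: a queue plus one worker thread.
JOBS = Queue.new

worker = Thread.new do
  # Drain jobs until a :shutdown sentinel arrives.
  while (job = JOBS.pop) != :shutdown
    job.call
  end
end

# Hypothetical controller action: enqueue the partner sync, respond at once.
def update_profile(user_id, synced)
  job = lambda do
    sleep 0.05        # simulated slow partner call
    synced << user_id # record that the sync ran
  end
  JOBS << job
  { status: 202, body: "accepted" }
end

synced = Queue.new
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
response = update_profile(7, synced)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

JOBS << :shutdown
worker.join # wait for the background work (demo only)
```

The response time reflects only the enqueue, not the partner's latency; a real job backend adds persistence and retries that this stand-in does not.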
This pattern protects p95 web latency during partner slowdowns.
Rate Limiting, Retries, and Circuit Breakers
Respect partner rate limits proactively. Use a token bucket (to cap request rate) or a semaphore-style guard (to cap concurrency) before issuing outbound requests.
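A minimal token bucket, assuming a hypothetical `TokenBucket` class and illustrative rate and capacity values. It allows bursts up to `capacity` and refills at `rate` tokens per second; the clock is injectable for testing.

```ruby
# Minimal single-process token bucket.
class TokenBucket
  def initialize(rate:, capacity:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @rate = rate.to_f # tokens added per second
    @capacity = capacity.to_f
    @tokens = @capacity
    @clock = clock
    @last_refill = @clock.call
    @mutex = Mutex.new
  end

  # Take one token if available. A false return means the caller should
  # wait, enqueue the work, or serve a fallback instead of calling the partner.
  def try_acquire
    @mutex.synchronize do
      refill
      return false if @tokens < 1
      @tokens -= 1
      true
    end
  end

  private

  # Add tokens for the time elapsed since the last refill, capped at capacity.
  def refill
    now = @clock.call
    @tokens = [@capacity, @tokens + (now - @last_refill) * @rate].min
    @last_refill = now
  end
end

now = 0.0
bucket = TokenBucket.new(rate: 5, capacity: 5, clock: -> { now })
burst = Array.new(7) { bucket.try_acquire } # 5 allowed, 2 refused
now = 1.0                                   # one second refills 5 tokens
after_refill = bucket.try_acquire
```

This bucket is per process. With several Puma workers or Sidekiq processes sharing one partner quota, the counter has to live in a shared store (for example Redis) to enforce a global limit.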
Wrap HTTP calls with explicit timeouts and bounded retry rules, so a slow partner cannot hold a thread indefinitely.
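A sketch of bounded retries with capped exponential backoff and full jitter. `with_retries` and `PartnerTransientError` are hypothetical names, and the attempt count and delays are illustrative. Only errors classed as transient are retried; everything else fails immediately.

```ruby
require "net/http"

# Hypothetical error our client raises for 429/5xx partner responses.
class PartnerTransientError < StandardError; end

TRANSIENT_ERRORS = [PartnerTransientError, Net::OpenTimeout, Net::ReadTimeout].freeze

# Retry transient failures at most max_attempts times in total. Delays grow
# exponentially up to `cap`, and full jitter spreads retries from many
# clients so they do not hit the partner in lockstep.
def with_retries(max_attempts: 3, base: 0.2, cap: 2.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *TRANSIENT_ERRORS
    raise if attempt >= max_attempts
    sleeper.call(rand * [cap, base * 2**(attempt - 1)].min)
    retry
  end
end

# The wrapped call should carry explicit timeouts, for example:
#   Net::HTTP.start(host, 443, use_ssl: true, open_timeout: 2, read_timeout: 5) do |http|
#     http.get(path)
#   end
delays = []
outcome = with_retries(sleeper: ->(s) { delays << s }) do |attempt|
  raise PartnerTransientError, "503 from partner" if attempt < 3
  :ok
end
```

The `sleeper` parameter exists only so the sketch can be exercised without real waiting; production code would keep the default.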
Add a circuit breaker to fail fast when partner error rates spike.
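A minimal circuit breaker sketch, assuming a consecutive-failure threshold and a fixed cool-down (both illustrative). `CircuitBreaker` is a hypothetical class: after enough failures it opens and rejects calls without touching the partner, then lets trial calls through once the cool-down has passed.

```ruby
class CircuitBreaker
  class OpenError < StandardError; end

  def initialize(failure_threshold: 5, cool_down: 30,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold
    @cool_down = cool_down
    @clock = clock
    @failures = 0
    @opened_at = nil
    @mutex = Mutex.new
  end

  def state
    @mutex.synchronize { current_state }
  end

  def call
    @mutex.synchronize do
      raise OpenError, "partner circuit open" if current_state == :open
    end
    result = yield
    @mutex.synchronize { @failures = 0; @opened_at = nil } # success closes the circuit
    result
  rescue OpenError
    raise # fast failure, not counted as a partner error
  rescue StandardError
    @mutex.synchronize do
      @failures += 1
      # Opens (or re-opens after a failed half-open trial) at the threshold.
      @opened_at = @clock.call if @failures >= @failure_threshold
    end
    raise
  end

  private

  # :open while cooling down, :half_open once the cool-down has elapsed
  # (this sketch lets all half-open calls through, not just one), else :closed.
  def current_state
    return :closed unless @opened_at
    @clock.call - @opened_at < @cool_down ? :open : :half_open
  end
end

now = 0.0
breaker = CircuitBreaker.new(failure_threshold: 2, cool_down: 10, clock: -> { now })
2.times do
  begin
    breaker.call { raise IOError, "partner 502" }
  rescue IOError
    # expected while the partner is failing
  end
end
```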
Observability and Backpressure
Instrument outbound calls with logs and metrics. At minimum, track:
- request rate and success ratio
- timeout count and retry count
- p50 and p95 latency
- cache hit ratio
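In production these numbers usually flow through ActiveSupport::Notifications or a metrics client into a monitoring system. The hypothetical in-memory `PartnerMetrics` class below only shows how each listed metric is derived, using nearest-rank percentiles for p50 and p95.

```ruby
class PartnerMetrics
  def initialize
    @mutex = Mutex.new
    @requests = @successes = @timeouts = @retries = 0
    @cache_hits = @cache_misses = 0
    @latencies_ms = []
  end

  def record_call(latency_ms:, success:, timeout: false, retries: 0)
    @mutex.synchronize do
      @requests += 1
      @successes += 1 if success
      @timeouts += 1 if timeout
      @retries += retries
      @latencies_ms << latency_ms
    end
  end

  def record_cache(hit:)
    @mutex.synchronize do
      if hit
        @cache_hits += 1
      else
        @cache_misses += 1
      end
    end
  end

  # Nearest-rank percentile (p in 1..100) over all recorded latencies.
  def percentile(p)
    sorted = @mutex.synchronize { @latencies_ms.sort }
    return nil if sorted.empty?
    rank = [(p * sorted.size / 100.0).ceil, 1].max
    sorted[rank - 1]
  end

  # Divide :requests by the reporting interval to get a request rate.
  def snapshot
    counts = @mutex.synchronize do
      lookups = @cache_hits + @cache_misses
      {
        requests: @requests,
        success_ratio: @requests.zero? ? nil : @successes.fdiv(@requests),
        timeouts: @timeouts,
        retries: @retries,
        cache_hit_ratio: lookups.zero? ? nil : @cache_hits.fdiv(lookups)
      }
    end
    counts.merge(p50_ms: percentile(50), p95_ms: percentile(95))
  end
end

metrics = PartnerMetrics.new
(1..100).each { |ms| metrics.record_call(latency_ms: ms, success: ms <= 98, timeout: ms > 98) }
3.times { metrics.record_cache(hit: true) }
metrics.record_cache(hit: false)
stats = metrics.snapshot
```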
Emit partner call metadata including endpoint, status code, and correlation id. During incidents, this is often the difference between fast mitigation and blind guessing.
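A sketch of one structured log line per partner call, using stdlib `Logger` and `JSON`. `log_partner_call` is a hypothetical helper and the field names are assumptions. In a Rails request, `request.request_id` is a natural choice for the correlation id, so web, job, and partner-call logs can be joined.

```ruby
require "json"
require "logger"
require "securerandom"
require "stringio"

# Emit one JSON line per partner call so logs can be filtered by endpoint,
# status, or correlation id during an incident.
def log_partner_call(logger, endpoint:, status:, duration_ms:, correlation_id: SecureRandom.uuid)
  logger.info(JSON.generate({
    event: "partner_call",
    endpoint: endpoint,
    status: status,
    duration_ms: duration_ms,
    correlation_id: correlation_id
  }))
  correlation_id
end

io = StringIO.new
logger = Logger.new(io)
logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" } # raw JSON lines
log_partner_call(logger, endpoint: "/v1/quotes", status: 503, duration_ms: 812, correlation_id: "req-123")
entry = JSON.parse(io.string.lines.last)
```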
When queues back up, apply backpressure by rejecting low-priority work or temporarily extending cache TTLs.
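A hypothetical `BackpressureQueue` that applies both tactics: once queue depth reaches a threshold, low-priority jobs are shed and the suggested cache TTL is stretched so fewer reads reach the partner. The depth and TTL values are illustrative.

```ruby
class BackpressureQueue
  def initialize(max_depth:, base_ttl:, degraded_ttl:)
    @queue = Queue.new
    @max_depth = max_depth
    @base_ttl = base_ttl
    @degraded_ttl = degraded_ttl
  end

  # Returns true if the job was accepted, false if it was shed.
  # The size check and push are not atomic, so the limit is approximate
  # under concurrency, which is acceptable for load shedding.
  def push(job, priority: :low)
    return false if priority == :low && saturated?
    @queue << job
    true
  end

  def saturated?
    @queue.size >= @max_depth
  end

  # TTL (seconds) the caching layer should use right now.
  def cache_ttl
    saturated? ? @degraded_ttl : @base_ttl
  end
end

jobs = BackpressureQueue.new(max_depth: 3, base_ttl: 30, degraded_ttl: 300)
normal_ttl = jobs.cache_ttl
accepted = Array.new(3) { |i| jobs.push("sync-#{i}") }
shed = jobs.push("weekly-report")                     # low priority, dropped
kept = jobs.push("checkout-confirm", priority: :high) # still accepted
degraded_ttl = jobs.cache_ttl
```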
Common Pitfalls
A common pitfall is retrying every failure immediately, which amplifies partner outages and can trigger bans.
Another issue is putting all partner calls in the request path and then scaling web workers endlessly. That increases cost without fixing upstream dependency limits.
A third issue is having no stale-data strategy. For many use cases, slightly stale cached data is better than a total failure.
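One way to implement a stale-data strategy is "serve stale on error". The sketch below is a hypothetical `StaleFallbackCache`: it serves fresh data when it can and, if the partner call fails, returns the last good value as long as it is not older than `ttl + max_stale`. It returns the freshness alongside the value so the UI can label stale data.

```ruby
# Returns [value, :fresh] or [value, :stale].
class StaleFallbackCache
  Entry = Struct.new(:value, :stored_at)

  def initialize(ttl:, max_stale:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @max_stale = max_stale
    @clock = clock
    @store = {}
    @mutex = Mutex.new
  end

  def fetch(key)
    now = @clock.call
    entry = @mutex.synchronize { @store[key] }
    return [entry.value, :fresh] if entry && now - entry.stored_at < @ttl

    begin
      value = yield
      @mutex.synchronize { @store[key] = Entry.new(value, now) }
      [value, :fresh]
    rescue StandardError
      # Re-raise when there is no usable copy; callers then show an error state.
      raise unless entry && now - entry.stored_at < @ttl + @max_stale
      [entry.value, :stale]
    end
  end
end

now = 0.0
prices = StaleFallbackCache.new(ttl: 60, max_stale: 600, clock: -> { now })
first = prices.fetch("sku:1") { 19.99 }
now = 120.0
during_outage = prices.fetch("sku:1") { raise IOError, "partner down" }
```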
Finally, avoid global rescue blocks that hide outbound failure details. Keep structured error classes and clear fallback responses.
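A sketch of structured error classes for one partner integration. The `PartnerApi` module, its status mapping, and `quote_for` are hypothetical. Callers rescue only the errors they can degrade on and let the rest surface, instead of using a global rescue that hides the cause.

```ruby
module PartnerApi
  # Base class carrying the context needed to debug an outbound failure.
  class Error < StandardError
    attr_reader :endpoint, :status

    def initialize(message = nil, endpoint: nil, status: nil)
      super(message)
      @endpoint = endpoint
      @status = status
    end
  end

  class RateLimitedError < Error; end # HTTP 429: back off
  class ServerError < Error; end      # 5xx or unexpected codes: fallback is reasonable
  class ClientError < Error; end      # other 4xx: likely our bug, do not retry

  # Map an HTTP status code to an error class (nil means success).
  def self.error_for(status)
    case status
    when 200..299 then nil
    when 429 then RateLimitedError
    when 400..499 then ClientError
    else ServerError
    end
  end
end

# Hypothetical caller: degrade on transient errors with a clear fallback,
# and let ClientError propagate so it is noticed and fixed.
def quote_for(status)
  error_class = PartnerApi.error_for(status)
  raise error_class.new("partner returned #{status}", endpoint: "/v1/quotes", status: status) if error_class
  { price: 10, fallback: false }
rescue PartnerApi::RateLimitedError, PartnerApi::ServerError => e
  { price: nil, fallback: true, reason: e.class.name, status: e.status }
end
```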
Summary
- Classify partner calls by urgency and minimize user-blocking paths
- Cache aggressively for repeated reads and coalesce duplicate requests
- Move non-critical calls to background jobs with controlled retries
- Enforce outbound timeouts, concurrency limits, and circuit breaking
- Measure latency, error rates, and queue health to manage load safely

