Dealing with a high number of real-time calls to a partner API in Rails

Introduction

When a Rails app makes many real-time calls to a partner API, the partner's latency and rate limits become reliability problems for your own product. No single trick solves this; it takes a layered strategy: cache where possible, queue non-critical calls, protect the upstream with limits, and degrade gracefully during partner outages. This guide focuses on practical patterns you can deploy incrementally.

Classify Calls by Urgency

Start by separating calls into two groups:

user-blocking calls that must finish before the response is sent
asynchronous calls that can run after the response

Only truly blocking calls should remain in the request path.

ruby

# controller example
class QuotesController < ApplicationController
  def show
    # Blocking call: the response body depends on the partner's answer.
    quote = PartnerQuoteService.fetch!(params[:symbol])
    render json: quote
  end
end

Everything else should move to background jobs so web threads stay available.

Add Caching and Request Coalescing

Repeated requests for the same partner data should hit the cache first. Even a short TTL caps upstream traffic at roughly one call per key per TTL window, however many requests arrive.

ruby

class PartnerQuoteService
  CACHE_TTL = 15.seconds

  def self.fetch!(symbol)
    # On a cache miss the block runs and its result is stored for CACHE_TTL.
    Rails.cache.fetch("partner:quote:#{symbol}", expires_in: CACHE_TTL) do
      PartnerClient.get_quote(symbol)
    end
  end
end

For burst traffic, add request coalescing so that many identical in-flight calls collapse into one upstream call.
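
The coalescing idea can be sketched in plain Ruby as follows. This is a minimal, in-process illustration only: `SingleFlight` and its `run` method are hypothetical names, not part of Rails or any gem, and the sketch coalesces calls across threads of one process, not across servers.

```ruby
# Hypothetical helper (not a Rails or gem API): collapses concurrent calls
# that share a key into one execution of the block.
class SingleFlight
  Call = Struct.new(:done, :value, :error)

  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @calls = {}
  end

  # The first caller for a key becomes the leader and runs the block.
  # Callers arriving while it is in flight wait and share its result
  # (or re-raise its StandardError).
  def run(key)
    leader = false
    call = @mutex.synchronize do
      @calls[key] || begin
        leader = true
        @calls[key] = Call.new(false, nil, nil)
      end
    end

    if leader
      begin
        call.value = yield
      rescue StandardError => e
        call.error = e
      ensure
        @mutex.synchronize do
          call.done = true
          @calls.delete(key) # later callers start a fresh upstream call
          @cond.broadcast
        end
      end
    else
      @mutex.synchronize { @cond.wait(@mutex) until call.done }
    end

    raise call.error if call.error
    call.value
  end
end
```

A caller might wrap the upstream request as `FLIGHTS.run("quote:#{symbol}") { PartnerClient.get_quote(symbol) }`, where `FLIGHTS` is an assumed process-wide instance. For stampedes at cache expiry across processes, the `race_condition_ttl` option of `Rails.cache.fetch` is worth evaluating as a complement.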

Use Background Jobs for Non-Critical Calls

Sidekiq or Active Job can process partner updates outside the user's response cycle.

ruby

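class PartnerSyncJob < ApplicationJob
  queue_as :partner_api

  # Growing wait between attempts. Rails 7.1 renamed this strategy to
  # :polynomially_longer; on older versions use :exponentially_longer.
  retry_on StandardError, wait: :polynomially_longer, attempts: 5

  def perform(account_id)
    PartnerClient.sync_account(account_id)
  end
end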

Controller path:

ruby

PartnerSyncJob.perform_later(current_user.account_id)
render json: { status: 'accepted' }, status: :accepted

This pattern protects p95 web latency during partner slowdowns.

Rate Limiting, Retries, and Circuit Breakers

Respect partner rate limits proactively. Use a token bucket or semaphore-style guard before issuing outbound requests. The guard below is the semaphore variant: it caps how many partner calls are in flight at once.

ruby

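require "redis"

class OutboundLimiter
  LIMIT = 20
  KEY = "partner:inflight"

  # Raised when every slot is taken, so callers can rescue it specifically.
  class LimitExceeded < StandardError; end

  # Redis.current was removed in redis-rb 5, so hold an explicit client.
  # REDIS_URL and its default are assumptions; use your app's configuration.
  def self.redis
    @redis ||= Redis.new(url: ENV.fetch("REDIS_URL", "redis://localhost:6379/0"))
  end

  def self.with_slot
    inflight = redis.incr(KEY)

    if inflight > LIMIT
      redis.decr(KEY) # give back the slot we just took
      raise LimitExceeded, "partner_concurrency_limit"
    end

    begin
      yield
    ensure
      # Release only on the path that acquired a slot, so a rejected
      # call never decrements the counter twice.
      redis.decr(KEY)
    end
  end
end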
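
The guard above limits concurrency. For the token bucket variant, which limits request rate instead, a minimal in-process sketch could look like the following. `TokenBucket`, its parameters, and the injectable `clock` are hypothetical names chosen for illustration; a multi-process deployment would need shared state (for example in Redis), which this sketch does not attempt.

```ruby
# Hypothetical in-process token bucket; rate and capacity are illustrative.
class TokenBucket
  MONOTONIC = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  def initialize(rate_per_sec:, capacity:, clock: MONOTONIC)
    @rate = rate_per_sec.to_f
    @capacity = capacity.to_f
    @tokens = @capacity # start full so an initial burst is allowed
    @clock = clock
    @last = clock.call
    @mutex = Mutex.new
  end

  # Returns true and consumes one token if one is available, else false.
  def try_acquire
    @mutex.synchronize do
      now = @clock.call
      # Refill in proportion to elapsed time, never beyond capacity.
      @tokens = [@capacity, @tokens + (now - @last) * @rate].min
      @last = now
      return false if @tokens < 1.0

      @tokens -= 1.0
      true
    end
  end
end
```

A caller would check `bucket.try_acquire` before each outbound request and either queue, delay, or reject the call when it returns false.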

Wrap HTTP calls with timeout and retry rules.

ruby

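# Rails apps using Bundler usually load these automatically.
require "faraday"
require "faraday/retry" # Faraday 2.x ships :retry in the faraday-retry gem

conn = Faraday.new(url: ENV.fetch('PARTNER_API_URL')) do |f|
  # At most two retries, 100 ms apart plus some random jitter.
  f.request :retry, max: 2, interval: 0.1, interval_randomness: 0.2
  f.adapter Faraday.default_adapter
end

# Hold a concurrency slot only for the duration of the HTTP call.
response = OutboundLimiter.with_slot do
  conn.get('/v1/orders', { id: order_id }) do |req|
    req.options.timeout = 2      # request timeout, in seconds
    req.options.open_timeout = 1 # connection open timeout, in seconds
  end
end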

Add a circuit breaker to fail fast when partner error rates spike.
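
As a rough illustration of the circuit breaker idea, here is a minimal sketch under simplifying assumptions: it trips on consecutive failures, not on a measured error rate, and after the cool-down it lets calls through again without limiting how many trial calls run at once. `CircuitBreaker`, `OpenError`, and the thresholds are hypothetical; established gems exist for production use.

```ruby
# Hypothetical minimal circuit breaker; thresholds are illustrative.
class CircuitBreaker
  class OpenError < StandardError; end

  MONOTONIC = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  def initialize(failure_threshold: 5, reset_after: 30, clock: MONOTONIC)
    @failure_threshold = failure_threshold
    @reset_after = reset_after # seconds to stay open before a trial call
    @clock = clock
    @failures = 0
    @opened_at = nil
    @mutex = Mutex.new
  end

  # Runs the block unless the circuit is open, in which case it fails fast.
  def call
    @mutex.synchronize do
      if @opened_at && @clock.call - @opened_at < @reset_after
        raise OpenError, "partner circuit open"
      end
    end

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end
    record_success
    result
  end

  private

  def record_failure
    @mutex.synchronize do
      @failures += 1
      # (Re)open once the threshold is reached; a failed trial call
      # after the cool-down reopens the circuit immediately.
      @opened_at = @clock.call if @failures >= @failure_threshold
    end
  end

  def record_success
    @mutex.synchronize do
      @failures = 0
      @opened_at = nil
    end
  end
end
```

Callers can rescue `CircuitBreaker::OpenError` separately from partner errors and serve a cached or fallback response without touching the network.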

Observability and Backpressure

Instrument outbound calls with logs and metrics.

request rate and success ratio
timeout count and retry count
p50 and p95 latency
cache hit ratio

Emit partner call metadata including endpoint, status code, and correlation ID. During incidents, this is often the difference between fast mitigation and blind guessing.
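
One possible shape for that metadata is a single JSON log line per outbound call. The sketch below uses only the Ruby standard library; `PartnerInstrumentation.measure` and the field names are hypothetical, and it assumes the wrapped block returns an object that responds to `status` (as a Faraday response does).

```ruby
require "json"
require "logger"
require "securerandom"

# Hypothetical wrapper: times an outbound call and logs one JSON line.
module PartnerInstrumentation
  def self.measure(logger:, endpoint:, correlation_id: SecureRandom.uuid)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status = nil
    error = nil
    response = yield
    status = response.status if response.respond_to?(:status)
    response
  rescue StandardError => e
    error = e.class.name # record the failure type, then re-raise unchanged
    raise
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    logger.info({
      event: "partner_call",
      endpoint: endpoint,
      status: status,
      error: error,
      duration_ms: (elapsed * 1000).round(1),
      correlation_id: correlation_id
    }.to_json)
  end
end
```

In a Rails app the same fields could instead be published through `ActiveSupport::Notifications.instrument` and forwarded to a metrics backend; which route fits depends on the tooling already in place.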

When queues back up, apply backpressure by rejecting low-priority work or temporarily extending cache TTLs.
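
A backpressure rule of this kind might be expressed as a small policy object. Everything here is an assumption made for illustration: the class name, the `:low` and `:critical` priority labels, the thresholds, and the TTL multipliers would all need tuning against real queue behavior.

```ruby
# Hypothetical policy object; thresholds and priority names are illustrative.
class BackpressurePolicy
  BASE_TTL = 15 # seconds, matching the quote cache TTL used earlier

  def initialize(soft_limit: 1_000, hard_limit: 5_000)
    @soft_limit = soft_limit
    @hard_limit = hard_limit
  end

  # Should a job of this priority be enqueued at the current queue depth?
  def accept?(priority, queue_depth)
    return true if queue_depth < @soft_limit            # healthy: accept all
    return priority != :low if queue_depth < @hard_limit # strained: shed low

    priority == :critical                                # overloaded
  end

  # Serve cached data for longer while the queue is backed up.
  def cache_ttl(queue_depth)
    return BASE_TTL if queue_depth < @soft_limit

    queue_depth < @hard_limit ? BASE_TTL * 4 : BASE_TTL * 20
  end
end
```

With Sidekiq, the depth could be read from `Sidekiq::Queue.new("partner_api").size` (after `require "sidekiq/api"`) and passed in before calling `perform_later`.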

Common Pitfalls

A common pitfall is retrying every failure immediately, which amplifies partner outages and can trigger bans.
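
A common alternative to immediate retries is exponential backoff with jitter, sketched below in plain Ruby. `Backoff`, its defaults, and the injectable `sleeper` are hypothetical; the delay uses the "full jitter" approach, picking a random wait between zero and an exponentially growing ceiling.

```ruby
# Hypothetical retry helper: exponential backoff with full jitter.
module Backoff
  # Random delay in [0, min(cap, base * 2**attempt)) seconds.
  def self.delay(attempt, base: 0.2, cap: 10.0, rng: Random)
    ceiling = [cap, base * (2**attempt)].min
    rng.rand * ceiling
  end

  # Runs the block, retrying listed errors up to max_attempts total tries.
  def self.with_retries(max_attempts: 3, retry_on: [StandardError],
                        sleeper: ->(seconds) { sleep(seconds) })
    attempt = 0
    begin
      yield
    rescue *retry_on
      attempt += 1
      raise if attempt >= max_attempts # out of attempts: surface the error

      sleeper.call(delay(attempt - 1))
      retry
    end
  end
end
```

Spreading retries out randomly keeps many clients from hammering a recovering partner at the same instant. Retries of this kind are only safe for idempotent requests.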

Another issue is putting all partner calls in the request path and then scaling web workers endlessly. That increases cost without fixing upstream dependency limits.

A third issue is having no stale-data strategy. For many use cases, slightly stale cached data is better than total failure.
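
To make the stale-data idea concrete, here is a minimal in-memory sketch that serves an expired entry when the refresh fails. `StaleFallbackCache`, `ttl`, and `max_stale` are hypothetical names; a real deployment would more likely build this on the shared cache store than on a per-process hash.

```ruby
# Hypothetical in-memory cache that can serve an expired entry on failure.
class StaleFallbackCache
  Entry = Struct.new(:value, :fetched_at)
  MONOTONIC = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  def initialize(ttl:, max_stale:, clock: MONOTONIC)
    @ttl = ttl             # seconds an entry counts as fresh
    @max_stale = max_stale # extra seconds it may be served after a failure
    @clock = clock
    @entries = {}
    @mutex = Mutex.new
  end

  def fetch(key)
    entry = @mutex.synchronize { @entries[key] }
    now = @clock.call
    return entry.value if entry && now - entry.fetched_at < @ttl

    begin
      value = yield # refresh from the partner
      @mutex.synchronize { @entries[key] = Entry.new(value, now) }
      value
    rescue StandardError
      # Refresh failed: fall back to the old value if it is not too old.
      raise unless entry && now - entry.fetched_at < @ttl + @max_stale

      entry.value
    end
  end
end
```

The `max_stale` bound matters: it stops the app from silently serving very old data during a long outage.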

Finally, avoid global rescue blocks that hide outbound failure details. Keep structured error classes and clear fallback responses.
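
Structured error classes with explicit fallbacks could be organized along these lines. The `Partner` module, its class names, and the fallback payloads are all hypothetical; the point is only that each failure mode stays distinguishable to callers and to logs.

```ruby
# Hypothetical error hierarchy for outbound partner failures.
module Partner
  class Error < StandardError
    attr_reader :endpoint, :status

    def initialize(message = nil, endpoint: nil, status: nil)
      super(message)
      @endpoint = endpoint
      @status = status
    end
  end

  class TimeoutError < Error; end
  class RateLimitedError < Error; end
  class UnavailableError < Error; end

  # Build a specific error from an HTTP status code.
  def self.error_for(status, endpoint:)
    klass =
      case status
      when 429 then RateLimitedError
      when 500..599 then UnavailableError
      else Error
      end
    klass.new("partner returned #{status}", endpoint: endpoint, status: status)
  end

  # Choose a clear fallback payload per failure type.
  def self.fallback_for(error)
    case error
    when RateLimitedError
      { status: "retry_later", retryable: true }
    when TimeoutError, UnavailableError
      { status: "partner_unavailable", retryable: true }
    else
      { status: "partner_error", retryable: false }
    end
  end
end
```

A controller can then `rescue Partner::Error => e` and render `Partner.fallback_for(e)`, while unrelated exceptions still surface normally.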

Summary

Classify partner calls by urgency and minimize user-blocking paths
Cache aggressively for repeated reads and coalesce duplicate requests
Move non-critical calls to background jobs with controlled retries
Enforce outbound timeouts, concurrency limits, and circuit breaking
Measure latency, error rates, and queue health to manage load safely