Retry policy in CompletionService

Java

CompletionService

Retry Policy

Concurrent Programming

Multithreading

Retry policy in CompletionService

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

CompletionService is useful when you want to submit many tasks and consume results as they finish instead of waiting in submission order. Adding retries to that pattern is possible, but the retry logic should be designed explicitly rather than bolted onto Future.get() calls. The key is to treat retryable failure as part of task orchestration, not as an afterthought in the result-reading loop.

Core Sections

What `CompletionService` Actually Gives You

CompletionService combines an executor with a completion queue. You submit tasks, and completed futures become available in completion order.

Typical setup:

java

1import java.util.concurrent.*;
2
3ExecutorService pool = Executors.newFixedThreadPool(4);
4CompletionService<String> completion = new ExecutorCompletionService<>(pool);

This is useful for workloads where some tasks finish quickly and you want to start handling them immediately.

Where Retry Logic Should Live

The cleanest design is usually to wrap the task itself with retry behavior. That way each submitted unit either:

succeeds within the retry policy
fails permanently after exhausting retries

A simple retrying Callable:

java

1import java.util.concurrent.Callable;
2
3class RetryingTask implements Callable<String> {
4    private final int maxAttempts;
5
6    RetryingTask(int maxAttempts) {
7        this.maxAttempts = maxAttempts;
8    }
9
10    @Override
11    public String call() throws Exception {
12        int attempt = 0;
13        while (true) {
14            attempt++;
15            try {
16                return doWork(attempt);
17            } catch (RuntimeException ex) {
18                if (attempt >= maxAttempts) {
19                    throw ex;
20                }
21                Thread.sleep(200L * attempt);
22            }
23        }
24    }
25
26    private String doWork(int attempt) {
27        if (attempt < 3) {
28            throw new RuntimeException("temporary failure");
29        }
30        return "success on attempt " + attempt;
31    }
32}

This keeps retry policy close to the work being retried.

Submitting Tasks and Reading Results

Use the completion service normally:

java

1import java.util.concurrent.*;
2
3ExecutorService pool = Executors.newFixedThreadPool(4);
4CompletionService<String> completion = new ExecutorCompletionService<>(pool);
5
6for (int i = 0; i < 5; i++) {
7    completion.submit(new RetryingTask(3));
8}
9
10for (int i = 0; i < 5; i++) {
11    Future<String> future = completion.take();
12    try {
13        System.out.println(future.get());
14    } catch (ExecutionException ex) {
15        System.err.println("Task failed permanently: " + ex.getCause());
16    }
17}
18
19pool.shutdown();

Tasks that succeed after retries appear just like normal successful tasks. Tasks that exhaust retries surface as failed futures.

Retry Outside the Task When Needed

Sometimes you want central orchestration rather than embedding retry logic in each task. For example, you may want to inspect the exception type and resubmit only certain failures.

A simplified idea:

submit original callable
consume completed future
if failure is retryable and retry budget remains, submit a new callable
otherwise record permanent failure

This approach is more flexible, but you must track outstanding task count carefully so the result loop knows when it is really done.

Distinguish Retryable and Permanent Failures

Not every exception should be retried. Good candidates:

transient network timeout
temporary remote throttling
short-lived database connectivity issue

Bad candidates:

validation errors
malformed requests
deterministic logic bugs

A retry policy without exception classification often amplifies failure instead of improving resilience.

Add Backoff and Limits

Retries without delay can overload the failing dependency. Add:

maximum attempts
delay or exponential backoff
optional jitter

For example, backoff based on attempt number:

java

long delayMs = 200L * attempt;
Thread.sleep(delayMs);

In real systems, jitter helps avoid synchronized retry storms across many worker threads.

Cancellation and Shutdown Semantics

If the executor is shutting down or the calling thread is interrupted, honor interruption correctly. A retry loop that ignores interruption can hang application shutdown.

That means:

java

1catch (InterruptedException ex) {
2    Thread.currentThread().interrupt();
3    throw ex;
4}

Do not silently swallow interruptions inside retry code.

When a Library Is Better

If retry policy becomes complex, such as needing jitter, circuit breaking, metrics, and exception classification, consider using a resilience library instead of hand-writing everything. CompletionService still coordinates the concurrency, but resilience policy can be delegated to a library layer.

Common Pitfalls

Retrying every exception instead of only retryable failures.
Adding retries in the result loop without tracking resubmitted task counts correctly.
Omitting backoff and creating retry storms under load.
Ignoring interruption and making executor shutdown unreliable.
Hiding permanent failures by retrying until the system times out elsewhere.

Summary

'CompletionService is good for consuming task results in completion order.'
The simplest retry design is often a retrying Callable submitted to the service.
Retries need limits, backoff, and exception classification.
Distinguish transient failures from permanent ones before resubmitting work.
Treat shutdown and interruption as first-class parts of the retry design.

Retry policy in CompletionService

Master System Design with Codemia

Introduction

Core Sections

What CompletionService Actually Gives You

Where Retry Logic Should Live

Submitting Tasks and Reading Results

Retry Outside the Task When Needed

Distinguish Retryable and Permanent Failures

Add Backoff and Limits

Cancellation and Shutdown Semantics

When a Library Is Better

Common Pitfalls

Summary

What `CompletionService` Actually Gives You