Change the Number of Threads for TensorFlow Inference with the C API

Introduction

With the TensorFlow C API, CPU thread settings must be decided before the session is created. If inference is already running, you generally cannot dial the thread count up or down on the existing session and expect TensorFlow to rebuild its execution pools in place.

What Thread Counts Mean

TensorFlow uses two related CPU settings:

intra-op threads for parallel work inside one operation such as matrix multiplication
inter-op threads for running independent operations at the same time

If you set these too low, inference may underuse the machine. If you set them too high, contention and context switching can hurt latency.

Practical Option: Configure Before Session Creation

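The C API exposes TF_SetConfig on TF_SessionOptions. The function takes a pointer to a serialized ConfigProto and its length in bytes. Whichever mechanism you use, the configuration must be applied before the session is created or loaded.

The simplest runnable option is to set environment variables before building the session. TensorFlow's CPU runtime reads TF_NUM_INTRAOP_THREADS and TF_NUM_INTEROP_THREADS when the matching config fields are left at their defaults. The setting is process-wide, and it is worth confirming the behavior on the TensorFlow version you link against: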

#include <stdio.h>
#include <stdlib.h>   /* setenv (POSIX) */
#include <tensorflow/c/c_api.h>

int main(void) {
    /* Must be set before the session, and its thread pools, are created. */
    setenv("TF_NUM_INTRAOP_THREADS", "2", 1);
    setenv("TF_NUM_INTEROP_THREADS", "1", 1);

    TF_Status *status = TF_NewStatus();
    TF_SessionOptions *opts = TF_NewSessionOptions();

    TF_Graph *graph = TF_NewGraph();
    TF_Buffer *run_opts = NULL;              /* no RunOptions */
    TF_Buffer *meta_graph = TF_NewBuffer();  /* receives the MetaGraphDef */
    const char *tags[] = {"serve"};

    TF_Session *session = TF_LoadSessionFromSavedModel(
        opts,
        run_opts,
        "./saved_model",
        tags,
        1,
        graph,
        meta_graph,
        status
    );

    int rc = 0;
    if (TF_GetCode(status) != TF_OK) {
        fprintf(stderr, "load failed: %s\n", TF_Message(status));
        rc = 1;
    } else {
        printf("session loaded\n");
        TF_CloseSession(session, status);
        TF_DeleteSession(session, status);
    }

    /* Release everything on both the success and the failure path. */
    TF_DeleteGraph(graph);
    TF_DeleteBuffer(meta_graph);
    TF_DeleteSessionOptions(opts);
    TF_DeleteStatus(status);
    return rc;
}

The key detail is placement: set thread-related configuration before calling TF_LoadSessionFromSavedModel or, for a graph you built yourself, before calling TF_NewSession.

Per-Session Control With `TF_SetConfig`

If you need explicit per-session control instead of process-level configuration, use TF_SetConfig with a serialized ConfigProto. That is the official C API hook.

The tradeoff is that you need protobuf bytes for fields such as intra_op_parallelism_threads and inter_op_parallelism_threads. Many teams generate that proto in a higher-level language or with TensorFlow protobuf definitions during the build, then pass the serialized bytes to TF_SetConfig.

Conceptually the flow is:

create TF_SessionOptions
serialize a config proto with the thread settings
call TF_SetConfig
create or load the session

If you call TF_SetConfig after the session exists, it is too late for that session.
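The flow above can be sketched in C without a protobuf library, because both thread fields are plain varints on the wire. This is a minimal sketch, assuming the field numbers in TensorFlow's config.proto are 2 for intra_op_parallelism_threads and 5 for inter_op_parallelism_threads; check them against the TensorFlow version you build with. The helper names put_varint and encode_thread_config are hypothetical and not part of the C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append v as a protobuf base-128 varint. Returns bytes written (at most 5). */
static size_t put_varint(uint8_t *out, uint32_t v) {
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80);  /* low 7 bits plus continuation bit */
        v >>= 7;
    }
    out[n++] = (uint8_t)v;
    return n;
}

/* Serialize a ConfigProto that carries only the two thread-count fields.
 * buf must hold at least 12 bytes. Returns the encoded length.
 * Assumed field numbers (from config.proto): intra = 2, inter = 5. */
size_t encode_thread_config(uint8_t *buf, uint32_t intra, uint32_t inter) {
    assert(buf != NULL);
    size_t n = 0;
    buf[n++] = 0x10;                     /* tag: field 2, wire type 0 (varint) */
    n += put_varint(buf + n, intra);
    buf[n++] = 0x28;                     /* tag: field 5, wire type 0 (varint) */
    n += put_varint(buf + n, inter);
    return n;
}

/* Usage with the C API (requires linking against libtensorflow):
 *
 *   uint8_t cfg[12];
 *   size_t len = encode_thread_config(cfg, 2, 1);
 *   TF_SetConfig(opts, cfg, len, status);   // before creating the session
 *   if (TF_GetCode(status) != TF_OK) { ... handle the error ... }
 */
```

Hand-encoding is only reasonable for a couple of scalar fields like these. For anything larger, generating the bytes from the real protobuf definitions is less error-prone.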

How to Tune the Values

There is no universal best pair of numbers. A good starting point for CPU inference is:

try a small inter-op value such as 1 or 2
vary intra-op threads around the number of physical cores available to the process
measure throughput and latency instead of guessing

For single-request latency, fewer threads can sometimes be faster because they reduce scheduling overhead. For batched throughput, a higher intra-op count may help.
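To make "measure instead of guessing" concrete, here is a small timing harness. It is a sketch under stated assumptions: fake_inference is a hypothetical stand-in for your real TF_SessionRun call, median_latency_ms is an illustrative helper name, and clock_gettime with CLOCK_MONOTONIC assumes a POSIX system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime under strict C modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic time in milliseconds. */
static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Call fn(ctx) `warmup` times untimed, then `iters` times timed.
 * Returns the median per-call latency in ms, or -1.0 if allocation fails. */
double median_latency_ms(void (*fn)(void *), void *ctx, int warmup, int iters) {
    assert(fn != NULL && iters > 0);
    double *samples = malloc((size_t)iters * sizeof *samples);
    if (!samples) return -1.0;
    for (int i = 0; i < warmup; i++) fn(ctx);
    for (int i = 0; i < iters; i++) {
        double t0 = now_ms();
        fn(ctx);
        samples[i] = now_ms() - t0;
    }
    qsort(samples, (size_t)iters, sizeof *samples, cmp_double);
    double median = samples[iters / 2];
    free(samples);
    return median;
}

/* Hypothetical stand-in workload; replace the body with a real inference call.
 * ctx points to an int holding the number of loop iterations. */
void fake_inference(void *ctx) {
    volatile double acc = 0.0;
    int n = *(int *)ctx;
    for (int i = 0; i < n; i++) acc += (double)i * 0.5;
}
```

Because thread settings are fixed at session creation, one reasonable design is to run this once per candidate configuration, ideally in a fresh process each time, and compare the medians on the same input.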

Common Pitfalls

The most common mistake is changing thread settings after the session has already been created. TensorFlow sizes its thread pools at session creation, so later changes do not affect that session.

Another issue is tuning on logical CPU count alone. Hyperthreaded cores do not always behave like fully independent cores for inference workloads.
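As a starting point for that estimate, the sketch below reads the logical CPU count and divides it by an SMT factor that you supply. It assumes a Unix-like system where sysconf supports _SC_NPROCESSORS_ONLN; both function names are hypothetical, and smt_ways is your assumption about the hardware rather than something the code detects. The reported count is system-wide, so it can exceed what a container quota or CPU affinity mask actually grants the process.

```c
#include <assert.h>
#include <unistd.h>

/* Logical CPUs currently online as reported by the OS; falls back to 1. */
long logical_cpus(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Rough physical-core estimate.
 * smt_ways is supplied by the caller: for example 2 on a typical
 * hyperthreaded x86 machine, 1 when SMT is disabled or absent. */
long estimate_physical_cores(long logical, int smt_ways) {
    assert(logical > 0 && smt_ways > 0);
    long cores = logical / smt_ways;
    return cores > 0 ? cores : 1;
}
```

Treat the result as the center of a sweep for the intra-op value, not as the answer.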

A third problem is benchmarking without fixing the workload. Thread settings that improve large-batch throughput may hurt single-request latency.

Summary

Set TensorFlow CPU thread counts before creating or loading the session.
Intra-op controls parallelism inside an op; inter-op controls parallelism across ops.
The C API hook for session configuration is TF_SetConfig.
A practical runnable approach is to set thread-related environment variables before session creation.
Tune with measurements, because the best thread counts depend on model shape and workload.
