Running a Trained TensorFlow Model in C

Introduction

Running a trained TensorFlow model from C is possible, but the deployment format matters a lot. If you need plain C inference code, the most practical path is usually TensorFlow Lite with its C API. The full TensorFlow C library is also available, but it is heavier and better suited to cases where you truly need the full runtime.

Choose the Runtime Before Writing C Code

A model trained in Python is not automatically ready to run in C. You normally export it into one of two forms:

a SavedModel for the full TensorFlow runtime
a .tflite file for TensorFlow Lite inference

For embedded, desktop, and mobile inference, TensorFlow Lite is often the easier option because its API surface is smaller and it is designed for running trained models rather than for full training workflows.

Convert the Model to TensorFlow Lite

A simple Keras-to-TFLite conversion looks like this:

python

import tensorflow as tf

# Small example network: 4 input features, 1 output value.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Convert the in-memory Keras model to a TensorFlow Lite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)

That .tflite file is what your C program will load.
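Before handing the file to the runtime, a cheap sanity check can catch a wrong path or a truncated copy early. The sketch below assumes the standard TensorFlow Lite flatbuffer layout, where the 4-byte file identifier "TFL3" sits at byte offset 4. The helper name looks_like_tflite is hypothetical and not part of any TensorFlow API, and passing this check does not prove the model is valid.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a TensorFlow API.
 * Returns 1 if the file at `path` carries the "TFL3" FlatBuffers file
 * identifier at byte offset 4, and 0 otherwise (including I/O errors). */
int looks_like_tflite(const char *path) {
    unsigned char header[8];
    FILE *f = fopen(path, "rb");
    if (!f) {
        return 0;
    }
    size_t got = fread(header, 1, sizeof(header), f);
    fclose(f);
    if (got != sizeof(header)) {
        return 0; /* too short to be a flatbuffer with an identifier */
    }
    /* Bytes 0-3 hold the root table offset; bytes 4-7 hold the identifier. */
    return memcmp(header + 4, "TFL3", 4) == 0;
}
```

A program could call looks_like_tflite("model.tflite") before TfLiteModelCreateFromFile and print a more specific error message when it returns 0.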

Minimal TensorFlow Lite C Inference Flow

A typical inference path is:

load the model file
create interpreter options
create an interpreter
allocate tensors
copy input data into the input tensor
invoke inference
read the output tensor

Example in C:

#include <stdio.h>
#include "tensorflow/lite/c/c_api.h"

int main(void) {
    int exit_code = 1;
    float input_data[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float output_data[1];
    TfLiteTensor *input = NULL;
    const TfLiteTensor *output = NULL;

    /* Load the .tflite flatbuffer from disk. */
    TfLiteModel *model = TfLiteModelCreateFromFile("model.tflite");
    if (!model) {
        fprintf(stderr, "Failed to load model\n");
        return 1;
    }

    TfLiteInterpreterOptions *options = TfLiteInterpreterOptionsCreate();
    TfLiteInterpreterOptionsSetNumThreads(options, 2);

    TfLiteInterpreter *interpreter = TfLiteInterpreterCreate(model, options);
    if (!interpreter) {
        fprintf(stderr, "Failed to create interpreter\n");
        goto cleanup;
    }

    /* Tensors must be allocated before inputs or outputs are accessed. */
    if (TfLiteInterpreterAllocateTensors(interpreter) != kTfLiteOk) {
        fprintf(stderr, "Failed to allocate tensors\n");
        goto cleanup;
    }

    /* The copy returns an error unless the byte count equals the tensor's size. */
    input = TfLiteInterpreterGetInputTensor(interpreter, 0);
    if (TfLiteTensorCopyFromBuffer(input, input_data, sizeof(input_data)) != kTfLiteOk) {
        fprintf(stderr, "Input buffer size does not match the model\n");
        goto cleanup;
    }

    if (TfLiteInterpreterInvoke(interpreter) != kTfLiteOk) {
        fprintf(stderr, "Inference failed\n");
        goto cleanup;
    }

    output = TfLiteInterpreterGetOutputTensor(interpreter, 0);
    if (TfLiteTensorCopyToBuffer(output, output_data, sizeof(output_data)) != kTfLiteOk) {
        fprintf(stderr, "Output buffer size does not match the model\n");
        goto cleanup;
    }

    printf("Prediction: %f\n", output_data[0]);
    exit_code = 0;

cleanup:
    /* Release resources in reverse order of creation. */
    if (interpreter) {
        TfLiteInterpreterDelete(interpreter);
    }
    TfLiteInterpreterOptionsDelete(options);
    TfLiteModelDelete(model);
    return exit_code;
}

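This is a complete inference path. The byte counts passed to TfLiteTensorCopyFromBuffer and TfLiteTensorCopyToBuffer must equal the byte sizes of the model's input and output tensors exactly; otherwise the copy returns an error instead of transferring data.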
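One way to reason about the required size is to compute it from the tensor shape. The helper below is a minimal sketch in plain C. The name shape_byte_size is hypothetical, and the code assumes a dense tensor whose dimensions are all known. In a real program the dimensions would come from TfLiteTensorNumDims and TfLiteTensorDim, and the result could be compared against TfLiteTensorByteSize.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a TensorFlow API.
 * Returns the number of bytes a dense tensor with the given shape occupies,
 * or 0 if any dimension is non-positive (for example an unresolved dynamic
 * dimension reported as -1). A shape with zero dimensions is a scalar. */
size_t shape_byte_size(const int32_t *dims, int32_t num_dims, size_t elem_size) {
    size_t count = 1;
    for (int32_t i = 0; i < num_dims; ++i) {
        if (dims[i] <= 0) {
            return 0;
        }
        count *= (size_t)dims[i];
    }
    return count * elem_size;
}
```

For the example model, assuming the converter fixed the batch dimension at 1, the input shape would be [1, 4], so shape_byte_size returns 4 * sizeof(float) for float32 data, which matches sizeof(input_data) in the program above.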

Building the Program

Your compiler needs TensorFlow Lite headers and libraries. The exact command depends on your installation path, but the structure is ordinary C compilation with include and library flags:

bash

gcc main.c -I/path/to/include -L/path/to/lib -ltensorflowlite_c -o run_model

Deployment is not just a code problem. You also need the correct runtime libraries on the target machine.

When the Full TensorFlow C API Is Better

If you need to load a SavedModel and use the full runtime rather than a lightweight inference engine, the TensorFlow C API is the other option. It is more flexible, but it is also more verbose than TensorFlow Lite for straightforward inference.

A good rule of thumb is:

use TensorFlow Lite for lightweight C inference
use the full TensorFlow runtime only when the model or runtime requirements demand it

That keeps the integration smaller and easier to maintain.

Common Pitfalls

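The most common problem is a shape mismatch. The C code compiles either way, but at run time TfLiteTensorCopyFromBuffer rejects a buffer whose byte size differs from the input tensor's. If that return value is ignored, inference runs on whatever the tensor already held and the output is meaningless.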

Another issue is converting a model that uses operations not supported by the selected TensorFlow Lite build or delegate configuration.

People also underestimate packaging. The code can be correct while deployment still fails because the shared libraries are missing or incompatible with the target platform.

Finally, do not assume a Python model can be dropped directly into C without conversion. The runtime format is part of the deployment process, not an afterthought.

Summary

For plain C inference, TensorFlow Lite is usually the simplest deployment path.
Convert the trained Python model into a .tflite file before writing the C program.
The core inference flow is load model, allocate tensors, copy input, invoke, and read output.
Match your buffers to the model's actual tensor shapes.
Use the full TensorFlow C API only when you truly need the full runtime instead of lightweight inference.