How to invoke the Flex delegate for tflite interpreters?

Introduction

The TFLite Flex delegate is used when a TensorFlow Lite model contains Select TensorFlow Ops, which the standard TFLite kernels do not implement on their own. In other words, you do not turn on the Flex delegate because it is faster. You use it because the model depends on TensorFlow ops that plain TFLite cannot execute.

When the Flex Delegate Is Needed

A model needs Flex support when the conversion step preserves unsupported TensorFlow operations as Select TensorFlow Ops instead of lowering everything to native TFLite operators.

That usually happens when you convert like this:

python

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
# Use built-in TFLite ops where possible and fall back to Select TF ops.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]

tflite_model = converter.convert()

# Write the converted flatbuffer to disk.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

The key detail is SELECT_TF_OPS. Without that, conversion may fail. With it, conversion can succeed, but the runtime now needs Flex support to execute those preserved TensorFlow ops.
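One rough way to see which ops were preserved is to look for their names in the converted file. Select TF ops are stored as custom ops whose names carry a `Flex` prefix (a TensorFlow `Erf` op, for instance, would appear as `FlexErf`), so a byte-level scan gives an approximate inventory. The sketch below is a heuristic and not a flatbuffer parser: the function names are hypothetical, and any other string in the file that happens to match the pattern would be reported as well.

```python
import re
from pathlib import Path
from typing import List

# Heuristic pattern: "Flex" followed by a CamelCase TensorFlow op name.
# This scans raw bytes and does NOT parse the flatbuffer schema.
_FLEX_NAME = re.compile(rb"Flex[A-Z][A-Za-z0-9_]*")


def find_flex_op_names(model_bytes: bytes) -> List[str]:
    """Return the sorted, de-duplicated Flex-style names found in the bytes."""
    matches = _FLEX_NAME.findall(model_bytes)
    return sorted({name.decode("ascii") for name in matches})


def model_probably_needs_flex(model_path: str) -> bool:
    """Return True if the file at model_path appears to contain Flex ops."""
    return bool(find_flex_op_names(Path(model_path).read_bytes()))


# Hypothetical usage, assuming "model.tflite" exists:
# print(find_flex_op_names(Path("model.tflite").read_bytes()))
```

An empty result suggests the model converted to built-in ops only, in which case the Flex runtime is probably not needed.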

Understand the Runtime Side

The delegate question is really a runtime-packaging question. The interpreter must be built or distributed with Flex support available.

There are two practical cases:

The environment already bundles Flex support and creates the delegate automatically.
You load the Flex delegate explicitly from a shared library.

If the runtime does not include Flex support, the interpreter will fail at run time, typically when it prepares or invokes the graph, even though the .tflite file was produced successfully.

Explicit Loading in Python

If you have a shared Flex delegate library available, you can load it explicitly with load_delegate.

python

import tensorflow as tf

# Path to the Flex delegate shared library; adjust it for your build and platform.
flex_delegate = tf.lite.experimental.load_delegate("libtensorflowlite_flex_delegate.so")
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[flex_delegate],
)
interpreter.allocate_tensors()

# Inspect the input and output tensor metadata.
print(interpreter.get_input_details())
print(interpreter.get_output_details())

On macOS or Windows, the shared library filename may differ from the Linux-style .so example. The important point is that the delegate library must exist and match the runtime platform.
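A small helper can make the platform dependence explicit and fail early when the file is missing. This is a minimal sketch under stated assumptions: the stem `libtensorflowlite_flex_delegate` is taken from the example above and may not match what your build produces, the suffix table follows general shared-library conventions only, and both function names are hypothetical.

```python
import platform
from pathlib import Path

# Conventional shared-library suffixes per platform (an assumption about
# naming conventions, not a list of files that TensorFlow ships).
_SUFFIXES = {"Linux": ".so", "Darwin": ".dylib", "Windows": ".dll"}


def flex_library_name(stem="libtensorflowlite_flex_delegate", system=None):
    """Build the expected library filename for the given or current platform."""
    system = system or platform.system()
    if system not in _SUFFIXES:
        raise RuntimeError(f"No known shared-library suffix for {system!r}")
    return stem + _SUFFIXES[system]


def find_flex_library(directory, stem="libtensorflowlite_flex_delegate", system=None):
    """Return the library path inside directory, or raise if it is absent."""
    candidate = Path(directory) / flex_library_name(stem, system)
    if not candidate.is_file():
        raise FileNotFoundError(f"Flex delegate library not found: {candidate}")
    return candidate


# Hypothetical usage:
# import tensorflow as tf
# path = find_flex_library("/path/to/libs")
# flex_delegate = tf.lite.experimental.load_delegate(str(path))
```

Checking for the file first turns a confusing loader error into a clear message about which path was expected.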

Automatic Delegate Creation

In some environments, especially when using the full TensorFlow Python package, the interpreter may log that it created a TensorFlow Lite delegate for Select TF Ops automatically. In that case, explicit loading is not always necessary.

A minimal test looks like this:

python

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

If the environment supports Flex and the model needs it, the runtime may wire it up during interpreter creation. If it does not, you will typically see an operator-resolution failure.

This is why the safest mental model is: Select TF Ops at conversion time must be matched by Flex-capable runtime support at inference time.
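To find out which case applies in a given environment, the minimal test can be wrapped so that a failure is reported instead of crashing the script. The sketch below assumes that missing-op failures surface as `RuntimeError` or `ValueError` during interpreter creation or `allocate_tensors()`; some failures may only appear later, at `invoke()`. The helper name `check_flex_runtime` is hypothetical.

```python
def check_flex_runtime(make_interpreter):
    """Try to create and prepare an interpreter; return (ok, message).

    make_interpreter is a zero-argument callable that returns an object
    with an allocate_tensors() method, for example:
        lambda: tf.lite.Interpreter(model_path="model.tflite")
    """
    try:
        interpreter = make_interpreter()
        interpreter.allocate_tensors()
    except (RuntimeError, ValueError) as exc:
        # Assumption: unsupported Select TF ops raise one of these types.
        return False, str(exc)
    return True, "interpreter prepared successfully"


# Hypothetical usage:
# import tensorflow as tf
# ok, message = check_flex_runtime(
#     lambda: tf.lite.Interpreter(model_path="model.tflite")
# )
# print(ok, message)
```

Passing a factory instead of an interpreter lets the same check cover both the automatic case and an explicitly loaded delegate.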

Verify That the Model Really Needs Flex

Do not add Flex blindly. It increases runtime size and can reduce the portability advantages that make TFLite attractive in the first place.

A good workflow is:

Try conversion with built-in TFLite ops only.
If conversion fails because of unsupported ops, use SELECT_TF_OPS.
Confirm at inference time whether the converted model actually requires Flex support.
Only then package the delegate or a matching runtime.

If you can rewrite the model to avoid unsupported ops, that is often the better deployment choice.
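The first two steps of that workflow can be sketched as a fallback conversion. This is an illustration and not a definitive implementation: `convert_with_fallback` is a hypothetical helper, it takes a converter factory so that each attempt starts from a fresh converter, and it catches `Exception` broadly because the exact error type raised by a failed conversion is not assumed here.

```python
def convert_with_fallback(make_converter, builtin_ops, select_ops):
    """Convert with built-in ops first; retry with Select TF ops on failure.

    make_converter: zero-argument callable returning a fresh converter.
    builtin_ops / select_ops: op-set values such as
        tf.lite.OpsSet.TFLITE_BUILTINS and tf.lite.OpsSet.SELECT_TF_OPS.
    Returns (model_bytes, used_select_ops).
    """
    converter = make_converter()
    converter.target_spec.supported_ops = [builtin_ops]
    try:
        return converter.convert(), False
    except Exception as exc:  # broad on purpose; see the note above
        print(f"Built-in-only conversion failed: {exc}")

    # Second attempt: keep unsupported ops as Select TF ops.
    converter = make_converter()
    converter.target_spec.supported_ops = [builtin_ops, select_ops]
    return converter.convert(), True


# Hypothetical usage:
# import tensorflow as tf
# model, needs_flex = convert_with_fallback(
#     lambda: tf.lite.TFLiteConverter.from_saved_model("saved_model_dir"),
#     tf.lite.OpsSet.TFLITE_BUILTINS,
#     tf.lite.OpsSet.SELECT_TF_OPS,
# )
```

The returned flag records whether the fallback was used, which is the signal that the deployment target must ship Flex support.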

Example Inference Run

Once the interpreter is created, inference is the same general flow as any other TFLite model.

python

import numpy as np
import tensorflow as tf

# Load the Flex delegate explicitly; adjust the path for your platform.
flex_delegate = tf.lite.experimental.load_delegate("libtensorflowlite_flex_delegate.so")
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[flex_delegate],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed an all-zeros tensor that matches the first input's shape and dtype.
sample = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], sample)
interpreter.invoke()

# Read back the first output tensor.
output = interpreter.get_tensor(output_details[0]["index"])
print(output)

If this works, the runtime and delegate packaging are aligned.

Common Pitfalls

The most common mistake is assuming Flex is a conversion flag only, when it is really a conversion-and-runtime pairing. Another is loading a model converted with SELECT_TF_OPS into a runtime that does not ship with Flex support. Developers also sometimes add Flex without first checking whether the model could be converted using built-in TFLite operators only, which creates unnecessary runtime size and complexity. A final issue is assuming the delegate must always be loaded explicitly even in environments that create it automatically once the proper runtime support is present.

Summary

The Flex delegate is needed for TFLite models that contain Select TensorFlow Ops.
Use SELECT_TF_OPS during conversion only when built-in TFLite ops are insufficient.
At inference time, the interpreter must have Flex-capable runtime support.
You can explicitly load the delegate with tf.lite.experimental.load_delegate when needed.
Prefer built-in TFLite ops when possible and use Flex only when the model truly requires it.