Crop image to bounding box in TensorFlow Object Detection API

Introduction

Cropping an image to a detection result in TensorFlow means converting the model's bounding box into valid pixel coordinates. The main source of confusion is that TensorFlow Object Detection outputs (the detection_boxes tensor) are commonly normalized to the range [0, 1] and ordered as ymin, xmin, ymax, xmax, which is easy to mix up with formats that put x first or use pixel units.

Convert Detection Boxes to Pixel Coordinates

If you want a crop from one image and one box, first convert the normalized coordinates into integer pixel edges. Then compute the crop height and width from those edges.

```python
import tensorflow as tf

# Stand-in for a decoded image: 480 rows (height), 640 columns (width), 3 channels.
# maxval is exclusive for integer dtypes, so 256 covers the full uint8 range.
image = tf.random.uniform((480, 640, 3), maxval=256, dtype=tf.int32)
image = tf.cast(image, tf.uint8)

box = [0.25, 0.10, 0.75, 0.60]  # ymin, xmin, ymax, xmax (normalized)

height = tf.cast(tf.shape(image)[0], tf.float32)
width = tf.cast(tf.shape(image)[1], tf.float32)

# Scale to pixel edges; casting to int32 drops the fractional part.
ymin = tf.cast(box[0] * height, tf.int32)
xmin = tf.cast(box[1] * width, tf.int32)
ymax = tf.cast(box[2] * height, tf.int32)
xmax = tf.cast(box[3] * width, tf.int32)

cropped = tf.image.crop_to_bounding_box(
    image=image,
    offset_height=ymin,
    offset_width=xmin,
    target_height=ymax - ymin,
    target_width=xmax - xmin,
)

print(cropped.shape)  # (240, 320, 3)
```

The important detail is that tf.image.crop_to_bounding_box does not take ending coordinates. It needs a top-left offset plus a height and width.
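
To make the offset-plus-size convention concrete, here is a minimal sketch in plain Python that turns a normalized box into the four keyword arguments crop_to_bounding_box expects. The helper name crop_args_from_box is hypothetical, and the sketch assumes the box already lies inside the image. It uses int(), which truncates toward zero.

```python
def crop_args_from_box(box, image_height, image_width):
    """Map a normalized [ymin, xmin, ymax, xmax] box to crop_to_bounding_box kwargs.

    Hypothetical helper for illustration; assumes 0 <= min < max <= 1.
    """
    ymin, xmin, ymax, xmax = box

    # Pixel edges, truncated toward zero.
    top = int(ymin * image_height)
    left = int(xmin * image_width)
    bottom = int(ymax * image_height)
    right = int(xmax * image_width)

    # The API takes the top-left corner plus a size, not the bottom-right corner.
    return {
        "offset_height": top,
        "offset_width": left,
        "target_height": bottom - top,
        "target_width": right - left,
    }


crop_args = crop_args_from_box([0.25, 0.10, 0.75, 0.60], 480, 640)
print(crop_args)
```

The resulting dictionary could be unpacked into the call as tf.image.crop_to_bounding_box(image, **crop_args), since the keys match that function's parameter names.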

Clamp and Validate Before Cropping

Model output is not always clean. A box can extend partly outside the image, arrive reversed from a buggy post-processing step, or be so small that integer truncation leaves it zero pixels tall or wide. Clamp the values before cropping.

```python
import tensorflow as tf


def crop_from_normalized_box(image, box):
    """Crop an (H, W, C) image to a normalized [ymin, xmin, ymax, xmax] box."""
    image_height = tf.shape(image)[0]
    image_width = tf.shape(image)[1]

    # Clamp to [0, 1] so slightly out-of-range predictions stay inside the image.
    box = tf.clip_by_value(tf.cast(box, tf.float32), 0.0, 1.0)
    ymin, xmin, ymax, xmax = tf.unstack(box)

    top = tf.cast(ymin * tf.cast(image_height, tf.float32), tf.int32)
    left = tf.cast(xmin * tf.cast(image_width, tf.float32), tf.int32)
    bottom = tf.cast(ymax * tf.cast(image_height, tf.float32), tf.int32)
    right = tf.cast(xmax * tf.cast(image_width, tf.float32), tf.int32)

    # Keep the top-left corner on a valid pixel, even when ymin or xmin is 1.0.
    top = tf.minimum(top, image_height - 1)
    left = tf.minimum(left, image_width - 1)

    # Force at least one pixel per dimension; this also covers reversed boxes.
    crop_height = tf.maximum(bottom - top, 1)
    crop_width = tf.maximum(right - left, 1)

    return tf.image.crop_to_bounding_box(image, top, left, crop_height, crop_width)
```

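This defensive step heads off most of the errors crop_to_bounding_box raises, which occur when an offset is negative, a target size is not positive, or the requested region extends past the image edge.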
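
Clamping silently repairs a box, which can hide an upstream bug. When it is more useful to know that a box was bad, a separate validation pass can report the problem instead. The sketch below is plain Python; the function name find_box_problems and the wording of its messages are hypothetical, and it assumes the ymin, xmin, ymax, xmax order.

```python
def find_box_problems(box, image_height, image_width):
    """Return a list of human-readable problems for a normalized box.

    Hypothetical helper; `box` is assumed to be [ymin, xmin, ymax, xmax].
    """
    ymin, xmin, ymax, xmax = box
    problems = []

    # Coordinates outside [0, 1] reach past the image border.
    for name, value in zip(("ymin", "xmin", "ymax", "xmax"), box):
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is outside [0, 1]")

    # A reversed box has its minimum edge after its maximum edge.
    if ymin > ymax:
        problems.append("ymin is greater than ymax")
    if xmin > xmax:
        problems.append("xmin is greater than xmax")

    # An in-range, correctly ordered box can still truncate to zero pixels.
    rows_ok = 0.0 <= ymin <= ymax <= 1.0
    cols_ok = 0.0 <= xmin <= xmax <= 1.0
    if rows_ok and int(ymax * image_height) == int(ymin * image_height):
        problems.append("box height truncates to 0 pixels")
    if cols_ok and int(xmax * image_width) == int(xmin * image_width):
        problems.append("box width truncates to 0 pixels")

    return problems


print(find_box_problems([0.25, 0.10, 0.75, 0.60], 480, 640))    # clean box
print(find_box_problems([-0.02, 0.10, 0.75, 1.03], 480, 640))   # out of range
print(find_box_problems([0.75, 0.10, 0.25, 0.60], 480, 640))    # reversed rows
print(find_box_problems([0.500, 0.10, 0.501, 0.60], 480, 640))  # sub-pixel height
```

Logging these messages during development and clamping in production is one reasonable split; which boxes to drop rather than repair depends on the application.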

Use `crop_and_resize` for Batches

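When you need many crops or fixed-size outputs, tf.image.crop_and_resize is usually the better tool. It takes a 4-D image batch, accepts normalized ymin, xmin, ymax, xmax boxes directly, uses box_indices to map each box to an image in the batch, and returns float32 crops that all share the shape given by crop_size.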

```python
import tensorflow as tf

images = tf.random.uniform((1, 480, 640, 3), dtype=tf.float32)
boxes = tf.constant([[0.25, 0.10, 0.75, 0.60]], dtype=tf.float32)
box_indices = tf.constant([0], dtype=tf.int32)

crops = tf.image.crop_and_resize(
    image=images,
    boxes=boxes,
    box_indices=box_indices,
    crop_size=(224, 224),
)

print(crops.shape)
```

This is especially useful when a detector feeds another model stage that expects a fixed input size.
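
With more than one image in the batch, box_indices is what ties each box back to its source image. The sketch below, in plain Python, flattens per-image detections into the parallel boxes and box_indices lists that crop_and_resize expects. The function name flatten_detections, the nested-list input layout, and the 0.5 score threshold are assumptions made for illustration, not part of the TensorFlow API.

```python
def flatten_detections(per_image_boxes, per_image_scores, min_score=0.5):
    """Flatten per-image detections into parallel `boxes` and `box_indices` lists.

    Hypothetical helper. `per_image_boxes[i]` holds normalized
    [ymin, xmin, ymax, xmax] boxes for image i, and `per_image_scores[i]`
    holds the matching confidence scores.
    """
    boxes = []
    box_indices = []
    for image_index, (image_boxes, image_scores) in enumerate(
        zip(per_image_boxes, per_image_scores)
    ):
        for box, score in zip(image_boxes, image_scores):
            if score >= min_score:
                boxes.append(list(box))
                # Record which image in the batch this box belongs to.
                box_indices.append(image_index)
    return boxes, box_indices


per_image_boxes = [
    [[0.25, 0.10, 0.75, 0.60], [0.00, 0.00, 0.20, 0.20]],  # image 0
    [[0.40, 0.40, 0.90, 0.90]],                            # image 1
]
per_image_scores = [
    [0.92, 0.31],  # the second box of image 0 falls below the threshold
    [0.77],
]

boxes, box_indices = flatten_detections(per_image_boxes, per_image_scores)
print(boxes)
print(box_indices)
```

The two lists can then be wrapped as tf.constant(boxes, dtype=tf.float32) and tf.constant(box_indices, dtype=tf.int32) and passed to tf.image.crop_and_resize together with the image batch.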

Separate Visualization from Model Input

Not every crop has the same goal. For debugging and annotation tools, you may want the raw pixel crop without resizing so you can inspect the object exactly as it appeared. For training pipelines, consistent output size is often more important than preserving the original crop dimensions.

That distinction usually determines whether crop_to_bounding_box or crop_and_resize is the better fit.

Common Pitfalls

The most common bug is reading the box as xmin, ymin, xmax, ymax when TensorFlow detections are usually ymin, xmin, ymax, xmax. Swapping those fields produces incorrect crops that can still look almost valid, which makes the bug easy to miss.
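
One way to keep the two conventions from blurring together is to convert at the boundary with small, explicitly named helpers. The following plain-Python sketch converts between pixel xmin, ymin, xmax, ymax boxes and the normalized ymin, xmin, ymax, xmax order; both function names are hypothetical, and the pixel conversion truncates toward zero.

```python
def xyxy_to_tf_box(box, image_height, image_width):
    """Pixel [xmin, ymin, xmax, ymax] -> normalized [ymin, xmin, ymax, xmax]."""
    xmin, ymin, xmax, ymax = box
    return [
        ymin / image_height,
        xmin / image_width,
        ymax / image_height,
        xmax / image_width,
    ]


def tf_box_to_xyxy(box, image_height, image_width):
    """Normalized [ymin, xmin, ymax, xmax] -> pixel [xmin, ymin, xmax, ymax]."""
    ymin, xmin, ymax, xmax = box
    return [
        int(xmin * image_width),
        int(ymin * image_height),
        int(xmax * image_width),
        int(ymax * image_height),
    ]


pixel_box = [64, 120, 384, 360]  # xmin, ymin, xmax, ymax in a 480x640 image
tf_box = xyxy_to_tf_box(pixel_box, image_height=480, image_width=640)
round_trip = tf_box_to_xyxy(tf_box, image_height=480, image_width=640)

print(tf_box)
print(round_trip)
```

Note that x values are divided by the image width and y values by the image height; mixing those up is a second way to get crops that look plausible but are wrong on non-square images.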

Another common error is passing normalized values straight into crop_to_bounding_box. That function expects integer pixel offsets and sizes, not fractions.

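People also forget that crop_to_bounding_box takes a height and width, not bottom-right coordinates. If you pass ymax and xmax as the target size, the crop will be too large, and the call fails outright when the offset plus that size runs past the image edge.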

Finally, do not skip bounds checking. Real predictions can include tiny negative offsets or coordinates slightly above 1.0, especially after custom transformations.

Summary

TensorFlow Object Detection boxes are commonly normalized and ordered as ymin, xmin, ymax, xmax.
Convert normalized coordinates to pixel coordinates before using crop_to_bounding_box.
Compute crop height and width from the box edges explicitly.
Use crop_and_resize when you need batched crops or fixed output dimensions.
Most cropping errors come from coordinate-format mistakes, not from TensorFlow itself.
