Saving Detected Objects in a DataFrame with TensorFlow Object Detection

Introduction

Object detection pipelines are more useful when predictions can be analyzed outside the model runtime. A pandas DataFrame gives you a clean structure for filtering, joining, and exporting detections. This guide shows how to turn TensorFlow detection outputs into tabular records that are easy to audit and reuse.

Understanding Detection Tensors

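Most TensorFlow object detection models return a dictionary with arrays for boxes, scores, classes, and a detection count. In the TensorFlow Object Detection API these are typically keyed as detection_boxes, detection_scores, detection_classes, and num_detections. Boxes are usually normalized coordinates in the order ymin, xmin, ymax, xmax, where the y values are fractions of the image height and the x values are fractions of the image width.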
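
As a rough illustration, many exported detection models return batched arrays padded to a fixed number of detections, with num_detections marking how many entries are valid. The sketch below assumes that layout and uses NumPy arrays as stand-ins for the model's tensors. The key names and the helper name unbatch are assumptions for illustration and may not match your model.

```python
import numpy as np

# Stand-in for a batched model output (batch size 1, padded to 4 detections).
# Key names are assumed to follow the common detection_* convention.
raw_outputs = {
    "detection_boxes": np.array([[
        [0.10, 0.20, 0.50, 0.70],
        [0.55, 0.15, 0.90, 0.45],
        [0.00, 0.00, 0.00, 0.00],  # padding
        [0.00, 0.00, 0.00, 0.00],  # padding
    ]], dtype=np.float32),
    "detection_scores": np.array([[0.93, 0.81, 0.0, 0.0]], dtype=np.float32),
    "detection_classes": np.array([[1.0, 3.0, 0.0, 0.0]], dtype=np.float32),
    "num_detections": np.array([2.0], dtype=np.float32),
}


def unbatch(raw, index=0):
    """Drop the batch dimension and trim padding for one image (hypothetical helper)."""
    count = int(raw["num_detections"][index])
    return {
        "detection_boxes": raw["detection_boxes"][index][:count],
        "detection_scores": raw["detection_scores"][index][:count],
        # Class ids often arrive as floats; cast them to integers.
        "detection_classes": raw["detection_classes"][index][:count].astype(int),
    }


outputs = unbatch(raw_outputs)
print(outputs["detection_boxes"].shape)
print(outputs["detection_classes"])
```

With real TensorFlow tensors you would usually convert each value to a NumPy array first, for example by calling .numpy() on it in eager mode.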

To save results in a DataFrame, settle on a few conventions first:

Record image metadata such as filename and shape.
Convert class identifiers to human-readable labels.
Convert normalized box coordinates to pixel units when needed.
Keep confidence score as a floating point value.

A clean schema prevents confusion later when you compare model runs across datasets.
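
One way to pin the schema down is to describe a row as a small dataclass before any DataFrame exists. This is only a sketch: the class name DetectionRecord, the helper make_record, and the field names are illustrative choices, and the helper rounds to the nearest pixel rather than truncating.

```python
from dataclasses import dataclass, asdict


@dataclass
class DetectionRecord:
    """One detection row; field names are illustrative, not a standard."""
    image: str
    img_h: int
    img_w: int
    class_id: int
    class_name: str
    score: float
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def make_record(image, shape, box, score, cls_id, label_map):
    """Build a record from one normalized [ymin, xmin, ymax, xmax] box."""
    img_h, img_w = shape
    ymin, xmin, ymax, xmax = box
    return DetectionRecord(
        image=image,
        img_h=img_h,
        img_w=img_w,
        class_id=int(cls_id),
        class_name=label_map.get(int(cls_id), "unknown"),
        score=float(score),
        # Round to the nearest pixel (an assumption; truncation also works).
        xmin=round(xmin * img_w),
        ymin=round(ymin * img_h),
        xmax=round(xmax * img_w),
        ymax=round(ymax * img_h),
    )


record = make_record(
    "street_001.jpg", (720, 1280), [0.10, 0.20, 0.50, 0.70], 0.93, 1, {1: "person"}
)
print(asdict(record))
```

A list of asdict(record) dictionaries can be passed straight to pd.DataFrame, so the dataclass doubles as documentation of the column set.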

Building a DataFrame from Model Output

The following example simulates a model output and converts detections above a threshold into a DataFrame.

python

import pandas as pd

label_map = {
    1: "person",
    2: "bicycle",
    3: "car",
}

# Example output from a detection model for one image
outputs = {
    "detection_boxes": [
        [0.10, 0.20, 0.50, 0.70],
        [0.55, 0.15, 0.90, 0.45],
        [0.05, 0.05, 0.12, 0.15],
    ],
    "detection_scores": [0.93, 0.81, 0.24],
    "detection_classes": [1, 3, 2],
}

image_name = "street_001.jpg"
img_h, img_w = 720, 1280
score_threshold = 0.50

rows = []
for box, score, cls_id in zip(
    outputs["detection_boxes"],
    outputs["detection_scores"],
    outputs["detection_classes"],
):
    if score < score_threshold:
        continue

    ymin, xmin, ymax, xmax = box
    rows.append(
        {
            "image": image_name,
            "class_id": int(cls_id),
            "class_name": label_map.get(int(cls_id), "unknown"),
            "score": float(score),
            "xmin": int(xmin * img_w),
            "ymin": int(ymin * img_h),
            "xmax": int(xmax * img_w),
            "ymax": int(ymax * img_h),
            "width": int((xmax - xmin) * img_w),
            "height": int((ymax - ymin) * img_h),
        }
    )

df = pd.DataFrame(rows)
print(df)

This DataFrame can be appended across images and stored as CSV or Parquet.
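
A minimal sketch of combining per-image results and exporting them to CSV might look like the following. The two small DataFrames and their values are made up for illustration, and the CSV is written to an in-memory buffer so the example has no side effects.

```python
import io

import pandas as pd

# Two per-image DataFrames with the same columns (values are made up).
df_a = pd.DataFrame([{"image": "street_001.jpg", "class_id": 1, "score": 0.93}])
df_b = pd.DataFrame([{"image": "street_002.jpg", "class_id": 3, "score": 0.81}])

# Concatenate once at the end instead of growing a DataFrame inside a loop.
combined = pd.concat([df_a, df_b], ignore_index=True)

# Write CSV to an in-memory buffer here; pass a file path in real use.
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Collecting plain row dictionaries in a list and building the DataFrame once is usually cheaper still, because every concatenation copies the data.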

Scaling to Multiple Images

For production workloads, wrap the conversion in a function and call it per image. Track run-level metadata such as model version, threshold, and inference timestamp. That extra context is critical when debugging quality changes.

python

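from datetime import datetime, timezone

import pandas as pd

model_version = "ssd-mobilenet-v2-2026-01"
score_threshold = 0.50

# Placeholder input: one (image_name, outputs, (height, width)) tuple per image.
# Replace it with the predictions produced by your own inference loop.
dataset_predictions = [
    (
        "street_001.jpg",
        {
            "detection_boxes": [[0.10, 0.20, 0.50, 0.70]],
            "detection_scores": [0.93],
            "detection_classes": [1],
        },
        (720, 1280),
    ),
]

all_rows = []
for image_name, outputs, shape in dataset_predictions:
    img_h, img_w = shape
    for box, score, cls_id in zip(
        outputs["detection_boxes"],
        outputs["detection_scores"],
        outputs["detection_classes"],
    ):
        if score < score_threshold:
            continue
        ymin, xmin, ymax, xmax = box
        all_rows.append(
            {
                "image": image_name,
                "model_version": model_version,
                # Timezone-aware UTC timestamp; datetime.utcnow() is
                # deprecated since Python 3.12.
                "predicted_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
                "class_id": int(cls_id),
                "score": float(score),
                "xmin": int(xmin * img_w),
                "ymin": int(ymin * img_h),
                "xmax": int(xmax * img_w),
                "ymax": int(ymax * img_h),
            }
        )

detections_df = pd.DataFrame(all_rows)
# to_parquet needs a Parquet engine such as pyarrow or fastparquet installed.
detections_df.to_parquet("detections.parquet", index=False)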

Parquet is usually faster and smaller than CSV for large runs. Use CSV for quick manual inspection and Parquet for long-term pipelines.
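
The advice to wrap the conversion in a function could be sketched as below. The function name detections_to_rows, the run_meta parameter, and the sample predictions are hypothetical and exist only for this example.

```python
def detections_to_rows(image_name, outputs, shape, score_threshold=0.50, run_meta=None):
    """Convert one image's detections into a list of row dictionaries."""
    img_h, img_w = shape
    rows = []
    for box, score, cls_id in zip(
        outputs["detection_boxes"],
        outputs["detection_scores"],
        outputs["detection_classes"],
    ):
        if score < score_threshold:
            continue
        ymin, xmin, ymax, xmax = box
        row = {
            "image": image_name,
            "class_id": int(cls_id),
            "score": float(score),
            "xmin": int(xmin * img_w),
            "ymin": int(ymin * img_h),
            "xmax": int(xmax * img_w),
            "ymax": int(ymax * img_h),
        }
        # Copy run-level fields (model version, threshold, ...) into the row.
        row.update(run_meta or {})
        rows.append(row)
    return rows


# Made-up predictions for two images: (image_name, outputs, (height, width)).
dataset_predictions = [
    (
        "a.jpg",
        {
            "detection_boxes": [[0.25, 0.5, 0.75, 1.0]],
            "detection_scores": [0.9],
            "detection_classes": [1],
        },
        (400, 800),
    ),
    (
        "b.jpg",
        {
            "detection_boxes": [[0.0, 0.0, 0.5, 0.5]],
            "detection_scores": [0.3],
            "detection_classes": [3],
        },
        (400, 800),
    ),
]

run_meta = {"model_version": "ssd-mobilenet-v2-2026-01", "score_threshold": 0.50}
all_rows = []
for image_name, outputs, shape in dataset_predictions:
    all_rows.extend(detections_to_rows(image_name, outputs, shape, run_meta=run_meta))

print(all_rows)
```

Keeping the function free of pandas makes it easy to unit test, and the resulting list can be handed to pd.DataFrame once all images are processed.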

Common Pitfalls

A frequent mistake is forgetting that boxes are normalized. If you treat them as pixels, bounding boxes look wrong and downstream metrics fail. Convert coordinates using the image width and height from the same frame.
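
A small guard can catch this mistake early. The sketch below is a heuristic, not a guarantee: the helper name to_pixel_box and the 0.05 tolerance are assumptions, chosen because some models emit values slightly outside the 0 to 1 range for boxes that touch the image border.

```python
def to_pixel_box(box, img_h, img_w, tolerance=0.05):
    """Return (xmin, ymin, xmax, ymax) in pixels from a normalized box.

    Raises ValueError when the values do not look normalized, which usually
    means the box is already expressed in pixel units.
    """
    if any(v < -tolerance or v > 1.0 + tolerance for v in box):
        raise ValueError(f"box does not look normalized: {box}")
    # Clip small overshoots so the box stays inside the image.
    ymin, xmin, ymax, xmax = (min(max(v, 0.0), 1.0) for v in box)
    return (
        round(xmin * img_w),
        round(ymin * img_h),
        round(xmax * img_w),
        round(ymax * img_h),
    )


print(to_pixel_box([0.25, 0.5, 0.75, 1.0], img_h=400, img_w=800))

try:
    to_pixel_box([72, 256, 360, 896], img_h=720, img_w=1280)
except ValueError as exc:
    print("rejected:", exc)
```

The check cannot detect every case (a tiny pixel box such as 0, 0, 1, 1 still passes), so it complements rather than replaces knowing what your model returns.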

Another issue is class mapping drift. If your label map does not match the trained checkpoint, class names become misleading. Keep label files versioned next to the model artifact.
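
One possible way to make label map drift visible is to store a short fingerprint of the label map with every run and compare it against the one recorded at training time. This is a sketch of that idea only; the helper name label_map_fingerprint and the 12-character length are arbitrary assumptions.

```python
import hashlib
import json


def label_map_fingerprint(label_map):
    """Return a short, order-independent hash of an id-to-name mapping."""
    # JSON keys must be strings; sorting makes the output deterministic.
    canonical = json.dumps({str(k): v for k, v in label_map.items()}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


trained_with = {1: "person", 2: "bicycle", 3: "car"}
same_map_reordered = {3: "car", 1: "person", 2: "bicycle"}
drifted = {1: "person", 2: "car", 3: "bicycle"}

print(label_map_fingerprint(trained_with) == label_map_fingerprint(same_map_reordered))
print(label_map_fingerprint(trained_with) == label_map_fingerprint(drifted))
```

Writing the fingerprint into a column next to the model version lets you filter out rows produced with a mismatched label file.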

Teams also lose data by applying very high thresholds too early. Keep raw scores in storage so you can reevaluate cutoffs without rerunning expensive inference jobs.
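
If the raw scores are stored, trying a different cutoff is a cheap filter on the DataFrame. The example below uses made-up scores to show how the number of kept detections changes across a few candidate thresholds.

```python
import pandas as pd

# Made-up detections stored with their raw scores and no threshold applied.
raw_df = pd.DataFrame(
    {
        "image": ["a.jpg", "a.jpg", "a.jpg", "b.jpg"],
        "class_id": [1, 3, 2, 1],
        "score": [0.93, 0.81, 0.24, 0.55],
    }
)

# Count how many detections survive each candidate cutoff.
counts = {}
for threshold in (0.25, 0.50, 0.75):
    kept = raw_df[raw_df["score"] >= threshold]
    counts[threshold] = len(kept)

print(counts)
```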

Finally, avoid mixed data types in one column. For example, storing numeric class identifiers as strings in some rows makes filtering slower and error-prone.
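
The effect is easy to reproduce. In this sketch with made-up data, a class_id column that mixes integers and strings silently drops a row from an equality filter until the column is converted to a single integer dtype.

```python
import pandas as pd

# class_id mixes integers and strings, so pandas stores it as object dtype.
mixed = pd.DataFrame(
    {
        "class_id": [1, "3", 2, "1"],
        "score": [0.93, 0.81, 0.24, 0.66],
    }
)
print(mixed["class_id"].dtype)

# Comparing against the integer 1 misses the row stored as the string "1".
matches_before = int((mixed["class_id"] == 1).sum())

# Convert the whole column to one integer dtype before filtering.
clean = mixed.copy()
clean["class_id"] = pd.to_numeric(clean["class_id"]).astype("int64")
matches_after = int((clean["class_id"] == 1).sum())

print(matches_before, matches_after)
```

Casting when the rows are built, as the int(cls_id) calls in the earlier examples do, avoids the cleanup step entirely.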

Summary

Convert detection tensors into a stable tabular schema with clear column names.
Store both normalized coordinates and pixel coordinates when possible.
Include metadata such as model version and inference time.
Use Parquet for scale and CSV for quick checks.
Keep thresholds and label maps explicit to avoid silent analysis errors.