How to Create Mask Images from the COCO Dataset

Introduction

Creating mask images from the COCO dataset means converting the segmentation annotations in the COCO JSON files into per-pixel images. Those masks are commonly used for semantic segmentation, instance segmentation, and dataset visualization.

The easiest path is to use pycocotools, because COCO annotations may be polygons or run-length encoded masks, and the library already knows how to decode both forms correctly.

What the COCO Annotations Contain

COCO stores image metadata and object annotations separately. For segmentation work, the most important fields are:

  • image_id to connect an annotation to an image
  • category_id to identify the object class
  • segmentation containing polygons or RLE data
  • iscrowd indicating crowd-style annotations

A mask image is not stored directly in the JSON. You derive it from these annotations.
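To make those fields concrete, here is a rough sketch of what two annotation records can look like once the JSON is loaded. Every value is invented for illustration, and segmentation_kind is a hypothetical helper, not part of pycocotools.

```python
# Illustrative annotation records; all ids and coordinates are made up.
polygon_ann = {
    "id": 1,
    "image_id": 42,
    "category_id": 18,
    "iscrowd": 0,
    # Polygon form: a list of flat [x1, y1, x2, y2, ...] coordinate lists.
    "segmentation": [[10.0, 10.0, 60.0, 10.0, 60.0, 40.0, 10.0, 40.0]],
}

rle_ann = {
    "id": 2,
    "image_id": 42,
    "category_id": 1,
    "iscrowd": 1,
    # Uncompressed RLE form: run lengths plus the mask size as [height, width].
    "segmentation": {"counts": [0, 5, 95], "size": [10, 10]},
}


def segmentation_kind(ann):
    """Return 'polygon' or 'rle' depending on how the segmentation is stored."""
    seg = ann["segmentation"]
    if isinstance(seg, list):
        return "polygon"
    if isinstance(seg, dict) and "counts" in seg:
        return "rle"
    raise ValueError("unrecognized segmentation format")


print(segmentation_kind(polygon_ann), segmentation_kind(rle_ann))
```

A check like this is only useful for inspecting a dataset. For actual decoding, pycocotools should still do the work.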

Install the Required Tools

A minimal setup uses:

  • pycocotools for reading COCO annotations and decoding masks
  • Pillow or imageio for saving PNG files
  • optionally numpy for combining masks

bash
python3 -m pip install pycocotools pillow numpy

Create a Binary Mask for One Object

Here is a small example that loads an annotation and converts it to a binary mask.

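python
from pycocotools.coco import COCO
from PIL import Image
import numpy as np

# Load the annotation file and look up the first image and its annotations
coco = COCO("instances_train2017.json")
image_id = coco.getImgIds()[0]
ann_ids = coco.getAnnIds(imgIds=image_id)
anns = coco.loadAnns(ann_ids)
img_info = coco.loadImgs(image_id)[0]

# Some images have no annotations, so guard before indexing
if not anns:
    raise SystemExit(f"image {image_id} has no annotations")

height = img_info["height"]
width = img_info["width"]

# annToMask returns a 0/1 array; scale it to 0/255 so the PNG is visible
mask = coco.annToMask(anns[0]) * 255
assert mask.shape == (height, width)  # mask matches the image dimensions
Image.fromarray(mask.astype(np.uint8)).save("single_mask.png")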

This produces a black-and-white mask for one object instance.

Build a Full Semantic Mask

For semantic segmentation, you usually want one mask image where each pixel stores a class id. In that case, combine all annotations for the image.

python
from pycocotools.coco import COCO
from PIL import Image
import numpy as np

# Load the annotation file and collect every annotation for one image
coco = COCO("instances_train2017.json")
image_id = coco.getImgIds()[0]
ann_ids = coco.getAnnIds(imgIds=image_id)
anns = coco.loadAnns(ann_ids)
img_info = coco.loadImgs(image_id)[0]

# Start from an all-background (0) mask with the image's own dimensions
height = img_info["height"]
width = img_info["width"]
semantic_mask = np.zeros((height, width), dtype=np.uint8)

# Paint each object's pixels with its category id; later annotations overwrite earlier ones
for ann in anns:
    category_id = ann["category_id"]
    instance_mask = coco.annToMask(ann)
    semantic_mask[instance_mask == 1] = category_id

Image.fromarray(semantic_mask).save("semantic_mask.png")

Each pixel in the output stores the category id of the last annotation that covered it. Pixels that no annotation covers stay 0.

Semantic Versus Instance Masks

Be clear about which mask type you need.

  • binary mask: one object versus background
  • semantic mask: one class label per pixel
  • instance mask: different objects of the same class stay separate

If two people appear in one image, a semantic mask labels both with the same class value. An instance mask keeps them as separate object instances.
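The difference can be sketched with plain numpy. The example below assumes you already have one binary mask per object (for instance from coco.annToMask) and uses two toy arrays in their place. build_instance_mask is a hypothetical helper, and numbering instances 1, 2, 3 in list order is just one possible convention.

```python
import numpy as np


def build_instance_mask(binary_masks):
    """Combine binary masks (H x W arrays of 0/1) into one instance-id mask.

    Background stays 0; the i-th mask (counting from 1) is written as value i.
    """
    if not binary_masks:
        raise ValueError("need at least one mask")
    # uint16 leaves room for more than 255 instances in one image
    instance_mask = np.zeros(binary_masks[0].shape, dtype=np.uint16)
    for instance_id, mask in enumerate(binary_masks, start=1):
        instance_mask[mask == 1] = instance_id
    return instance_mask


# Two toy "person" masks standing in for decoded COCO annotations
person_a = np.zeros((4, 6), dtype=np.uint8)
person_a[1:3, 0:2] = 1
person_b = np.zeros((4, 6), dtype=np.uint8)
person_b[1:3, 4:6] = 1

instance_mask = build_instance_mask([person_a, person_b])
print(np.unique(instance_mask))
```

In a semantic mask both regions would carry the same category id. Here they carry 1 and 2, so the two people stay distinguishable.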

Handling Overlap and Order

Multiple annotations can overlap. In the simple loop above, later annotations overwrite earlier ones.

That may be acceptable for visualization, but for training data you should define a rule explicitly, for example:

  • preserve the first object written
  • let the largest object win
  • prefer non-crowd annotations over crowd annotations

The correct policy depends on the training task.
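One way to make such a rule explicit is to sort the annotations before painting, so that whatever should win an overlap is written last. The sketch below is an assumption-laden illustration: paint_semantic_mask and its largest_wins flag are hypothetical names, and the input is a list of (category_id, binary_mask, is_crowd) tuples that you would build from the COCO annotations yourself.

```python
import numpy as np


def paint_semantic_mask(shape, items, largest_wins=True):
    """Paint (category_id, binary_mask, is_crowd) items into one label mask.

    Items are sorted so that whatever should win an overlap is painted last.
    """
    def paint_order(item):
        _, mask, is_crowd = item
        area = int(mask.sum())
        # Crowd regions go first so non-crowd objects overwrite them.
        return (0 if is_crowd else 1, area if largest_wins else -area)

    semantic_mask = np.zeros(shape, dtype=np.uint8)
    for category_id, mask, _ in sorted(items, key=paint_order):
        semantic_mask[mask == 1] = category_id
    return semantic_mask


# Toy data: a large object (class 1) fully containing a small one (class 2)
large = np.ones((4, 4), dtype=np.uint8)
small = np.zeros((4, 4), dtype=np.uint8)
small[1:3, 1:3] = 1
items = [(1, large, False), (2, small, False)]

largest_on_top = paint_semantic_mask((4, 4), items, largest_wins=True)
smallest_on_top = paint_semantic_mask((4, 4), items, largest_wins=False)
```

With largest_wins=True the small object disappears entirely under the large one, which shows why the choice of rule matters for training data.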

Save Colorized Masks for Inspection

For debugging, grayscale category ids are hard to read. A colorized preview helps confirm that the annotations were converted correctly.

You can map category ids to colors with a lookup table and save a visualization image separately from the raw label mask.

That way you keep a machine-readable training mask and a human-readable QA image.
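A lookup table of this kind can be sketched with numpy indexing. The palette here is random and purely illustrative, colorize is a hypothetical helper, and the toy array stands in for a real semantic mask of category ids.

```python
import numpy as np


def colorize(semantic_mask, palette):
    """Map an H x W array of category ids to an H x W x 3 RGB array."""
    # Integer-array indexing looks up one RGB row per pixel
    return palette[semantic_mask]


# One RGB color per possible uint8 label; the seed keeps colors stable across runs
rng = np.random.default_rng(seed=0)
palette = rng.integers(0, 256, size=(256, 3)).astype(np.uint8)
palette[0] = (0, 0, 0)  # keep background black

# Toy label mask standing in for a real semantic mask
semantic_mask = np.array([[0, 1, 1],
                          [0, 2, 2]], dtype=np.uint8)

preview = colorize(semantic_mask, palette)
print(preview.shape)
```

The resulting array can be saved with Image.fromarray(preview).save(...) under a different filename than the raw label mask, so the two never get mixed up.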

Common Pitfalls

A common mistake is trying to decode the segmentation field manually instead of using pycocotools. The field can hold polygon lists, uncompressed RLE, or compressed RLE, and a hand-written decoder often handles only one of them.

Another mistake is confusing semantic segmentation with instance segmentation. The right output format depends on the training objective.

Developers also forget to use the image dimensions from COCO metadata, which can cause incorrectly shaped masks.

Finally, if annotations overlap, decide your overwrite rule intentionally instead of accepting whatever loop order happens to do.

Summary

  • COCO mask images are derived from segmentation annotations, not stored directly as image files.
  • pycocotools is the standard way to decode polygons and RLE masks correctly.
  • Use binary masks for single objects, semantic masks for per-class labels, and instance masks when object identity matters.
  • Always use the original image width and height when building the mask array.
  • Define an explicit policy for overlapping annotations before generating training data at scale.
