How to create mask images from COCO dataset?
Introduction
Creating mask images from the COCO dataset means converting the segmentation annotations in the COCO JSON files into per-pixel images. Those masks are commonly used for semantic segmentation, instance segmentation, and dataset visualization.
The easiest path is to use pycocotools, because COCO annotations may be polygons or run-length encoded masks, and the library already knows how to decode both forms correctly.
What the COCO Annotations Contain
COCO stores image metadata and object annotations separately. For segmentation work, the most important fields are:
- `image_id` to connect an annotation to an image
- `category_id` to identify the object class
- `segmentation` containing polygons or RLE data
- `iscrowd` indicating crowd-style annotations
A mask image is not stored directly in the JSON. You derive it from these annotations.
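To make the field layout concrete, here is a small sketch using hand-written records. The ids, file name, and coordinates are hypothetical values chosen for illustration; only the field names follow the COCO format. The helper `group_annotations_by_image` is also an illustrative name, not part of any library.

```python
from collections import defaultdict

# Hypothetical records that mimic the layout of a COCO annotation file.
coco_json = {
    "images": [
        {"id": 1, "file_name": "example.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 10,
            "image_id": 1,
            "category_id": 18,
            # Polygon form: a list of flat [x1, y1, x2, y2, ...] coordinate lists.
            "segmentation": [[100.0, 100.0, 200.0, 100.0, 200.0, 200.0, 100.0, 200.0]],
            "iscrowd": 0,
        },
        {
            "id": 11,
            "image_id": 1,
            "category_id": 1,
            # RLE form: a dict with the mask size [height, width] and run lengths.
            "segmentation": {"size": [480, 640], "counts": [1000, 500, 305700]},
            "iscrowd": 1,
        },
    ],
}


def group_annotations_by_image(data):
    """Map each image id to the list of annotations that reference it."""
    grouped = defaultdict(list)
    for ann in data["annotations"]:
        grouped[ann["image_id"]].append(ann)
    return dict(grouped)


annotations_by_image = group_annotations_by_image(coco_json)
```

In practice `pycocotools` builds this index for you, so the grouping step is shown only to illustrate how `image_id` ties annotations to images.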
Install the Required Tools
A minimal setup uses:
- `pycocotools` for reading COCO annotations and decoding masks
- `Pillow` or `imageio` for saving PNG files
- optionally `numpy` for combining masks
Create a Binary Mask for One Object
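To get a binary mask for one object, load its annotation and pass it to the `annToMask` method of the `pycocotools` `COCO` object, which returns a height-by-width array of 0s and 1s.
Scaling that array to 0 and 255 before saving produces a black-and-white mask for one object instance.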
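A minimal sketch of this step, assuming `pycocotools`, `numpy`, and `Pillow` are installed. The function names and the file paths passed to them are hypothetical; `COCO`, `getAnnIds`, `loadAnns`, and `annToMask` are the `pycocotools` calls being relied on.

```python
import numpy as np


def annotation_to_binary_mask(coco, ann):
    """Decode one COCO annotation into a uint8 mask with values 0 and 255."""
    # annToMask accepts polygon and RLE segmentations and returns a
    # (height, width) array of 0s and 1s.
    mask = coco.annToMask(ann)
    return (mask * 255).astype(np.uint8)


def export_first_object_mask(annotation_file, image_id, out_path):
    """Save the mask of the first annotation of one image as a PNG."""
    # Imported here so the helper above can be reused without these packages.
    from pycocotools.coco import COCO
    from PIL import Image

    coco = COCO(annotation_file)
    ann_ids = coco.getAnnIds(imgIds=image_id)
    anns = coco.loadAnns(ann_ids)
    if not anns:
        raise ValueError(f"image {image_id} has no annotations")
    Image.fromarray(annotation_to_binary_mask(coco, anns[0])).save(out_path)
```

A call such as `export_first_object_mask("instances_val2017.json", image_id, "mask.png")` would write one PNG, where the annotation file name and image id stand in for whatever your dataset uses.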
Build a Full Semantic Mask
For semantic segmentation, you usually want one mask image where each pixel stores a class id. In that case, combine all annotations for the image.
Each pixel in the output stores the category id of the annotation that covered it.
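One way to sketch this, assuming a `pycocotools` `COCO` object is passed in as `coco`. The function name `build_semantic_mask` is hypothetical, background is assumed to be 0, and `uint8` is assumed to be wide enough for the category ids in your dataset.

```python
import numpy as np


def build_semantic_mask(coco, image_id):
    """Return a (height, width) uint8 array holding one category id per pixel."""
    # Take the mask size from the image record, not from the annotations.
    img_info = coco.loadImgs(image_id)[0]
    mask = np.zeros((img_info["height"], img_info["width"]), dtype=np.uint8)

    anns = coco.loadAnns(coco.getAnnIds(imgIds=image_id))
    for ann in anns:
        object_mask = coco.annToMask(ann)  # 0/1 array for this object
        # Later annotations overwrite earlier ones where they overlap.
        mask[object_mask == 1] = ann["category_id"]
    return mask
```

The returned array can be written with Pillow's `Image.fromarray(mask).save(...)`; PNG is a sensible choice because it is lossless and keeps the label values intact.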
Semantic Versus Instance Masks
Be clear about which mask type you need.
- binary mask: one object versus background
- semantic mask: one class label per pixel
- instance mask: different objects of the same class stay separate
If two people appear in one image, a semantic mask may label both as the same class value. An instance mask would keep them as separate object instances.
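The difference can be illustrated with two toy masks. This is a sketch on synthetic arrays rather than real COCO data; `build_instance_mask` is a hypothetical helper, and the instance ids 1..N are an arbitrary numbering chosen for the example.

```python
import numpy as np


def build_instance_mask(binary_masks):
    """Combine per-object 0/1 masks into one label image with ids 1..N."""
    if not binary_masks:
        raise ValueError("need at least one mask")
    instance_mask = np.zeros(binary_masks[0].shape, dtype=np.uint16)
    for instance_id, mask in enumerate(binary_masks, start=1):
        instance_mask[mask == 1] = instance_id
    return instance_mask


# Two toy "person" objects in a 4x6 image.
person_a = np.zeros((4, 6), dtype=np.uint8)
person_a[0:2, 0:2] = 1
person_b = np.zeros((4, 6), dtype=np.uint8)
person_b[2:4, 3:6] = 1

PERSON_CATEGORY_ID = 1  # the id COCO uses for "person"

# Instance mask: each person keeps its own id (1 and 2).
instance_mask = build_instance_mask([person_a, person_b])

# Semantic mask: both people collapse to the same class value.
semantic_mask = np.where(instance_mask > 0, PERSON_CATEGORY_ID, 0).astype(np.uint8)
```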
Handling Overlap and Order
Multiple annotations can overlap. If you write annotations into the mask one after another in a simple loop, later annotations overwrite earlier ones.
That may be acceptable for visualization, but for training data you should define a rule explicitly, for example:
- preserve the first object written
- let the largest object win
- prefer non-crowd annotations over crowd annotations
The correct policy depends on the training task.
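As an illustration, the sketch below combines two of the rules above: non-crowd annotations take priority over crowd annotations, and among annotations of equal priority the first one written is preserved. The function name and the `(mask, category_id, iscrowd)` tuple layout are assumptions made for this example, and it relies on 0 meaning background.

```python
import numpy as np


def paint_with_priority(items, height, width):
    """Build a label mask from (binary_mask, category_id, iscrowd) tuples.

    Policy for this sketch: non-crowd beats crowd, and otherwise the first
    annotation to claim a pixel keeps it. Category ids must be non-zero
    because 0 is used as the background value.
    """
    out = np.zeros((height, width), dtype=np.uint8)
    # sorted() is stable, so the original order is kept within each group
    # while non-crowd items (iscrowd == 0) move to the front.
    ordered = sorted(items, key=lambda item: item[2])
    for mask, category_id, _iscrowd in ordered:
        writable = (mask == 1) & (out == 0)  # only fill untouched pixels
        out[writable] = category_id
    return out
```

Swapping the sort key, for example to the annotation `area`, would give a different policy without changing the painting loop.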
Save Colorized Masks for Inspection
For debugging, a raw label mask is hard to read: viewed as a grayscale image, small category ids all render as near-black. A colorized preview helps confirm that the annotations were converted correctly.
You can map category ids to colors with a lookup table and save a visualization image separately from the raw label mask.
That way you keep a machine-readable training mask and a human-readable QA image.
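A lookup table of this kind can be sketched with `numpy` indexing. The function name `colorize_label_mask` and the example colors are hypothetical, and the sketch assumes the label mask is a `uint8` array so that every id fits a 256-row table.

```python
import numpy as np


def colorize_label_mask(label_mask, palette):
    """Map a (H, W) uint8 label mask to a (H, W, 3) RGB preview image.

    palette: dict of category id -> (r, g, b). Ids missing from the
    palette, including background 0, are drawn in black.
    """
    lut = np.zeros((256, 3), dtype=np.uint8)
    for category_id, color in palette.items():
        lut[category_id] = color
    # Indexing the table with the mask looks up one color per pixel.
    return lut[label_mask]


# Toy label mask with two classes and a hypothetical palette.
label_mask = np.array([[0, 1], [2, 1]], dtype=np.uint8)
preview = colorize_label_mask(label_mask, {1: (255, 0, 0), 2: (0, 255, 0)})
```

The preview array can be saved with Pillow's `Image.fromarray(preview).save(...)` under a different file name than the raw label mask, so the two never get mixed up.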
Common Pitfalls
A common mistake is trying to decode the segmentation field manually instead of using pycocotools. The field can hold polygon lists, uncompressed RLE, or compressed RLE, and a hand-written decoder tends to handle only one of these.
Another mistake is confusing semantic segmentation with instance segmentation. The right output format depends on the training objective.
Developers also forget to take the mask size from the `height` and `width` fields of the COCO image record, which leads to masks whose shape does not match the image.
Finally, if annotations overlap, decide your overwrite rule intentionally instead of accepting whatever loop order happens to do.
Summary
- COCO mask images are derived from segmentation annotations, not stored directly as image files.
- `pycocotools` is the standard way to decode polygons and RLE masks correctly.
- Use binary masks for single objects, semantic masks for per-class labels, and instance masks when object identity matters.
- Always use the original image width and height when building the mask array.
- Define an explicit policy for overlapping annotations before generating training data at scale.

