
Finding contours of a two-part letter


Introduction

When a letter has two disconnected visible parts, the main challenge is not contour detection itself but deciding which contours belong to the same character. A generic findContours call will happily return every blob in the image, so the real work is preprocessing the image and grouping the right connected components.

Start with a clean binary image

Contour detection depends heavily on thresholding quality. The usual preprocessing steps are:

  • convert to grayscale
  • threshold so the letter becomes foreground
  • optionally remove small noise with morphology
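```python
import cv2

image = cv2.imread("letter.png")  # BGR image, or None if the file can't be read
if image is None:
    raise FileNotFoundError("could not read letter.png")

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Otsu chooses the threshold automatically (the 0 is ignored);
# THRESH_BINARY_INV makes a dark letter white (255) on black (0).
_, binary = cv2.threshold(
    gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU
)
```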

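Inverse thresholding is common when the letter is dark on a light background, because it turns the letter into white (non-zero) foreground pixels on a black background. That is what findContours expects: it treats non-zero pixels as objects and zero pixels as background.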

Disconnected letters need component logic

If the letter is something like i or j, the dot and the stem are separate components, so cv2.findContours with RETR_EXTERNAL returns two outer contours, not one.

A practical first pass is:

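```python
# Uses `image` and `binary` from the previous step (OpenCV 4.x returns two values; 3.x returns three).
contours, _ = cv2.findContours(
    binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
)

for contour in contours:
    area = cv2.contourArea(contour)
    if area > 20:  # drop tiny noise blobs; keep this small so the dot of i/j survives
        x, y, w, h = cv2.boundingRect(contour)
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```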

This finds separate outer contours and filters tiny noise blobs by area.

Decide whether the parts belong to one letter

Finding contours is only step one. To identify the two parts of the same letter, use geometric rules such as:

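  • horizontal alignment (the parts overlap in x)
  • expected vertical spacing (a small gap between them)
  • similar x-position (their centers are close)
  • relative size constraints (one part much smaller than the other)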

For example, the dot of a lowercase i is usually above the stem and roughly centered over it. That is a much better rule than assuming the two largest contours always belong together.

If your image contains only one character, the problem is easy: keep the two relevant contours after filtering noise. If the image contains multiple characters, you need grouping logic based on position and shape.
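The "dot above the stem" rule can be written as a plain predicate over bounding boxes, followed by a greedy pairing pass for images with several characters. The helper names is_dot_over_stem and pair_dots_with_stems, and the gap and size thresholds, are hypothetical choices for illustration, not part of OpenCV; they would need tuning to the font size:

```python
from typing import List, Tuple

# Bounding box as returned by cv2.boundingRect: (x, y, w, h), with y growing downward.
Box = Tuple[int, int, int, int]


def is_dot_over_stem(dot: Box, stem: Box, max_gap: int = 15,
                     max_area_ratio: float = 0.5) -> bool:
    """Hypothetical rule: a small box sits just above a larger one, centered over it."""
    dx, dy, dw, dh = dot
    sx, sy, sw, sh = stem
    # Relative size: the dot must be clearly smaller than the stem.
    if dw * dh > max_area_ratio * sw * sh:
        return False
    # Vertical spacing: the dot's bottom edge is above the stem's top, but not far above.
    gap = sy - (dy + dh)
    if not 0 <= gap <= max_gap:
        return False
    # Horizontal alignment: the dot's center lies over the stem, with a small tolerance.
    dot_cx = dx + dw / 2
    return sx - dw / 2 <= dot_cx <= sx + sw + dw / 2


def pair_dots_with_stems(boxes: List[Box]) -> List[Tuple[Box, Box]]:
    """Greedily pair each candidate dot with the closest matching stem below it."""
    pairs: List[Tuple[Box, Box]] = []
    used = set()
    # Visit the smallest boxes first, since dots are the small parts.
    for dot in sorted(boxes, key=lambda b: b[2] * b[3]):
        if dot in used:
            continue
        stems = [s for s in boxes
                 if s != dot and s not in used and is_dot_over_stem(dot, s)]
        if stems:
            # Prefer the stem with the smallest vertical gap.
            stem = min(stems, key=lambda s: s[1] - (dot[1] + dot[3]))
            pairs.append((dot, stem))
            used.update((dot, stem))
    return pairs


# Hypothetical boxes: the stem and dot of an "i", plus a dotless "l" to the right.
boxes = [(11, 15, 8, 30), (12, 5, 6, 6), (40, 0, 8, 45)]
print(pair_dots_with_stems(boxes))  # [((12, 5, 6, 6), (11, 15, 8, 30))]
```

In practice the boxes would come from `[cv2.boundingRect(c) for c in contours]` after noise filtering. The greedy pass is deliberately simple and can mis-pair when characters touch or overlap, so treat it as a starting point rather than a general solution.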

Use contour hierarchy when holes matter

Some letters are not disconnected but contain holes, such as B, D, O, or R. In those cases, RETR_EXTERNAL is not enough because it only returns the outer contour. If you need inner holes too, use hierarchical retrieval:

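```python
# RETR_TREE keeps nested contours; hierarchy[0][i] is [next, previous, first_child, parent].
contours, hierarchy = cv2.findContours(
    binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE
)
```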

This is a different situation from a truly two-part disconnected letter. A hole is a child contour inside one connected component, not a second standalone component.

That distinction is important because it changes the post-processing rules completely.

Connected components can be simpler than contours

If the real question is "how many disconnected pieces does this letter have," connected-component labeling is often simpler than contour analysis:

```python
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
# Label 0 is the background, so the number of foreground pieces is num_labels - 1.
print(num_labels - 1)
```

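The stats array gives the bounding box (left, top, width, height) and pixel area of every labeled region directly, and centroids gives each region's center; row 0 of both describes the background. For disconnected letters, components are often the cleanest abstraction. Contours are more useful when boundary shape details matter.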

Common Pitfalls

The biggest mistake is using one threshold for every image without checking contrast and background quality. Poor binarization creates broken contours or merges separate regions incorrectly.

Another mistake is confusing disconnected components with holes. A lowercase i has two separate components, and therefore two external contours, while B usually has one external contour with internal holes.

Developers also filter by area too aggressively and accidentally remove the smaller part of the letter, such as the dot of i or j.

Finally, do not assume the two largest contours belong together when the image contains multiple characters or noise. Grouping needs geometric rules, not just size ranking.

Summary

  • For two-part letters, the hard problem is usually grouping the right components, not calling findContours.
  • Start with careful thresholding and noise removal.
  • Use RETR_EXTERNAL for disconnected pieces and RETR_TREE when hole hierarchy matters.
  • Connected-component labeling is often simpler when you care about separate blobs rather than precise boundaries.
  • Filter and group contours with geometry rules instead of assuming the largest blobs belong together.


All Rights Reserved.