Convert Your Own Image to an MNIST-Style Image

Introduction

Converting your own digit image to something MNIST-like means more than just resizing it to 28x28. MNIST digits are centered, grayscale, size-normalized, and usually represented with a bright digit on a dark background, so the closer your preprocessing matches that distribution, the better a model trained on MNIST is likely to perform.

What MNIST Images Look Like

A typical MNIST sample has these properties:

  • one grayscale channel
  • 28x28 pixels
  • centered handwritten digit
  • relatively tight crop around the foreground
  • consistent foreground and background polarity

If your custom image is a phone photo, a scanned page, or a dark digit on a light background, those differences matter. A classifier trained on MNIST expects a bright, centered digit on a dark background, and its accuracy tends to drop when the input departs from that.
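As a rough illustration of the polarity point, the sketch below guesses whether an image is dark-on-light by looking at its border pixels, which are usually background. The function name `looks_like_dark_on_light` and the cutoff of 127 are assumptions made for this example, and the input is assumed to be an 8-bit grayscale array.

```python
import numpy as np


def looks_like_dark_on_light(gray):
    """Guess polarity from the border pixels of an 8-bit grayscale array.

    Hypothetical helper: assumes the outer rows and columns are mostly
    background, which holds for typical scans and photos of a single digit.
    """
    gray = np.asarray(gray)
    border = np.concatenate([gray[0, :], gray[-1, :], gray[:, 0], gray[:, -1]])
    # A bright border suggests light paper, so the digit is probably dark ink.
    return bool(np.median(border) > 127)


# A light page with a dark block standing in for a digit.
page = np.full((50, 40), 230, dtype=np.uint8)
page[10:40, 15:25] = 20
print(looks_like_dark_on_light(page))
```

An image flagged this way would need to be inverted before it resembles MNIST.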

A Practical Conversion Pipeline

A good conversion workflow is:

  1. load the image in grayscale
  2. normalize foreground and background polarity
  3. threshold away background noise
  4. crop to the digit’s bounding box
  5. resize while keeping aspect ratio
  6. center the digit on a 28x28 canvas
  7. normalize pixel values for model input

Skipping the crop and centering steps is one of the biggest reasons custom images fail against MNIST-trained models.

Python Example with Pillow

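python
from PIL import Image, ImageOps
import numpy as np


def convert_to_mnist_style(input_path, output_path):
    img = Image.open(input_path).convert("L")

    # MNIST stores a bright digit on a dark background. Invert only when the
    # source looks like dark ink on light paper. The mean-brightness test is
    # a simple heuristic; adjust it if your images are mostly foreground.
    if np.asarray(img).mean() > 127:
        img = ImageOps.invert(img)
    arr = np.array(img, dtype=np.uint8)

    # Remove weak background noise. 40 is a starting point, not a universal value.
    arr = np.where(arr > 40, arr, 0).astype(np.uint8)

    # Find the bounding box of the remaining foreground pixels.
    ys, xs = np.where(arr > 0)
    if len(xs) == 0:
        raise ValueError("No foreground digit found")

    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()
    cropped = arr[y_min:y_max + 1, x_min:x_max + 1]

    # Scale the longer side to 20 pixels while keeping the aspect ratio.
    # resize() is used because thumbnail() only shrinks and would leave a
    # small crop at its original size.
    digit = Image.fromarray(cropped)
    scale = 20 / max(digit.size)
    new_size = (max(1, round(digit.width * scale)),
                max(1, round(digit.height * scale)))
    digit = digit.resize(new_size, Image.Resampling.LANCZOS)

    # Paste the digit in the middle of a black 28x28 canvas.
    canvas = Image.new("L", (28, 28), 0)
    x_off = (28 - digit.width) // 2
    y_off = (28 - digit.height) // 2
    canvas.paste(digit, (x_off, y_off))

    canvas.save(output_path)
    # Scale to [0, 1] floats for model input.
    final_arr = np.array(canvas, dtype=np.float32) / 255.0
    return final_arr


sample = convert_to_mnist_style("digit.png", "digit_mnist.png")
print(sample.shape, sample.min(), sample.max())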

This saves the converted image and returns a 28x28 float32 array scaled to the range 0 to 1, ready to be reshaped for model input.
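The fixed cutoff of 40 in the function above suits clean scans, but it may erase faint strokes or keep shadows in photos. One widely used alternative, which is not part of the original function, is Otsu's method: it picks the cutoff that best separates the histogram into two classes. The sketch below is a minimal NumPy version; the name `otsu_threshold` is chosen for this example, and the input is assumed to be a uint8 array.

```python
import numpy as np


def otsu_threshold(arr):
    """Return the cutoff t that maximizes between-class variance (Otsu's method).

    Pixels with value <= t form one class and pixels > t the other.
    Assumes an 8-bit grayscale array with values in 0..255.
    """
    arr = np.asarray(arr, dtype=np.uint8)
    hist = np.bincount(arr.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)

    w0 = np.cumsum(hist)                 # pixel count at or below each level
    w1 = hist.sum() - w0                 # pixel count above each level
    sum0 = np.cumsum(hist * levels)      # intensity sum at or below each level

    mu0 = sum0 / np.maximum(w0, 1)       # class means, guarded against 0/0
    mu1 = (sum0[-1] - sum0) / np.maximum(w1, 1)

    between = w0 * w1 * (mu0 - mu1) ** 2
    return int(np.argmax(between))


# Two well-separated intensity groups: the cutoff falls between them.
demo = np.array([[10, 10, 10, 10], [200, 200, 200, 200]], dtype=np.uint8)
t = otsu_threshold(demo)
cleaned = np.where(demo > t, demo, 0).astype(np.uint8)
```

In the conversion function, the line with the constant 40 could then compare against `otsu_threshold(arr)` instead. Whether that helps depends on the images, so it is worth checking a few outputs either way.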

Why the 20x20 Fit Is Useful

The original MNIST digits were size-normalized to fit a 20x20 box while preserving aspect ratio, then placed in a 28x28 field. Doing the same leaves a margin around the digit, which mimics the spatial layout of MNIST rather than cramming the strokes against the edges.

If you resize directly to 28x28 without preserving aspect ratio or margins, thin digits can become distorted or off-center.
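One further detail: the original MNIST preprocessing centered each digit by its pixel center of mass, not by its bounding box. For asymmetric digits the two can differ by a pixel or two. The sketch below shifts a 28x28 array so its center of mass lands on the canvas center; the function name `center_by_mass` is an assumption made for this example, and pixels shifted past the edge are simply dropped.

```python
import numpy as np


def center_by_mass(canvas):
    """Shift a 2D array so its intensity center of mass sits at the center."""
    canvas = np.asarray(canvas)
    total = canvas.sum()
    if total == 0:
        return canvas.copy()  # blank image: nothing to center

    h, w = canvas.shape
    ys, xs = np.indices(canvas.shape)
    cy = (ys * canvas).sum() / total
    cx = (xs * canvas).sum() / total

    # Whole-pixel shift that moves the center of mass to the canvas center.
    dy = int(round((h - 1) / 2 - cy))
    dx = int(round((w - 1) / 2 - cx))

    shifted = np.zeros_like(canvas)
    # Copy the part of the image that stays inside the canvas after shifting.
    src = canvas[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted[max(0, dy):max(0, dy) + src.shape[0],
            max(0, dx):max(0, dx) + src.shape[1]] = src
    return shifted


# A small block near the top-left corner ends up in the middle.
demo = np.zeros((28, 28), dtype=np.float32)
demo[2:6, 3:7] = 1.0
centered = center_by_mass(demo)
```

This could be applied to the 28x28 array after the digit has been pasted onto the canvas. With the 4-pixel margin from the 20x20 fit, small shifts rarely push strokes off the edge.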

Model Input Shape Matters Too

The image file is only half the problem. You also need the tensor shape expected by the model.

For convolutional models that take channels-last input (the Keras default), a single sample is often shaped like:

python
sample = sample.reshape(1, 28, 28, 1).astype(np.float32)

For dense models trained on flattened inputs, it might instead be:

python
sample = sample.reshape(1, 784).astype(np.float32)

A perfect image conversion can still fail if the input tensor shape is wrong.
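To keep the reshaping in one place, a small helper can map the 28x28 array to the layout a model expects. The sketch below is illustrative only: the function name `to_model_input` and the layout labels are made up for this example. Channels-first, shape (1, 1, 28, 28), is included because it is the usual layout for PyTorch convolutional models.

```python
import numpy as np


def to_model_input(sample, layout):
    """Reshape one 28x28 sample into a batch of size 1 for a given layout.

    The layout labels are hypothetical names chosen for this sketch.
    """
    sample = np.asarray(sample, dtype=np.float32)
    if sample.shape != (28, 28):
        raise ValueError(f"expected a (28, 28) array, got {sample.shape}")

    if layout == "channels_last":    # e.g. Keras Conv2D default
        return sample.reshape(1, 28, 28, 1)
    if layout == "channels_first":   # e.g. PyTorch Conv2d
        return sample.reshape(1, 1, 28, 28)
    if layout == "flat":             # dense model on flattened pixels
        return sample.reshape(1, 784)
    raise ValueError(f"unknown layout: {layout!r}")


batch = to_model_input(np.zeros((28, 28)), "channels_last")
print(batch.shape)
```

Checking the shape up front gives a clear error message instead of a framework-specific failure deep inside the prediction call.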

Batch Conversion

If you are building a custom dataset, apply the same preprocessing to every image.

python
from pathlib import Path

src = Path("raw_digits")
dst = Path("mnist_like")
dst.mkdir(exist_ok=True)

# Run every PNG in the source folder through the same conversion.
for path in src.glob("*.png"):
    convert_to_mnist_style(str(path), str(dst / path.name))

Consistency matters more than clever per-image tweaks. A stable preprocessing pipeline produces training and inference data that match each other.
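In a larger folder, a single blank or unreadable scan would stop the loop, because the conversion raises ValueError when it finds no foreground. The sketch below is one way to keep going and report the failures afterwards. The name `convert_folder` is an assumption for this example, and the conversion function is passed in as a parameter so the snippet stands on its own.

```python
from pathlib import Path


def convert_folder(src, dst, convert):
    """Apply convert(in_path, out_path) to every PNG in src.

    Returns (converted, failed): converted is a list of file names and
    failed is a list of (file name, error message) pairs.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)

    converted, failed = [], []
    for path in sorted(src.glob("*.png")):
        try:
            convert(str(path), str(dst / path.name))
            converted.append(path.name)
        except ValueError as exc:
            # Record the failure and continue with the remaining files.
            failed.append((path.name, str(exc)))
    return converted, failed
```

A call might look like `convert_folder("raw_digits", "mnist_like", convert_to_mnist_style)`. The failed list then shows which source images need a second look.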

Visual Verification Is Worth It

Before trusting the converted images, inspect several outputs manually. Look for:

  • clipped strokes
  • inverted polarity
  • too much blank margin
  • noisy backgrounds
  • squashed aspect ratio

Visual checks catch many problems faster than staring at model accuracy alone.
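Some of these checks can be partly automated as a first pass before looking at the images yourself. The sketch below flags a few of the problems listed above and prints a rough text preview. The function names and every cutoff in it (0.2 for foreground, 0.5 for polarity, 14 pixels for minimum digit size) are assumptions chosen for illustration, and the input is assumed to be a 28x28 array scaled to the range 0 to 1.

```python
import numpy as np


def check_mnist_like(arr, fg=0.2):
    """Return a list of warnings for a 28x28 array with values in [0, 1]."""
    arr = np.asarray(arr, dtype=np.float32)
    if arr.shape != (28, 28):
        return [f"unexpected shape {arr.shape}"]

    mask = arr > fg
    if not mask.any():
        return ["no foreground pixels"]

    issues = []
    if mask.mean() > 0.5:
        issues.append("possible inverted polarity")
    if mask[0].any() or mask[-1].any() or mask[:, 0].any() or mask[:, -1].any():
        issues.append("strokes touch the border")
    ys, xs = np.where(mask)
    longest = max(ys.max() - ys.min() + 1, xs.max() - xs.min() + 1)
    if longest < 14:
        issues.append("digit is small relative to the canvas")
    return issues


def ascii_preview(arr, levels=" .:*#"):
    """Render the array as text, one character per pixel."""
    arr = np.asarray(arr, dtype=np.float32)
    idx = np.clip((arr * (len(levels) - 1)).round().astype(int), 0, len(levels) - 1)
    return "\n".join("".join(levels[i] for i in row) for row in idx)


demo = np.zeros((28, 28), dtype=np.float32)
demo[4:24, 10:18] = 1.0
print(check_mnist_like(demo))
print(ascii_preview(demo))
```

These warnings only narrow down which files to open. They do not replace looking at the images.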

Common Pitfalls

  • Resizing directly to 28x28 without cropping and centering the digit first.
  • Forgetting to align the image polarity with what the MNIST-trained model expects.
  • Applying a threshold that removes thin strokes along with the background noise.
  • Feeding raw uint8 values when the model expects normalized floating-point input.
  • Matching the image size correctly but using the wrong tensor shape at prediction time.
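The raw uint8 pitfall can be guarded against with a small normalization helper. The sketch below is an assumption about how such a helper might look, not a fixed recipe. The name `normalize_pixels` is made up for this example, and the optional `mean` and `std` arguments cover models trained on standardized input. Many PyTorch MNIST examples use 0.1307 and 0.3081 for these, but the right values are whatever the model was trained with.

```python
import numpy as np


def normalize_pixels(arr, mean=None, std=None):
    """Scale pixels to [0, 1] floats, then optionally standardize them."""
    arr = np.asarray(arr)
    if arr.dtype == np.uint8:
        out = arr.astype(np.float32) / 255.0
    else:
        out = arr.astype(np.float32)
        if out.max() > 1.0:
            # Refuse to guess the scale of float data outside [0, 1].
            raise ValueError("float input is expected to be in [0, 1]")

    if mean is not None and std is not None:
        out = (out - mean) / std
    return out


raw = np.array([[0, 255]], dtype=np.uint8)
print(normalize_pixels(raw))
```

Running the same helper on training and inference data keeps the two scaled the same way.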

Summary

  • Converting an image to MNIST style requires cropping, centering, grayscale normalization, and size normalization, not just resizing.
  • Matching MNIST polarity and margins is important for a model trained on MNIST.
  • A 20x20 fit inside a 28x28 canvas is a practical approximation of the original dataset style.
  • The output tensor shape must match the model’s expected input format.
  • Always inspect a few converted samples visually before assuming the preprocessing is correct.
