Converting JPG images for input to a scikit-learn SVM classifier

Introduction

Support Vector Machine classifiers in scikit-learn expect input as a two-dimensional numeric array with one flat feature vector per sample, but a decoded JPG image is a multi-dimensional pixel grid. Bridging this gap requires a preprocessing pipeline that loads, resizes, and transforms each image into a one-dimensional feature vector of consistent length. This article walks through every step of that conversion, from raw pixel flattening to more advanced feature extraction with Histogram of Oriented Gradients (HOG).

Loading Images with PIL and OpenCV

The first step is reading JPG files into memory as numerical arrays. Both PIL (Pillow) and OpenCV are common choices.

python

from PIL import Image
import numpy as np

# Loading with PIL
img_pil = Image.open("photo.jpg")
pixel_array = np.array(img_pil)
print(pixel_array.shape)  # e.g. (480, 640, 3) for a color image

python

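import cv2

# Loading with OpenCV
img_cv = cv2.imread("photo.jpg")
if img_cv is None:  # cv2.imread returns None instead of raising on failure
    raise FileNotFoundError("could not read photo.jpg")
print(img_cv.shape)  # e.g. (480, 640, 3) in BGR order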

OpenCV loads images in BGR channel order rather than RGB. If you need RGB ordering after loading with OpenCV, convert with cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB).
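
As a rough illustration of what the channel swap does, the sketch below uses a tiny synthetic array in place of a loaded photo (an assumption, so that it runs without a file) and reverses the last axis with NumPy slicing. For a three-channel image this produces the same channel ordering as cv2.COLOR_BGR2RGB.

```python
import numpy as np

# Synthetic 1x2 "image" in BGR order: a pure blue pixel and a pure red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis turns (B, G, R) into (R, G, B).
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255]: blue is now in the last slot
print(rgb[0, 1].tolist())  # [255, 0, 0]: red is now in the first slot
```

Only the order of the three values changes; the image shape and dtype stay the same.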

Resizing to a Uniform Dimension

SVM classifiers require every input vector to have the same length. Because JPG images come in varying resolutions, you must resize them to a common width and height before flattening.

python

from PIL import Image
import numpy as np

target_size = (64, 64)  # (width, height)

img = Image.open("photo.jpg")
img_resized = img.resize(target_size)
print(np.array(img_resized).shape)  # (64, 64, 3)

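With OpenCV the equivalent call is cv2.resize(img_cv, target_size). Both libraries take the size as (width, height), while the resulting array shape is (height, width, channels), so the order matters for non-square targets. Choose a target size that balances detail preservation against feature vector length. A 64x64 color image produces a vector of 12,288 values (64 * 64 * 3), which is already a large feature count for an SVM trained on a small dataset.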
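
To make the trade-off concrete, the following sketch computes the flattened vector length for a few square target sizes. The helper name feature_length is hypothetical and only used for this illustration.

```python
def feature_length(width, height, channels):
    """Number of values in the flattened vector for one image."""
    return width * height * channels

for size in (32, 64, 128):
    color = feature_length(size, size, 3)
    gray = feature_length(size, size, 1)
    print(f"{size}x{size}: color={color}, grayscale={gray}")
# 32x32: color=3072, grayscale=1024
# 64x64: color=12288, grayscale=4096
# 128x128: color=49152, grayscale=16384
```

Doubling the side length quadruples the feature count, so small increases in resolution become expensive quickly.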

Converting to Grayscale

Reducing color images to a single channel cuts the feature count by two-thirds. On tasks where color is irrelevant, such as handwritten digit recognition, this shortens training and often costs little or no accuracy.

python

from PIL import Image
import numpy as np

img = Image.open("photo.jpg").convert("L")  # grayscale
gray_array = np.array(img.resize((64, 64)))
print(gray_array.shape)  # (64, 64)

python

import cv2

img_gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
img_gray = cv2.resize(img_gray, (64, 64))
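
For intuition about what a grayscale conversion computes, here is a minimal NumPy sketch of the ITU-R 601-2 luma transform that Pillow documents for mode "L". It is an approximation for illustration only: the function name rgb_to_gray is hypothetical, and the libraries' own integer arithmetic may differ from it by a gray level.

```python
import numpy as np

# ITU-R 601-2 luma weights for the R, G, and B channels.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def rgb_to_gray(rgb):
    """Collapse an (H, W, 3) uint8 RGB array into an (H, W) uint8 array."""
    gray = rgb.astype(np.float64) @ LUMA_WEIGHTS
    return np.round(gray).astype(np.uint8)

# Synthetic 1x3 image: white, pure red, black.
rgb = np.array([[[255, 255, 255], [255, 0, 0], [0, 0, 0]]], dtype=np.uint8)
gray = rgb_to_gray(rgb)
print(gray.shape)  # (1, 3)
```

The weighted sum is why three values per pixel collapse into one, which is where the two-thirds reduction in features comes from.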

Flattening to a 1D Feature Vector

Scikit-learn expects each sample as a one-dimensional array. Reshape the 2D (grayscale) or 3D (color) pixel grid into a flat vector.

python

feature_vector = gray_array.flatten()
print(feature_vector.shape)  # (4096,) for a 64x64 grayscale image
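
The ordering that flatten() produces can be checked on a small synthetic array, used here as a stand-in for gray_array. NumPy flattens in row-major order by default, so the original grid can be recovered with reshape.

```python
import numpy as np

# Synthetic 2x3 grayscale "image" standing in for gray_array.
gray = np.arange(6, dtype=np.uint8).reshape(2, 3)

flat = gray.flatten()          # row-major: first row, then second row
restored = flat.reshape(2, 3)  # the original grid can be recovered

print(flat.tolist())  # [0, 1, 2, 3, 4, 5]

# A color image flattens the same way, with the channel axis varying fastest.
color = np.zeros((64, 64, 3), dtype=np.uint8)
print(color.flatten().shape)  # (12288,)
```

Because the layout is deterministic, the same pixel position always lands at the same index of the feature vector, provided every image has the same shape.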

When building a dataset of many images, stack the vectors into a matrix where each row is one sample.

python

import os
import numpy as np
from PIL import Image

def load_images(folder, target_size=(64, 64)):
    """Load every JPG in a folder as a flattened grayscale vector."""
    vectors = []
    # sorted() keeps the row order reproducible across runs
    for filename in sorted(os.listdir(folder)):
        if filename.lower().endswith((".jpg", ".jpeg")):
            img = Image.open(os.path.join(folder, filename)).convert("L")
            img = img.resize(target_size)
            vectors.append(np.array(img).flatten())
    return np.array(vectors)

X = load_images("dataset/cats")
print(X.shape)  # (num_images, 4096)

Normalizing Pixel Values

Raw pixel values range from 0 to 255. Normalizing them to a 0-1 or -1 to 1 range helps the SVM optimizer converge faster, especially with RBF kernels.

python

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# X is the (num_images, 4096) matrix built by load_images above
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X.astype(np.float64))

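Alternatively, divide by 255 directly for a simple 0-1 scaling. Unlike MinMaxScaler, which rescales each pixel position using the minimum and maximum it saw during fitting, this applies the same fixed factor to every value and needs no fitted state at inference time.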

python

X_normalized = X.astype(np.float64) / 255.0
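
The -1 to 1 range and the difference between fixed and per-feature scaling can be sketched with NumPy alone. X_demo is synthetic data invented for this illustration, and the per-feature formula is written out by hand to mirror what min-max scaling computes.

```python
import numpy as np

# Two synthetic samples with three pixel features each.
X_demo = np.array([[0, 128, 255], [255, 64, 100]], dtype=np.uint8)
X_float = X_demo.astype(np.float64)

# Fixed scaling to [-1, 1]: 0 maps to -1.0 and 255 maps to 1.0.
X_signed = X_float / 127.5 - 1.0

# Per-feature min-max scaling: each column is stretched to [0, 1] on its own.
col_min = X_float.min(axis=0)
col_max = X_float.max(axis=0)
X_per_feature = (X_float - col_min) / (col_max - col_min)

print(X_signed.min(), X_signed.max())  # -1.0 1.0
print(X_per_feature[:, 1].tolist())    # [1.0, 0.0]
```

In the second column the raw values 128 and 64 become 1.0 and 0.0 under per-feature scaling, whereas dividing by 255 would leave them at roughly 0.50 and 0.25. With scikit-learn, MinMaxScaler(feature_range=(-1, 1)) gives the per-feature variant of the signed range.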

Extracting HOG Features

Feeding raw pixels into an SVM often yields poor accuracy because pixel values do not capture shape and edge information well. Histogram of Oriented Gradients (HOG) extracts local gradient direction distributions, producing a compact and discriminative descriptor.

python

from skimage.feature import hog
from PIL import Image
import numpy as np

img = Image.open("photo.jpg").convert("L").resize((64, 64))
pixel_array = np.array(img)

features = hog(
    pixel_array,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    block_norm="L2-Hys",
)
print(features.shape)  # (1764,) - much smaller than 4096 raw pixels

For this 64x64 example, HOG cuts the feature count from 4096 to 1764, and it often improves classification accuracy on tasks where shape and edges matter more than absolute pixel intensity.
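
The 1764 figure follows from the HOG parameters. The sketch below reproduces the arithmetic under the assumption that blocks slide one cell at a time, as they do in skimage.feature.hog; the helper name hog_feature_length is hypothetical.

```python
def hog_feature_length(image_shape, orientations=9,
                       pixels_per_cell=(8, 8), cells_per_block=(2, 2)):
    """Length of the HOG vector when blocks slide one cell at a time."""
    cells_y = image_shape[0] // pixels_per_cell[0]
    cells_x = image_shape[1] // pixels_per_cell[1]
    blocks_y = cells_y - cells_per_block[0] + 1
    blocks_x = cells_x - cells_per_block[1] + 1
    per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks_y * blocks_x * per_block

print(hog_feature_length((64, 64)))    # 1764
print(hog_feature_length((128, 128)))  # 8100
```

For a 64x64 image there are 8x8 cells, hence 7x7 overlapping blocks, each contributing 2 * 2 * 9 = 36 values. Larger cells shorten the vector further at the cost of spatial detail.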

Complete Pipeline with SVM Training

Bringing all the steps together into a training pipeline shows how each piece fits.

python

import os
import numpy as np
from PIL import Image
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def extract_hog_features(image_path, target_size=(64, 64)):
    """Load one image and return its HOG descriptor as a 1D array."""
    img = Image.open(image_path).convert("L").resize(target_size)
    pixel_array = np.array(img)
    features = hog(
        pixel_array,
        orientations=9,
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        block_norm="L2-Hys",
    )
    return features

def build_dataset(class_folders):
    """Return (X, y), using each folder's position in the list as its label."""
    X, y = [], []
    for label, folder in enumerate(class_folders):
        for fname in sorted(os.listdir(folder)):
            if fname.lower().endswith((".jpg", ".jpeg")):
                feat = extract_hog_features(os.path.join(folder, fname))
                X.append(feat)
                y.append(label)
    return np.array(X), np.array(y)

X, y = build_dataset(["dataset/cats", "dataset/dogs"])
# Fixed seed and stratified split keep the evaluation reproducible and balanced
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = SVC(kernel="rbf", C=10, gamma="scale")
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions):.2f}")
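
One way to keep any feature scaling tied to the classifier is to wrap both in a scikit-learn Pipeline. The sketch below is an illustration under stated assumptions: it uses synthetic vectors in place of real HOG features so that it runs without an image dataset, and the choice of StandardScaler is an example rather than a requirement of the approach above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for HOG vectors: two well-separated classes, 20 features.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.0, 1.0, (50, 20)),
    rng.normal(5.0, 1.0, (50, 20)),
])
y_demo = np.array([0] * 50 + [1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# fit() learns the scaler on the training split only; score() reuses it as-is.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
model.fit(X_tr, y_tr)
demo_accuracy = model.score(X_te, y_te)
print(f"Accuracy: {demo_accuracy:.2f}")
```

Because the scaler lives inside the pipeline, calling model.predict on new feature vectors applies exactly the transformation learned during training.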

Common Pitfalls

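Inconsistent image sizes - Forgetting to resize all images to the same dimensions produces feature vectors of different lengths. Stacking them then fails: recent NumPy versions raise a ValueError for ragged input, and older versions build an object array that scikit-learn rejects.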
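Skipping normalization - SVM kernels are sensitive to feature scale. With raw 0-255 values, settings such as C and gamma have to compensate for the large magnitudes, and if pixels are combined with features on a smaller scale, the large-magnitude features dominate the kernel. Either effect can degrade accuracy and slow convergence.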
Using color images unnecessarily - Three-channel images triple the feature count without always improving results, leading to longer training and potential overfitting on small datasets.
Ignoring channel order with OpenCV - OpenCV loads images as BGR while most libraries expect RGB. Mixing channel orders silently corrupts the data without raising errors.
Relying on raw pixels instead of engineered features - Flat pixel vectors do not encode edges or shapes explicitly. Feature descriptors like HOG often outperform raw pixels for SVM classifiers.
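
The first pitfall can be caught early with a small guard before stacking. This is a minimal sketch; stack_features is a hypothetical helper, not part of NumPy or scikit-learn.

```python
import numpy as np

def stack_features(vectors):
    """Stack 1D feature vectors into a matrix, naming the first mismatch."""
    expected = len(vectors[0])
    for index, vec in enumerate(vectors):
        if len(vec) != expected:
            raise ValueError(
                f"vector {index} has length {len(vec)}, expected {expected}"
            )
    return np.stack(vectors)

good = [np.zeros(4096), np.ones(4096)]
print(stack_features(good).shape)  # (2, 4096)
```

Reporting the index of the offending vector makes it easier to trace the problem back to a specific file than a generic shape error would.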

Summary

Load JPG images with PIL or OpenCV and convert them into NumPy arrays for numerical processing.
Resize every image to the same target dimensions so all feature vectors have identical length.
Convert to grayscale when color is not relevant to reduce feature dimensionality by a factor of three.
Flatten the 2D (grayscale) or 3D (color) pixel grid into a 1D vector using flatten() before passing data to scikit-learn.
Normalize pixel values to a 0-1 range to help the SVM optimizer converge efficiently.
Use HOG or similar feature descriptors instead of raw pixels for better classification accuracy.
Combine all steps into a reusable pipeline function to keep preprocessing consistent between training and inference.
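
As a closing illustration of the last point, the sketch below folds grayscale conversion, resizing, flattening, and 0-1 scaling into one function that can be called for both training and inference. The name image_to_vector is hypothetical, and an in-memory image stands in for Image.open("photo.jpg") so that the example runs without a file.

```python
import numpy as np
from PIL import Image

def image_to_vector(img, target_size=(64, 64)):
    """Grayscale, resize, flatten, and scale one PIL image to [0, 1]."""
    gray = img.convert("L").resize(target_size)
    return np.asarray(gray, dtype=np.float64).flatten() / 255.0

# In-memory stand-in for a loaded photo: a 100x80 solid red image.
demo_img = Image.new("RGB", (100, 80), color=(255, 0, 0))
vec = image_to_vector(demo_img)
print(vec.shape)  # (4096,)
```

Routing every image through the same function, with the same target_size, is what guarantees that a vector seen at prediction time matches the layout the classifier was trained on.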