Convert an Image to CVPixelBuffer for Machine Learning in Swift

Introduction

Most Core ML image models do not want a UIImage directly. They want a CVPixelBuffer with a specific size and pixel format, so converting the image correctly is part of getting reliable predictions, not just plumbing.

What the Model Actually Expects

Before converting anything, check the model input description in Xcode or in code. The important details are:

  • width and height
  • color format
  • whether the model expects an image or a multi-array

If the model expects a 224 x 224 RGB image and you pass a buffer of a different size, the prediction may fail, or the framework may resize the input in a way you did not intend.
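To check these details in code, the following is a minimal sketch that reads the constraints from a loaded `MLModel`. The function names `describeInputs(of:)` and `pixelFormatName(_:)` are hypothetical helpers introduced here for illustration. The sketch assumes you already have an `MLModel` instance, for example the `model` property of an Xcode-generated class.

```swift
import CoreML
import CoreVideo

/// Hypothetical helper: maps a few common pixel format codes to readable names.
func pixelFormatName(_ type: OSType) -> String {
    switch type {
    case kCVPixelFormatType_32BGRA: return "32BGRA"
    case kCVPixelFormatType_32ARGB: return "32ARGB"
    case kCVPixelFormatType_OneComponent8: return "OneComponent8 (grayscale)"
    default: return "other (\(type))"
    }
}

/// Hypothetical helper: returns one line of text per model input,
/// describing whether it is an image or a multi-array and what it expects.
func describeInputs(of model: MLModel) -> [String] {
    var lines: [String] = []
    let inputs = model.modelDescription.inputDescriptionsByName
    for (name, feature) in inputs.sorted(by: { $0.key < $1.key }) {
        switch feature.type {
        case .image:
            if let constraint = feature.imageConstraint {
                let format = pixelFormatName(constraint.pixelFormatType)
                lines.append("\(name): image \(constraint.pixelsWide)x\(constraint.pixelsHigh), \(format)")
            }
        case .multiArray:
            if let constraint = feature.multiArrayConstraint {
                lines.append("\(name): multi-array, shape \(constraint.shape)")
            }
        default:
            lines.append("\(name): other input type")
        }
    }
    return lines
}
```

Printing the result once during development is usually enough to confirm the width, height, and pixel format your conversion code should target.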

A Practical UIImage to CVPixelBuffer Conversion

The common pattern is:

  • create a pixel buffer
  • lock its memory
  • draw the image into that memory with Core Graphics
  • unlock the buffer
```swift
import UIKit
import CoreVideo

extension UIImage {
    func toPixelBuffer(width: Int, height: Int) -> CVPixelBuffer? {
        // cgImage holds raw pixels only; UIImage orientation is not applied here.
        guard let cgImage = self.cgImage else { return nil }

        // Ask for a buffer that Core Graphics can draw into.
        let attrs: [CFString: Any] = [
            kCVPixelBufferCGImageCompatibilityKey: true,
            kCVPixelBufferCGBitmapContextCompatibilityKey: true
        ]

        var pixelBuffer: CVPixelBuffer?
        let status = CVPixelBufferCreate(
            kCFAllocatorDefault,
            width,
            height,
            kCVPixelFormatType_32ARGB,
            attrs as CFDictionary,
            &pixelBuffer
        )

        guard status == kCVReturnSuccess, let buffer = pixelBuffer else {
            return nil
        }

        // Lock before touching the memory; defer guarantees the unlock on every exit path.
        CVPixelBufferLockBaseAddress(buffer, [])
        defer { CVPixelBufferUnlockBaseAddress(buffer, []) }

        guard let baseAddress = CVPixelBufferGetBaseAddress(buffer) else {
            return nil
        }

        // noneSkipFirst matches the 32ARGB layout: the first byte of each pixel is ignored.
        let colorSpace = CGColorSpaceCreateDeviceRGB()
        guard let context = CGContext(
            data: baseAddress,
            width: width,
            height: height,
            bitsPerComponent: 8,
            bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
            space: colorSpace,
            bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
        ) else {
            return nil
        }

        // Scale the whole image into the target rectangle (aspect ratio is not preserved).
        context.clear(CGRect(x: 0, y: 0, width: width, height: height))
        context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))

        return buffer
    }
}
```

This is the standard shape of the solution: create, lock, draw, unlock.

Using the Buffer with Core ML

Once you have the pixel buffer, feed it into the model's prediction call:

```swift
import UIKit
import CoreML

func classifyCat() throws {
    guard let image = UIImage(named: "cat"),
          let buffer = image.toPixelBuffer(width: 224, height: 224) else {
        return
    }

    // MyImageClassifier stands for the class Xcode generates from your model file.
    let model = try MyImageClassifier(configuration: MLModelConfiguration())
    let prediction = try model.prediction(image: buffer)
    print(prediction.classLabel)
}
```

The details of the generated API vary by model, but the conversion step stays about the same.

Orientation and Resizing Matter

A frequent bug is that the image looks correct on screen but arrives rotated or mirrored in the pixel buffer. UIImage can carry orientation metadata, while CGImage is just raw pixels. If the source image came from the camera or photo library, normalize orientation before converting, or make sure your drawing step handles it.
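One way to normalize orientation is to redraw the image so that its pixels are physically upright before reading `cgImage`. The sketch below assumes UIKit and uses a hypothetical method name, `normalizedOrientation()`. It is one possible approach, not the only one.

```swift
import UIKit

extension UIImage {
    /// Hypothetical helper: returns an image whose pixel data is stored upright,
    /// so that `cgImage` matches what the user sees on screen.
    func normalizedOrientation() -> UIImage {
        // Nothing to do when the pixels are already upright.
        guard imageOrientation != .up else { return self }

        let format = UIGraphicsImageRendererFormat.default()
        format.scale = scale

        // UIImage.draw(in:) applies the orientation metadata while drawing.
        let renderer = UIGraphicsImageRenderer(size: size, format: format)
        return renderer.image { _ in
            self.draw(in: CGRect(origin: .zero, size: self.size))
        }
    }
}
```

Calling this before the pixel buffer conversion means the Core Graphics drawing step no longer has to care about orientation, at the cost of one extra render pass.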

Resizing is another hidden decision. The simple context.draw call above scales the whole image to the target rectangle, which distorts the image whenever its aspect ratio differs from the target's. Some models tolerate that, but others perform better if you crop or letterbox first.
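The difference between cropping and letterboxing comes down to which rectangle you draw the image into. The sketch below computes that rectangle for both cases. `ResizeMode` and `drawRect(forImageOf:into:mode:)` are hypothetical names introduced for illustration.

```swift
import Foundation
import CoreGraphics

/// Hypothetical modes: aspectFill crops the overflow, aspectFit letterboxes.
enum ResizeMode {
    case aspectFill
    case aspectFit
}

/// Returns the rectangle, in target coordinates, to draw the image into
/// so that its aspect ratio is preserved and it is centered in the target.
func drawRect(forImageOf imageSize: CGSize, into targetSize: CGSize, mode: ResizeMode) -> CGRect {
    guard imageSize.width > 0, imageSize.height > 0 else {
        return CGRect(origin: .zero, size: targetSize)
    }

    let scaleX = targetSize.width / imageSize.width
    let scaleY = targetSize.height / imageSize.height
    // Fill uses the larger scale (image overflows), fit uses the smaller (image is padded).
    let scale = mode == .aspectFill ? max(scaleX, scaleY) : min(scaleX, scaleY)

    let scaledWidth = imageSize.width * scale
    let scaledHeight = imageSize.height * scale
    return CGRect(
        x: (targetSize.width - scaledWidth) / 2,
        y: (targetSize.height - scaledHeight) / 2,
        width: scaledWidth,
        height: scaledHeight
    )
}
```

Passing this rectangle to `context.draw(cgImage, in:)` instead of the full target rectangle keeps the proportions intact. With `aspectFill`, the parts that fall outside the context are clipped. With `aspectFit`, the uncovered area keeps whatever the context was cleared to.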

Performance Considerations

For occasional predictions, the conversion above is fine. For real-time camera inference, repeated UIImage conversion can become a bottleneck. In that case:

  • prefer using the camera's existing CVPixelBuffer
  • reuse buffers when possible
  • avoid unnecessary image-format hops
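One way to reuse buffers is a `CVPixelBufferPool`, which recycles backing memory instead of allocating a fresh buffer for every frame. The following is a minimal sketch. The helper names `makePixelBufferPool(width:height:)` and `dequeueBuffer(from:)` are hypothetical, and the 32ARGB format is an assumption chosen to match the conversion code earlier in this article.

```swift
import CoreVideo

/// Hypothetical helper: creates a pool that vends buffers of one fixed size and format.
func makePixelBufferPool(width: Int, height: Int) -> CVPixelBufferPool? {
    let bufferAttrs: [CFString: Any] = [
        kCVPixelBufferPixelFormatTypeKey: kCVPixelFormatType_32ARGB,
        kCVPixelBufferWidthKey: width,
        kCVPixelBufferHeightKey: height,
        kCVPixelBufferCGImageCompatibilityKey: true,
        kCVPixelBufferCGBitmapContextCompatibilityKey: true
    ]

    var pool: CVPixelBufferPool?
    let status = CVPixelBufferPoolCreate(
        kCFAllocatorDefault,
        nil,                          // default pool attributes
        bufferAttrs as CFDictionary,
        &pool
    )
    return status == kCVReturnSuccess ? pool : nil
}

/// Hypothetical helper: takes a buffer from the pool. The memory goes back
/// to the pool for recycling once the last reference to the buffer is released.
func dequeueBuffer(from pool: CVPixelBufferPool) -> CVPixelBuffer? {
    var buffer: CVPixelBuffer?
    let status = CVPixelBufferPoolCreatePixelBuffer(kCFAllocatorDefault, pool, &buffer)
    return status == kCVReturnSuccess ? buffer : nil
}
```

Create the pool once, keep it alongside the model, and draw each new image into a dequeued buffer rather than calling `CVPixelBufferCreate` per prediction.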

If your input already comes from AVCaptureVideoDataOutput, you often receive a CVPixelBuffer directly and should skip UIImage entirely.
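A rough sketch of that camera path follows. `FrameReceiver`, its `onFrame` callback, and `makeVideoOutput(delegate:queue:)` are hypothetical names. The 32BGRA setting is an assumption, so use whichever format your model's input description calls for. Session setup and camera permissions are omitted.

```swift
import AVFoundation
import CoreVideo

/// Hypothetical delegate that hands each camera frame over as a CVPixelBuffer.
final class FrameReceiver: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    /// Hypothetical callback: run resizing and the model prediction here.
    var onFrame: ((CVPixelBuffer) -> Void)?

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // The sample buffer already wraps a pixel buffer; no UIImage is involved.
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        onFrame?(pixelBuffer)
    }
}

/// Hypothetical helper: configures a video output that delivers frames to the receiver.
func makeVideoOutput(delegate: FrameReceiver, queue: DispatchQueue) -> AVCaptureVideoDataOutput {
    let output = AVCaptureVideoDataOutput()
    // Request a pixel format up front to avoid a later conversion (assumed: 32BGRA).
    output.videoSettings = [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA
    ]
    // Drop frames that arrive while the previous one is still being processed.
    output.alwaysDiscardsLateVideoFrames = true
    output.setSampleBufferDelegate(delegate, queue: queue)
    return output
}
```

Camera frames are usually larger than the model's input, so a resize or crop step is typically still needed, but it can operate on the pixel buffer directly.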

Common Pitfalls

  • Converting to the wrong width, height, or pixel format for the model.
  • Ignoring image orientation and feeding the model rotated or mirrored input.
  • Stretching images without thinking about aspect ratio.
  • Converting camera frames through UIImage when a CVPixelBuffer already exists.
  • Forgetting to lock and unlock the pixel buffer around memory access.
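The lock and unlock pairing from the last pitfall applies to reads as well as writes. As a small sketch, the hypothetical helper below reads the first pixel of a buffer under a read-only lock. It assumes a non-planar format with four bytes per pixel, such as 32ARGB or 32BGRA.

```swift
import CoreVideo

/// Hypothetical helper: returns the four bytes of the top-left pixel,
/// or nil if the buffer is planar, empty, or has no accessible base address.
func topLeftPixelBytes(of buffer: CVPixelBuffer) -> [UInt8]? {
    guard !CVPixelBufferIsPlanar(buffer),
          CVPixelBufferGetWidth(buffer) > 0,
          CVPixelBufferGetHeight(buffer) > 0 else {
        return nil
    }

    // Lock with .readOnly since nothing is written; unlock with the same flags.
    CVPixelBufferLockBaseAddress(buffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }

    // The base address is only valid while the buffer is locked.
    guard let base = CVPixelBufferGetBaseAddress(buffer) else { return nil }
    let bytes = base.assumingMemoryBound(to: UInt8.self)
    return [bytes[0], bytes[1], bytes[2], bytes[3]]
}
```

A check like this is also a quick way to confirm the channel order of a converted buffer during debugging.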

Summary

  • Core ML image models commonly expect CVPixelBuffer, not UIImage.
  • Correct conversion requires the right size, pixel format, and drawing step.
  • Orientation and aspect-ratio handling can affect model accuracy.
  • For camera pipelines, avoid unnecessary UIImage conversions when you already have pixel buffers.
  • Treat the conversion as part of model input correctness, not just a format workaround.
