Comparing Conv2D with padding between Tensorflow and PyTorch

Introduction

TensorFlow and PyTorch both provide 2D convolution layers, but padding details are one of the places where equivalent-looking code can produce different shapes or even fail outright. To compare them correctly, you need to look at padding mode, tensor layout, and the framework rules for stride and explicit padding.

The Shared Concepts

A 2D convolution slides a kernel across height and width. Padding controls what happens at the borders.

The two most common modes are:

  • valid, meaning no padding
  • same, meaning pad enough that the output size is ceil(input / stride), which preserves the spatial size when the stride is 1

Both frameworks support these ideas, but their APIs are not identical.
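The sketch below shows what the two modes mean for output size. It computes one spatial dimension using the TensorFlow-style definitions. Under `valid`, the output counts only the positions where the whole (dilated) kernel fits. Under `same`, the output size is `ceil(input / stride)`. The helper name and signature are illustrative and are not part of either framework.

```python
import math


def conv_output_size(in_size: int, kernel: int, stride: int = 1,
                     dilation: int = 1, mode: str = "valid") -> int:
    """Output length of one spatial dimension (illustrative helper, TF-style modes)."""
    # A dilated kernel covers (kernel - 1) * dilation + 1 input positions.
    effective_kernel = (kernel - 1) * dilation + 1
    if mode == "valid":
        # No padding: count the positions where the whole kernel fits.
        return (in_size - effective_kernel) // stride + 1
    if mode == "same":
        # Enough padding is added that the output is ceil(input / stride).
        return math.ceil(in_size / stride)
    raise ValueError(f"unknown padding mode: {mode!r}")


print(conv_output_size(28, 3, mode="valid"))           # 26
print(conv_output_size(28, 3, mode="same"))            # 28
print(conv_output_size(28, 3, stride=2, mode="same"))  # 14
```

With stride 1, `same` keeps the size exactly. With stride 2 it halves the size, rounding up.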

Tensor Layout Is the First Difference

TensorFlow usually works with NHWC, which means batch, height, width, channels. PyTorch usually works with NCHW, which means batch, channels, height, width.

That difference matters because two layers can both say “same padding” and still receive tensors in incompatible layouts.

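```python
import tensorflow as tf
import torch

# The same batch of one 28x28 RGB image in each framework's default layout.
x_tf = tf.random.normal((1, 28, 28, 3))  # NHWC
x_torch = torch.randn(1, 3, 28, 28)      # NCHW

print(x_tf.shape)     # (1, 28, 28, 3)
print(x_torch.shape)  # torch.Size([1, 3, 28, 28])
```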

If you port a model between frameworks, shape mismatches are often caused by layout before padding even enters the discussion.
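The layout bookkeeping can be sketched in plain Python, so it runs without either framework. The permutation tuples are the ones you would pass to `torch.Tensor.permute` or `tf.transpose(x, perm=...)`. The kernel permutation assumes TensorFlow's `(height, width, in_channels, out_channels)` filter layout and PyTorch's `(out_channels, in_channels, height, width)` weight layout with `groups=1`. The constant and function names are illustrative.

```python
# Axis orders for activations.
NHWC_TO_NCHW = (0, 3, 1, 2)
NCHW_TO_NHWC = (0, 2, 3, 1)
# Assumed filter layouts: TF (kh, kw, in, out) -> PyTorch (out, in, kh, kw), groups=1.
TF_KERNEL_TO_TORCH = (3, 2, 0, 1)


def permute_shape(shape, perm):
    """Reorder a shape tuple the way x.permute(*perm) or tf.transpose(x, perm) would."""
    return tuple(shape[axis] for axis in perm)


print(permute_shape((1, 28, 28, 3), NHWC_TO_NCHW))      # (1, 3, 28, 28)
print(permute_shape((1, 3, 28, 28), NCHW_TO_NHWC))      # (1, 28, 28, 3)
print(permute_shape((3, 3, 3, 8), TF_KERNEL_TO_TORCH))  # (8, 3, 3, 3)
```

When you copy a trained layer across frameworks, the weights need the kernel permutation in addition to the activation permutation.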

same Padding in TensorFlow

TensorFlow’s tf.keras.layers.Conv2D accepts padding="same" or padding="valid". With padding="same" and strides=1, the output height and width match the input size. With a larger stride, they become ceil(input / stride).

```python
import tensorflow as tf

x = tf.random.normal((1, 28, 28, 3))  # NHWC input
layer = tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding="same")
y = layer(x)
print(y.shape)  # (1, 28, 28, 8): height and width preserved at stride 1
```
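`same` does not always pad both sides equally. TensorFlow's notes on convolution padding describe the rule:

- Compute the total padding needed to produce `ceil(input / stride)` outputs.
- Put half of it, rounded down, before the input.
- Put the remainder after.

The sketch below reproduces that rule for one dimension. The helper name is hypothetical.

```python
import math


def tf_same_padding(in_size: int, kernel: int, stride: int = 1, dilation: int = 1):
    """Return (pad_before, pad_after) for TF 'SAME' along one dimension (hypothetical helper)."""
    effective_kernel = (kernel - 1) * dilation + 1
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + effective_kernel - in_size, 0)
    before = total // 2            # top or left
    return before, total - before  # any odd pixel goes to bottom or right


print(tf_same_padding(28, 3, stride=1))  # (1, 1)
print(tf_same_padding(28, 3, stride=2))  # (0, 1)
print(tf_same_padding(28, 4, stride=1))  # (1, 2)
```

With a stride greater than 1, the split depends on the input size. For example, the same 3x3, stride-2 layer pads (1, 1) for a 27-pixel input but (0, 1) for a 28-pixel input.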

TensorFlow also supports explicit padding in the lower-level tf.nn.conv2d. Instead of the string "SAME" or "VALID", you pass one [before, after] pair per dimension, with [0, 0] for the batch and channel dimensions.

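```python
import tensorflow as tf

x = tf.random.normal((1, 5, 5, 1))       # NHWC input
kernel = tf.random.normal((3, 3, 1, 1))  # (height, width, in_channels, out_channels)

# Explicit padding: one [before, after] pair per NHWC dimension.
# Height gets 1 row on each side, width gets 2 columns on each side.
y = tf.nn.conv2d(
    x,
    kernel,
    strides=[1, 1, 1, 1],
    padding=[[0, 0], [1, 1], [2, 2], [0, 0]],
)
print(y.shape)  # (1, 5, 7, 1)
```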

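Because each side is listed separately, the same mechanism also handles asymmetric padding. For example, [0, 1] on the height dimension adds a single row at the bottom only.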
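You can check the output shape of the explicit-padding example by hand. Each spatial dimension becomes `(in + pad_before + pad_after - effective_kernel) // stride + 1`. Here is a small sketch with a hypothetical helper name:

```python
def padded_output_size(in_size, pad_before, pad_after, kernel, stride=1, dilation=1):
    """Output length for explicit (possibly asymmetric) padding along one dimension."""
    effective_kernel = (kernel - 1) * dilation + 1
    return (in_size + pad_before + pad_after - effective_kernel) // stride + 1


# Same configuration as the tf.nn.conv2d call above (NHWC, 5x5 input, 3x3 kernel).
padding = [[0, 0], [1, 1], [2, 2], [0, 0]]
(top, bottom), (left, right) = padding[1], padding[2]
out_h = padded_output_size(5, top, bottom, kernel=3)
out_w = padded_output_size(5, left, right, kernel=3)
print((1, out_h, out_w, 1))  # (1, 5, 7, 1)

# Asymmetric padding: one extra row at the bottom only.
print(padded_output_size(4, 0, 1, kernel=3, stride=2))  # 2
```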

same Padding in PyTorch

PyTorch nn.Conv2d accepts several padding values:

  • an integer (the default is 0)
  • a tuple of two integers, for height and width
  • the strings "same" and "valid"

There is an important restriction, though. Current PyTorch docs state that padding="same" does not support stride values other than 1, and constructing such a layer raises an error.

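```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 28, 28)  # NCHW input
# padding="same" is only accepted with the default stride of 1.
layer = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding="same")
y = layer(x)
print(y.shape)  # torch.Size([1, 8, 28, 28])
```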

For strided layers, or whenever you want fixed symmetric padding, pass an integer or a tuple instead:

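```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 28, 28)
# padding=1 adds one pixel on every side; with stride=2 the spatial size is halved.
layer = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
y = layer(x)
print(y.shape)  # torch.Size([1, 8, 14, 14])
```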
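The `nn.Conv2d` documentation gives the output size as `floor((H_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)`. The plain-Python sketch below implements that formula and confirms the 14x14 result of the example above.

TensorFlow's `same` with the same 3x3 kernel and stride 2 on a 28x28 input also yields 14. However, it pads 0 rows on top and 1 on the bottom rather than 1 on each side. The shapes agree, but the pixels each window sees, and therefore the output values, do not. The second helper makes that offset visible. Both helper names are hypothetical.

```python
def torch_conv_output_size(in_size, kernel, stride=1, padding=0, dilation=1):
    """Per-dimension output size, following the formula in the nn.Conv2d docs."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def first_window_rows(pad_before, kernel):
    """Original-image rows covered by the first output row (negative = padding)."""
    return list(range(-pad_before, -pad_before + kernel))


print(torch_conv_output_size(28, 3, stride=2, padding=1))  # 14
print(torch_conv_output_size(28, 3, stride=1, padding=1))  # 28
print(first_window_rows(1, 3))  # PyTorch padding=1: [-1, 0, 1]
print(first_window_rows(0, 3))  # TF 'same', stride 2, 28 rows: [0, 1, 2]
```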

If you need asymmetric padding or other fine-grained control, pad the input first with F.pad and then convolve with padding=0.

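```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 5, 5)
# Pad width by 2 on each side and height by 1 on each side: (left, right, top, bottom).
x = F.pad(x, (2, 2, 1, 1))  # shape becomes (1, 3, 7, 9)
layer = nn.Conv2d(3, 4, kernel_size=3, padding=0)
y = layer(x)
print(y.shape)  # torch.Size([1, 4, 5, 7])
```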

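F.pad lists padding pairs starting from the last dimension, so for an NCHW tensor the 4-tuple reads left, right, top, bottom. Reading it as top, bottom, left, right is an easy mistake that silently swaps height and width padding.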
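Converting between a TensorFlow explicit padding list and an `F.pad` tuple is mostly a matter of reordering. TensorFlow lists `[before, after]` pairs in NHWC order, while `F.pad` starts from the last dimension. The sketch below assumes an NHWC padding list with no batch or channel padding. The function name is hypothetical.

```python
def tf_padding_to_torch_pad(tf_padding):
    """Map [[0, 0], [top, bottom], [left, right], [0, 0]] to F.pad's (left, right, top, bottom)."""
    (n_before, n_after), (top, bottom), (left, right), (c_before, c_after) = tf_padding
    if any((n_before, n_after, c_before, c_after)):
        raise ValueError("expected no padding on the batch or channel dimensions")
    return (left, right, top, bottom)


print(tf_padding_to_torch_pad([[0, 0], [1, 1], [2, 2], [0, 0]]))  # (2, 2, 1, 1)
```

The result `(2, 2, 1, 1)` is the tuple used in the `F.pad` example. That is why the TensorFlow and PyTorch snippets both end up with a 5x7 spatial output, reported as `(1, 5, 7, 1)` and `torch.Size([1, 4, 5, 7])` respectively.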

Why Equivalent Code Can Still Differ

Even if both models use a 3x3 kernel, the frameworks may disagree because of:

  • different tensor layouts
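  • different conventions for manual padding values (a symmetric integer or tuple in nn.Conv2d versus per-side pairs in tf.nn.conv2d)
  • stride interaction with same padding
  • explicit versus implicit padding choices
  • how an odd amount of total padding is split between the two sides, which can change which pixels the kernel sees even when the output shapes agree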

The safest comparison is to compute expected output shapes directly and inspect the actual tensor dimensions after a forward pass.

A Practical Porting Strategy

When porting a convolution from TensorFlow to PyTorch or the reverse:

  1. match the tensor layout first
  2. match kernel size, stride, dilation, and groups
  3. replace same with explicit padding if the stride rules do not line up
  4. test on one known input shape and compare outputs step by step

This is more reliable than trying to translate the layer declaration by eye.
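Step 3 can be sketched as a function that converts a TensorFlow `same` layer into the `F.pad` tuple needed to pad first and then convolve with `padding=0` in PyTorch. It assumes the input height and width are known ahead of time, because with stride greater than 1 TensorFlow's split depends on them. The function name and signature are illustrative.

```python
import math


def tf_same_to_torch_pad(in_hw, kernel_hw, stride_hw, dilation_hw=(1, 1)):
    """Return the F.pad tuple (left, right, top, bottom) that reproduces TF 'SAME'."""
    per_dim = []
    for size, k, s, d in zip(in_hw, kernel_hw, stride_hw, dilation_hw):
        effective_kernel = (k - 1) * d + 1
        out_size = math.ceil(size / s)
        total = max((out_size - 1) * s + effective_kernel - size, 0)
        per_dim.append((total // 2, total - total // 2))  # extra pixel goes after
    (top, bottom), (left, right) = per_dim
    return (left, right, top, bottom)


print(tf_same_to_torch_pad((28, 28), (3, 3), (2, 2)))  # (0, 1, 0, 1)
print(tf_same_to_torch_pad((28, 28), (3, 3), (1, 1)))  # (1, 1, 1, 1)
```

In PyTorch, the tuple would be applied as `F.pad(x, pad)`, followed by an `nn.Conv2d` with the original stride and `padding=0`. When the tuple is symmetric, as in the stride-1 case, a plain integer `padding` argument gives the same result.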

Common Pitfalls

The biggest pitfall is forgetting that TensorFlow usually expects NHWC while PyTorch usually expects NCHW. Padding can look wrong when the real bug is channel order.

Another mistake is assuming padding="same" behaves identically in both frameworks for every stride. TensorFlow accepts same with any stride, while current PyTorch documentation limits padding="same" in Conv2d to stride 1.

Developers also mix up explicit padding conventions. In PyTorch, a single integer adds the same amount on all four sides, and a tuple sets height and width separately. TensorFlow’s low-level explicit padding instead specifies before and after amounts for each dimension.

Finally, do not compare only layer declarations. Always compare actual output shapes and, when necessary, the numerical outputs on the same test tensor.

Summary

  • TensorFlow and PyTorch support the same high-level padding ideas, but the APIs differ in important details.
  • Tensor layout is often the first source of mismatch: TensorFlow usually uses NHWC, PyTorch usually uses NCHW.
  • TensorFlow accepts same padding for any stride and also supports explicit per-side padding lists in lower-level ops such as tf.nn.conv2d.
  • PyTorch supports padding="same", but current docs note that it does not support stride values other than 1 for Conv2d.
  • When porting models, verify layout and output shapes before assuming the padding rule is equivalent.
