Comparing Conv2D with padding between TensorFlow and PyTorch
Introduction
TensorFlow and PyTorch both provide 2D convolution layers, but padding details are one of the places where equivalent-looking code can produce different shapes or even fail outright. To compare them correctly, you need to look at padding mode, tensor layout, and the framework rules for stride and explicit padding.
The Shared Concepts
A 2D convolution slides a kernel across height and width. Padding controls what happens at the borders.
The two most common modes are:
- valid, meaning no padding
- same, meaning pad enough to preserve spatial size when the stride permits it
Both frameworks support these ideas, but their APIs are not identical.
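To make the two modes concrete, here is a minimal sketch of the output-size arithmetic along one spatial axis. It follows the ceiling-based formulas TensorFlow documents for its two modes and assumes a dilation of 1. The function name conv_output_size is made up for illustration and is not part of either framework.

```python
import math

def conv_output_size(size, kernel, stride, padding):
    """Output length along one spatial axis, assuming dilation 1."""
    if padding == "valid":
        # No padding: count only positions where the kernel fits entirely.
        return math.ceil((size - kernel + 1) / stride)
    if padding == "same":
        # Padded so the result depends only on input size and stride.
        return math.ceil(size / stride)
    raise ValueError(f"unknown padding mode: {padding!r}")

print(conv_output_size(32, 3, 1, "valid"))  # 30
print(conv_output_size(32, 3, 1, "same"))   # 32
print(conv_output_size(32, 3, 2, "same"))   # 16
```

With a stride of 1, same keeps the size at 32. With a stride of 2, the output is halved, which is why the definition above says "when the stride permits it".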
Tensor Layout Is the First Difference
TensorFlow usually works with NHWC, which means batch, height, width, channels. PyTorch usually works with NCHW, which means batch, channels, height, width.
That difference matters because two layers can both say “same padding” and still receive tensors in incompatible layouts.
If you port a model between frameworks, shape mismatches are often caused by layout before padding even enters the discussion.
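As a rough illustration of the layout difference, the sketch below moves a batch between NHWC and NCHW with NumPy. The array contents and sizes are arbitrary test values, chosen only so the axes are easy to tell apart.

```python
import numpy as np

# A made-up batch of 2 images, 4 rows by 5 columns, 3 channels, in NHWC order.
x_nhwc = np.arange(2 * 4 * 5 * 3, dtype=np.float32).reshape(2, 4, 5, 3)

# NHWC -> NCHW: move the channel axis (index 3) to position 1.
x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))

# NCHW -> NHWC: move the channel axis (index 1) back to the end.
x_back = np.transpose(x_nchw, (0, 2, 3, 1))

print(x_nhwc.shape)  # (2, 4, 5, 3)
print(x_nchw.shape)  # (2, 3, 4, 5)
```

The same axis orders apply to tensor.permute in PyTorch and tf.transpose in TensorFlow.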
same Padding in TensorFlow
TensorFlow’s tf.keras.layers.Conv2D accepts padding="same" or padding="valid". With padding="same" and strides=1, the output height and width match the input size.
TensorFlow also supports lower-level explicit padding through tf.nn.conv2d by passing a padding list instead of only "SAME" or "VALID".
That level of control is useful when you need asymmetric padding.
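To show where asymmetric padding comes from, the following sketch computes the padding that same implies for a given input size and arranges it as a per-dimension list for an NHWC tensor, which is the shape of argument tf.nn.conv2d accepts for explicit padding. It assumes a square kernel, equal strides, and dilation 1. Both helper names are hypothetical, and the code does not call TensorFlow itself.

```python
import math

def same_padding_1d(size, kernel, stride):
    """Return (pad_before, pad_after) along one axis for same-style padding."""
    out = math.ceil(size / stride)
    total = max((out - 1) * stride + kernel - size, 0)
    before = total // 2
    after = total - before  # an odd leftover goes to the bottom or right
    return before, after

def explicit_nhwc_padding(height, width, kernel, stride):
    """Build a [[batch], [height], [width], [channels]] padding list."""
    top, bottom = same_padding_1d(height, kernel, stride)
    left, right = same_padding_1d(width, kernel, stride)
    return [[0, 0], [top, bottom], [left, right], [0, 0]]

print(explicit_nhwc_padding(6, 6, kernel=3, stride=2))
# [[0, 0], [0, 1], [0, 1], [0, 0]]
print(explicit_nhwc_padding(5, 5, kernel=3, stride=1))
# [[0, 0], [1, 1], [1, 1], [0, 0]]
```

The first case is asymmetric: a 6x6 input with a 3x3 kernel and stride 2 needs one extra row and column, and both land on the bottom and right.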
same Padding in PyTorch
PyTorch's nn.Conv2d accepts padding as an integer (0 by default), a tuple, or one of the string modes padding="same" and padding="valid". But there is an important restriction: current PyTorch docs state that padding="same" does not support stride values other than 1.
For manual symmetric padding, PyTorch usually takes an integer or a (height, width) tuple.
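The effect of integer and tuple padding on the output shape can be sketched with the shape formula given in the nn.Conv2d documentation, simplified here by assuming dilation 1. The helper name torch_conv2d_output_hw is invented for this example, and the code does not import PyTorch.

```python
def torch_conv2d_output_hw(height, width, kernel_size, stride=1, padding=0):
    """Output (H, W) for symmetric padding, assuming dilation 1."""
    def pair(value):
        # An int applies to both height and width; a tuple is (height, width).
        return (value, value) if isinstance(value, int) else tuple(value)

    kh, kw = pair(kernel_size)
    sh, sw = pair(stride)
    ph, pw = pair(padding)
    # Padding is added on both sides of each axis, hence the factor of 2.
    out_h = (height + 2 * ph - kh) // sh + 1
    out_w = (width + 2 * pw - kw) // sw + 1
    return out_h, out_w

print(torch_conv2d_output_hw(32, 32, 3, padding=1))            # (32, 32)
print(torch_conv2d_output_hw(32, 32, (3, 5), padding=(1, 2)))  # (32, 32)
print(torch_conv2d_output_hw(32, 32, 3, stride=2, padding=1))  # (16, 16)
```

Note the floor division: PyTorch rounds down when the stride does not divide evenly.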
If you need more control, you can pad first and convolve second.
For a 4D tensor, the tuple order in F.pad is left, right, top, bottom: it starts from the last dimension (width) and works backward, which is another detail that trips people up.
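Here is a small sketch of the pad-then-convolve approach, assuming a recent PyTorch release that accepts string padding modes. The input size, channel counts, and padding amounts are arbitrary choices for illustration. The padding puts one extra row and column on the bottom and right, which is where same-style padding places the leftover for a 6x6 input with a 3x3 kernel and stride 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 6, 6)  # NCHW test tensor

# padding="same" combined with stride 2 is rejected when the layer is built.
try:
    nn.Conv2d(3, 4, kernel_size=3, stride=2, padding="same")
    same_with_stride_ok = True
except ValueError:
    same_with_stride_ok = False

# Pad manually instead. Order is (left, right, top, bottom):
# one zero column on the right and one zero row at the bottom.
x_padded = F.pad(x, (0, 1, 0, 1))

# The convolution itself then uses no padding.
conv = nn.Conv2d(3, 4, kernel_size=3, stride=2, padding=0)
y = conv(x_padded)

print(same_with_stride_ok)  # False
print(x_padded.shape)       # torch.Size([1, 3, 7, 7])
print(y.shape)              # torch.Size([1, 4, 3, 3])
```

Using padding=1 on the layer would give the same 3x3 output shape here, but the zeros would sit on all four sides, so the values would not line up with an asymmetrically padded convolution.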
Why Equivalent Code Can Still Differ
Even if both models use a 3x3 kernel, the frameworks may disagree because of:
- different tensor layouts
- different meaning of manual padding values
- stride interaction with same padding
- explicit versus implicit padding choices
- different rounding when output shapes are derived (TensorFlow documents its output sizes with a ceiling, PyTorch with a floor)
The safest comparison is to compute expected output shapes directly and inspect the actual tensor dimensions after a forward pass.
A Practical Porting Strategy
When porting a convolution from TensorFlow to PyTorch or the reverse:
- match the tensor layout first
- match kernel size, stride, dilation, and groups
- replace same with explicit padding if the stride rules do not line up
- test on one known input shape and compare outputs step by step
This is more reliable than trying to translate the layer declaration by eye.
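The padding step of that checklist can be sketched as a small planning helper. For a known input size, it works out the padding that same implies and reports whether the layer's own symmetric padding argument is enough or whether a separate F.pad call is needed. It assumes a square kernel, equal strides, and dilation 1. The names same_pads and plan_torch_padding are hypothetical, and the returned dictionary is just one convenient way to present the result.

```python
import math

def same_pads(size, kernel, stride):
    """(before, after) padding along one axis for same-style padding."""
    out = math.ceil(size / stride)
    total = max((out - 1) * stride + kernel - size, 0)
    return total // 2, total - total // 2

def plan_torch_padding(height, width, kernel, stride):
    """Suggest how to reproduce same-style padding for a known input size."""
    top, bottom = same_pads(height, kernel, stride)
    left, right = same_pads(width, kernel, stride)
    if top == bottom and left == right:
        # Symmetric: the conv layer's padding=(height, width) argument suffices.
        return {"conv_padding": (top, left), "f_pad": None}
    # Asymmetric: pad first, using F.pad order (left, right, top, bottom).
    return {"conv_padding": (0, 0), "f_pad": (left, right, top, bottom)}

print(plan_torch_padding(5, 5, kernel=3, stride=2))
# {'conv_padding': (1, 1), 'f_pad': None}
print(plan_torch_padding(6, 6, kernel=3, stride=2))
# {'conv_padding': (0, 0), 'f_pad': (0, 1, 0, 1)}
```

The two calls differ only in input size, which shows why the translation has to be checked against the actual tensor shape rather than the layer declaration alone.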
Common Pitfalls
The biggest pitfall is forgetting that TensorFlow usually expects NHWC while PyTorch usually expects NCHW. Padding can look wrong when the real bug is channel order.
Another mistake is assuming padding="same" behaves identically in both frameworks for every stride. TensorFlow supports same more broadly, while current PyTorch documentation limits same to stride 1 for Conv2d.
Developers also mix up explicit padding conventions. A single integer in PyTorch means symmetric padding on height and width, while TensorFlow’s low-level explicit padding can specify each dimension separately.
Finally, do not compare only layer declarations. Always compare actual output shapes and, when necessary, the numerical outputs on the same test tensor.
Summary
- TensorFlow and PyTorch support the same high-level padding ideas, but the APIs differ in important details.
- Tensor layout is often the first source of mismatch: TensorFlow usually uses NHWC, PyTorch usually uses NCHW.
- TensorFlow allows same padding directly and also supports explicit padding lists in lower-level ops.
- PyTorch supports padding="same", but current docs note that it does not support stride values other than 1 for Conv2d.
- When porting models, verify layout and output shapes before assuming the padding rule is equivalent.

