How to iterate over columns of a pandas dataframe

Introduction

Iterating over DataFrame columns in pandas is straightforward, but the best method depends on what you are trying to do with each column. In many cases, direct iteration is fine for inspection or metadata work, but vectorized operations are still the better choice when you want actual data transformation at scale.

The simplest column iteration

If you only need column names, iterate over the DataFrame itself or over df.columns.

python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "sales": [10, 20, 30],
    "margin": [0.1, 0.2, 0.15],
})

# df.columns is an Index of column labels
for column_name in df.columns:
    print(column_name)

This is useful for schema inspection, logging, or dynamically building reports.
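
As a small sketch of the two options mentioned above, the following checks that iterating over the DataFrame directly yields the same labels as df.columns. The sample frame mirrors the one in this section, and the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "sales": [10, 20, 30],
    "margin": [0.1, 0.2, 0.15],
})

# Iterating over the DataFrame itself yields column labels, not rows.
names_from_frame = [name for name in df]

# df.columns is an Index; list() turns it into a plain Python list.
names_from_columns = list(df.columns)

print(names_from_frame)
print(names_from_columns)
```

Looping over df.columns tends to read more clearly, because a bare "for x in df" is easy to mistake for row iteration.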

Iterate over column name and Series together

If you need the actual column data, items() is the usual method.

python
# Assumes pd and df from the previous example
for column_name, series in df.items():
    print("name:", column_name)
    print("dtype:", series.dtype)
    print("sum if numeric:", series.sum() if pd.api.types.is_numeric_dtype(series) else "n/a")

Each series is a pandas.Series representing one column.

This is the most practical choice when:

  • validating per-column data
  • computing column-specific summaries
  • applying different logic by dtype
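
A minimal sketch of the validation use case, assuming a sample frame with a few missing values added for illustration. The report dictionary and its keys are hypothetical names, not part of pandas.

```python
import pandas as pd

# Sample data with missing values, made up for this example.
df = pd.DataFrame({
    "city": ["A", "B", None],
    "sales": [10, 20, 30],
    "margin": [0.1, None, 0.15],
})

report = {}
for column_name, series in df.items():
    report[column_name] = {
        "dtype": str(series.dtype),
        "missing": int(series.isna().sum()),
        "is_numeric": pd.api.types.is_numeric_dtype(series),
    }

for column_name, stats in report.items():
    print(column_name, stats)
```

Collecting the results in a plain dictionary keeps the per-column checks easy to log or to turn into a small summary table later.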

Example: process only numeric columns

Column iteration is often combined with dtype checks.

python
import pandas as pd

# Assumes df from the first example
for name, series in df.items():
    if pd.api.types.is_numeric_dtype(series):
        print(name, "mean =", series.mean())

This is common in exploratory analysis or data-quality diagnostics.

But prefer vectorized operations for real transformations

Many tasks that look like "iterate over columns" can be written more idiomatically with pandas operations.

Instead of:

python
for name, series in df.items():
    if pd.api.types.is_numeric_dtype(series):
        df[name] = series * 100

You can often write:

python
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols] * 100

This is usually faster, shorter, and easier to reason about.
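
To make the comparison concrete, here is a sketch that runs both versions on copies of the same sample frame and checks that they produce identical results. The names looped and vectorized are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "sales": [10, 20, 30],
    "margin": [0.1, 0.2, 0.15],
})

# Loop version: scale each numeric column one at a time.
looped = df.copy()
for name, series in looped.items():
    if pd.api.types.is_numeric_dtype(series):
        looped[name] = series * 100

# Vectorized version: select the numeric columns once and scale them together.
vectorized = df.copy()
numeric_cols = vectorized.select_dtypes(include="number").columns
vectorized[numeric_cols] = vectorized[numeric_cols] * 100

# pandas.testing.assert_frame_equal raises if the two frames differ.
pd.testing.assert_frame_equal(looped, vectorized)
print(vectorized)
```

One difference worth knowing: is_numeric_dtype treats boolean columns as numeric, while select_dtypes(include="number") does not select them, so the two versions can diverge on frames that contain boolean columns.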

Iterate when you truly need per-column custom logic

Sometimes vectorization is not the right tool because each column has different rules.

python
# Map each column name to the cleaning function it needs
rules = {
    "sales": lambda s: s.fillna(0),
    "margin": lambda s: s.clip(lower=0, upper=1),
}

for name, series in df.items():
    if name in rules:
        df[name] = rules[name](series)

This keeps the logic explicit instead of forcing everything into one generic transform.
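
The sample frame used so far has no missing or out-of-range values, so the rules would leave it unchanged. The following sketch applies the same pattern to made-up data where each rule has a visible effect.

```python
import pandas as pd

# Illustrative data: one missing sales value, two margins outside [0, 1].
df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "sales": [10, None, 30],
    "margin": [0.1, 1.4, -0.2],
})

rules = {
    "sales": lambda s: s.fillna(0),                 # replace missing sales with 0
    "margin": lambda s: s.clip(lower=0, upper=1),   # force margins into [0, 1]
}

for name, series in df.items():
    if name in rules:
        df[name] = rules[name](series)

print(df)
```

Columns without an entry in rules, such as city here, pass through untouched.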

Access columns by position when needed

If you must iterate by position rather than by name, use iloc, keeping in mind that positions shift whenever columns are added, dropped, or reordered:

python
for i in range(df.shape[1]):
    series = df.iloc[:, i]
    print(i, series.name, series.head().tolist())

Positional access is mainly useful when column positions are part of the requirement, such as legacy export formats.
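
Positional access also helps when column names are not unique. The sketch below assumes a hypothetical export with a repeated header; in that case label-based selection returns a DataFrame, while iloc still returns one Series per position.

```python
import pandas as pd

# Hypothetical export with a repeated "value" header, built for illustration.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["id", "value", "value"])

# Label-based access returns a DataFrame here because two columns share the name.
by_label = df["value"]
print(type(by_label).__name__)

# Positional access returns exactly one Series per position.
columns_by_position = [df.iloc[:, i] for i in range(df.shape[1])]
for i, series in enumerate(columns_by_position):
    print(i, series.name, series.tolist())
```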

Common alternatives to remember

Useful related tools:

  • df.columns for names only
  • df.items() for name and Series
  • df.select_dtypes(...) for subsets by type
  • df.apply(...) for column-wise function application

For example:

python
# agg is the documented method for applying a list of aggregation names
summary = df.select_dtypes(include="number").agg(["min", "max", "mean"])
print(summary)

That avoids a manual loop entirely.
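
For column-wise application of a custom function, a minimal sketch with df.apply might look like this. The value_range helper is a hypothetical name introduced for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "sales": [10, 20, 30],
    "margin": [0.1, 0.2, 0.15],
})

def value_range(series):
    """Return max minus min for one column (hypothetical helper)."""
    return series.max() - series.min()

# With the default axis=0, apply passes each column to the function as a Series.
ranges = df.select_dtypes(include="number").apply(value_range)
print(ranges)
```

The result is a Series indexed by column name, which is often easier to work with than values collected by hand in a loop.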

Common Pitfalls

The most common mistake is looping over columns for a transformation that pandas can already express as a single vectorized operation. Another is confusing row iteration methods such as iterrows() with column iteration: iterrows() yields one row at a time as a Series indexed by column labels, not a column. Developers also sometimes mutate the DataFrame in place while assuming a dtype that does not hold for mixed-type columns. Using iloc for positional loops without checking that the column order is stable is another risk in production code. Finally, df.items() is the current column-wise iteration method; the older iteritems() was deprecated in pandas 1.5 and removed in 2.0, so code that still calls it fails on recent versions.
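
The difference between row and column iteration is easiest to see by looking at the first object each method yields. This is a small sketch on the sample frame from earlier sections, with illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "sales": [10, 20, 30],
    "margin": [0.1, 0.2, 0.15],
})

# items() yields (column label, Series indexed by the row labels).
col_label, col_series = next(iter(df.items()))

# iterrows() yields (row label, Series indexed by the column labels).
row_label, row_series = next(iter(df.iterrows()))

print(col_label, len(col_series), list(col_series.index))
print(row_label, len(row_series), list(row_series.index))
```

Because a row from iterrows() has to hold values from every column, a mixed-type frame like this one produces row Series with object dtype, which is one reason dtype assumptions break when the two methods are confused.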

Summary

  • Use df.columns when you only need column names.
  • Use df.items() when you need both the name and the column Series.
  • Prefer vectorized operations for bulk transformation work.
  • Iterate explicitly only when column-specific logic is genuinely different.
  • Use dtype checks to keep numeric and non-numeric handling separate.
  • Pick the most direct pandas API for the task rather than defaulting to manual loops.
