Get column name based on condition in pandas

pandas

python

data manipulation

data analysis

dataframe

Get column name based on condition in pandas

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

In pandas, finding column names that satisfy a condition is usually a matter of building a boolean mask and applying it to df.columns. The exact pattern changes depending on whether the condition applies to an entire column, a single row, a data type, or an aggregate such as null percentage.

Columns where any value matches a condition

One of the most common cases is: return every column where at least one value satisfies a rule.

python

1import pandas as pd
2
3
4df = pd.DataFrame(
5    {
6        "A": [0, 4, 3, 5],
7        "B": [1, 3, 6, 4],
8        "C": [2, 1, 9, 8],
9        "D": [5, 1, 9, 8],
10    }
11)
12
13cols = df.columns[(df > 5).any()].tolist()
14print(cols)

df > 5 produces a boolean DataFrame. Calling .any() reduces each column to a single boolean, and indexing df.columns with that result returns the matching names.

The same pattern works for exact values.

python

cols = df.columns[(df == 0).any()].tolist()
print(cols)

Columns where all values match a condition

If the condition must hold for the whole column, use .all() instead of .any().

python

positive_cols = df.columns[(df >= 0).all()].tolist()
print(positive_cols)

This is useful for checks such as "all values are non-negative" or "every row in this column is populated."

Column names based on a specific row

Sometimes the question is row-specific: which columns in row i match a condition?

python

row_index = 2
cols = df.columns[df.iloc[row_index] == 9].tolist()
print(cols)

Because df.iloc[row_index] is a Series, the comparison returns a boolean mask over the columns of that row.

This is a different problem from checking an entire column, and mixing the two is a common source of confusion.

Column with the maximum or minimum value per row

If you want one column name per row, idxmax() and idxmin() are the most direct tools.

python

df["max_col"] = df.idxmax(axis=1)
df["min_col"] = df.idxmin(axis=1)
print(df[["max_col", "min_col"]])

With axis=1, pandas looks across columns for each row and returns the column label where the extreme value appears.

Filter columns before comparing

If your DataFrame contains strings, booleans, and numeric columns together, expressions such as df > 5 may fail or give unwanted behavior. A safe pattern is to select the relevant types first.

python

numeric = df.select_dtypes(include="number")
cols = numeric.columns[(numeric > 5).any()].tolist()
print(cols)

This is especially important in real analysis code, where mixed types are common.

Find columns with missing values or thresholds

Column selection often depends on summary conditions rather than raw values. Null detection is a good example.

python

1import numpy as np
2
3
4df2 = pd.DataFrame(
5    {
6        "name": ["Alice", "Bob", None],
7        "score": [10, np.nan, 8],
8        "bonus": [None, None, 5],
9    }
10)
11
12null_cols = df2.columns[df2.isna().any()].tolist()
13sparse_cols = df2.columns[df2.isna().mean() > 0.5].tolist()
14
15print(null_cols)
16print(sparse_cols)

isna().mean() is a compact way to compute the fraction of missing values per column.

Name-based conditions

Not every condition is about data values. Sometimes you want column names matching a pattern.

python

flag_cols = [col for col in df2.columns if col.startswith("s")]
print(flag_cols)

Pandas also provides filter() for label-based selection.

python

print(df2.filter(regex="^(name|score)$").columns.tolist())

Common Pitfalls

A common mistake is comparing the full DataFrame when only numeric columns should be considered. Mixed types can raise errors or produce misleading results.

Another issue is forgetting the difference between .any() and .all(). They answer different questions, and swapping them changes the meaning of the result completely.

It is also easy to misread idxmax() behavior when ties exist. It returns the first matching column label, not every tied column.

Summary

Use df.columns[(condition).any()] when any value in a column may satisfy the rule.
Use .all() when the entire column must satisfy the condition.
Use df.iloc[row_index] for row-specific column lookups.
Use idxmax(axis=1) or idxmin(axis=1) when you want one winning column per row.
Filter by data type first when mixed columns make direct comparisons unsafe.