Get column name based on condition in pandas
Introduction
In pandas, finding column names that satisfy a condition is usually a matter of building a boolean mask and applying it to df.columns. The exact pattern changes depending on whether the condition applies to an entire column, a single row, a data type, or an aggregate such as null percentage.
Columns where any value matches a condition
One of the most common cases is: return every column where at least one value satisfies a rule.
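A minimal sketch of this case, assuming a small made-up DataFrame (the column names a, b, c and the threshold 5 are illustrative only):

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 6, 8],
    "c": [0, 0, 9],
})

mask = (df > 5).any()        # one boolean per column
matching = df.columns[mask]  # column labels where the mask is True

print(matching.tolist())  # ['b', 'c']
```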
df > 5 produces a boolean DataFrame. Calling .any() reduces each column to a single boolean, and indexing df.columns with that result returns the matching names.
The same pattern works for exact values.
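As a sketch with the same kind of made-up data, an equality test or isin() can stand in for the inequality. The target values here are assumptions chosen for the example:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 6, 8],
    "c": [0, 0, 9],
})

# Columns containing the exact value 0 at least once.
cols_with_zero = df.columns[(df == 0).any()].tolist()
print(cols_with_zero)  # ['c']

# Columns containing any of several exact values.
cols_with_either = df.columns[df.isin([2, 9]).any()].tolist()
print(cols_with_either)  # ['a', 'c']
```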
Columns where all values match a condition
If the condition must hold for the whole column, use .all() instead of .any().
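A short sketch, again on made-up data. Note that a comparison against NaN evaluates to False, so a column with a missing value fails the (df >= 0).all() check here:

```python
import pandas as pd

# Hypothetical sample data; None becomes NaN in the float column "c".
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [-1, 6, 8],
    "c": [0.0, None, 9.0],
})

# Columns where every value is non-negative.
non_negative = df.columns[(df >= 0).all()].tolist()
print(non_negative)  # ['a']

# Columns where every row is populated (no missing values).
fully_populated = df.columns[df.notna().all()].tolist()
print(fully_populated)  # ['a', 'b']
```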
This is useful for checks such as "all values are non-negative" or "every row in this column is populated."
Column names based on a specific row
Sometimes the question is row-specific: which columns in row i match a condition?
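One way this might look, assuming a made-up DataFrame and an arbitrary row position of 1:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 6, 8],
    "c": [0, 0, 9],
})

row_index = 1             # illustrative row position
row = df.iloc[row_index]  # Series with a=2, b=6, c=0

cols_in_row = df.columns[row > 5].tolist()
print(cols_in_row)  # ['b']
```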
Because df.iloc[row_index] is a Series indexed by the column labels, comparing it to a value returns a boolean mask with one entry per column.
This is a different problem from checking an entire column, and mixing the two is a common source of confusion.
Column with the maximum or minimum value per row
If you want one column name per row, idxmax() and idxmin() are the most direct tools.
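A minimal sketch on made-up numeric data:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "a": [1, 7, 3],
    "b": [4, 6, 8],
    "c": [0, 0, 9],
})

max_col = df.idxmax(axis=1)  # label of the largest value in each row
min_col = df.idxmin(axis=1)  # label of the smallest value in each row

print(max_col.tolist())  # ['b', 'a', 'c']
print(min_col.tolist())  # ['c', 'c', 'a']
```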
With axis=1, pandas looks across columns for each row and returns the column label where the extreme value appears.
Filter columns before comparing
If your DataFrame mixes string, boolean, and numeric columns, an expression such as df > 5 can raise a TypeError on the string columns or compare booleans as if they were 0 and 1. A safer pattern is to select the relevant dtypes first, for example with select_dtypes().
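A sketch of selecting types first, using select_dtypes() on a made-up mixed-type DataFrame. With include="number", the string and boolean columns are left out before the comparison runs:

```python
import pandas as pd

# Hypothetical mixed-type data, used only for illustration.
df = pd.DataFrame({
    "name": ["x", "y", "z"],
    "flag": [True, False, True],
    "score": [3, 7, 2],
    "ratio": [0.5, 0.1, 0.9],
})

# Keep only numeric columns; "name" and "flag" are excluded.
numeric = df.select_dtypes(include="number")
print(numeric.columns.tolist())  # ['score', 'ratio']

# Apply the value condition to the numeric subset only.
over_five = numeric.columns[(numeric > 5).any()].tolist()
print(over_five)  # ['score']
```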
This is especially important in real analysis code, where mixed types are common.
Find columns with missing values or thresholds
Column selection often depends on summary conditions rather than raw values. Null detection is a good example.
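A sketch on made-up data. The 0.3 cutoff is an arbitrary assumption for the example, not a recommended value:

```python
import pandas as pd

# Hypothetical sample data with some missing entries.
df = pd.DataFrame({
    "a": [1.0, None, None, 4.0],
    "b": [1.0, 2.0, 3.0, None],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# Columns with at least one missing value.
has_missing = df.columns[df.isna().any()].tolist()
print(has_missing)  # ['a', 'b']

# Fraction of missing values per column: a=0.5, b=0.25, c=0.0
null_fraction = df.isna().mean()

threshold = 0.3  # illustrative cutoff
mostly_missing = df.columns[null_fraction > threshold].tolist()
print(mostly_missing)  # ['a']
```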
isna().mean() is a compact way to compute the fraction of missing values per column.
Name-based conditions
Not every condition is about data values. Sometimes you want column names matching a pattern.
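One possible approach uses the string methods available on df.columns. The column names below are invented for the example:

```python
import pandas as pd

# Hypothetical column names, used only for illustration.
df = pd.DataFrame(
    [[10, 12, "north", 7]],
    columns=["sales_2023", "sales_2024", "region", "cost_2024"],
)

# Names that start with a prefix.
sales_cols = df.columns[df.columns.str.startswith("sales_")].tolist()
print(sales_cols)  # ['sales_2023', 'sales_2024']

# Names that contain a substring.
cols_2024 = df.columns[df.columns.str.contains("2024")].tolist()
print(cols_2024)  # ['sales_2024', 'cost_2024']
```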
Pandas also provides filter() for label-based selection.
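A brief sketch of filter() with its like and regex parameters, again on invented column names. On a DataFrame, filter() works on column labels by default and returns a DataFrame, so .columns is read from the result:

```python
import pandas as pd

# Hypothetical column names, used only for illustration.
df = pd.DataFrame(
    [[10, 12, "north", 7]],
    columns=["sales_2023", "sales_2024", "region", "cost_2024"],
)

# Substring match on column labels.
like_2024 = df.filter(like="2024").columns.tolist()
print(like_2024)  # ['sales_2024', 'cost_2024']

# Regular-expression match on column labels.
regex_sales = df.filter(regex=r"^sales_").columns.tolist()
print(regex_sales)  # ['sales_2023', 'sales_2024']
```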
Common Pitfalls
A common mistake is comparing the full DataFrame when only numeric columns should be considered. Mixed types can raise errors or produce misleading results.
Another issue is forgetting the difference between .any() and .all(). They answer different questions, and swapping them changes the meaning of the result completely.
It is also easy to misread idxmax() behavior when ties exist. It returns the first matching column label, not every tied column.
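To illustrate the tie behavior, here is a sketch on made-up data. It also shows one possible way to recover every tied column by comparing each row against its own maximum. This is an assumption about what a caller might want, not the only approach:

```python
import pandas as pd

# Hypothetical data with ties: row 0 ties on a and b, row 1 on b and c.
df = pd.DataFrame({
    "a": [5, 1],
    "b": [5, 3],
    "c": [2, 3],
})

# idxmax reports only the first column holding the row maximum.
first_only = df.idxmax(axis=1).tolist()
print(first_only)  # ['a', 'b']

# Mark every cell equal to its row maximum, then collect labels per row.
is_max = df.eq(df.max(axis=1), axis=0)
all_tied = [is_max.columns[row].tolist() for row in is_max.to_numpy()]
print(all_tied)  # [['a', 'b'], ['b', 'c']]
```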
Summary
- Use df.columns[(condition).any()] when any value in a column may satisfy the rule.
- Use .all() when the entire column must satisfy the condition.
- Use df.iloc[row_index] for row-specific column lookups.
- Use idxmax(axis=1) or idxmin(axis=1) when you want one winning column per row.
- Filter by data type first when mixed columns make direct comparisons unsafe.

