How to filter for rows with close values across columns
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
If you want rows where values across several columns are "close" to each other, the first thing to define is what close means. In many pandas workflows, the simplest definition is that the difference between the largest and smallest value in the row must stay below a threshold.
That rule is easy to compute, easy to explain, and works well when you want all selected columns in a row to be roughly aligned. Other definitions are possible, but max - min <= tolerance is usually the cleanest starting point.
A Straightforward pandas Solution
Suppose you have a DataFrame like this:
If you want rows where all three values are within 1.0 of each other, compute the row-wise range:
This keeps rows where the spread across the selected columns is small.
Why it works:
- '
df.max(axis=1)gives the largest value in each row' - '
df.min(axis=1)gives the smallest value in each row' - their difference is the total row spread
If that spread is below your tolerance, the row qualifies.
Restricting the Check to Specific Columns
Often you do not want to compare every numeric column in the frame. You only want a subset:
This is usually better than applying the rule to the full DataFrame blindly, especially when the table contains IDs, labels, or unrelated measurements.
Pairwise Closeness with np.isclose
Sometimes your rule is stricter or more explicit: every pair of columns must be close under a numeric tolerance. In that case, NumPy's isclose is useful.
This is more verbose than the row-range method, but it can be easier to adapt when the rule is specifically about pairwise comparisons or when you want relative tolerances as well.
Relative Tolerance Matters Sometimes
If values span very different magnitudes, absolute tolerance may not be enough. A difference of 1.0 is tiny near 1000, but huge near 2.
In that case, np.isclose can use rtol for relative tolerance:
This means the acceptable difference scales with the size of the values.
Handling Missing Values
Missing values change the picture. By default, row-wise min and max operations often ignore NaN, which may or may not be what you want.
If rows with missing values should be rejected, add an explicit completeness check:
This keeps the filtering rule honest instead of silently treating incomplete rows as valid.
Why the Row-Range Method Is Often Best
For a "values across columns are close" requirement, max - min <= tol is hard to beat because it captures the whole-row spread in one expression. If the largest and smallest values are already close, then every value in between is automatically close as well.
That makes the logic compact and mathematically clean for many data-cleaning tasks.
Common Pitfalls
The biggest pitfall is not defining "close" precisely. Absolute tolerance, relative tolerance, and pairwise closeness are different rules, and they do not always select the same rows.
Another common issue is accidentally including columns that should not participate in the comparison, such as IDs or categorical encodings.
People also forget about missing values. A row with NaN can slip through or be misinterpreted unless you decide explicitly how missing data should behave.
Finally, if the selected columns have very different scales, absolute differences alone may not be meaningful. In that case, consider relative tolerance or normalize the data first.
Summary
- A simple way to filter for close values across columns is
row_max - row_min <= tolerance. - In pandas, this is easy with
df[cols].max(axis=1) - df[cols].min(axis=1). - Use
np.isclosewhen you want pairwise or relative-tolerance checks instead. - Restrict the comparison to the columns that actually matter.
- Decide explicitly how to handle missing values and whether closeness should be absolute or relative.

