How to filter Pandas dataframe using 'in' and 'not in' like in SQL
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
SQL uses IN and NOT IN to filter rows against a set of allowed or excluded values. In pandas, the direct equivalent is .isin(), combined with boolean indexing and the ~ operator for negation.
SQL IN in pandas
Suppose you want only rows whose department is in a selected list:
This is the pandas equivalent of:
.isin() returns a boolean Series, and df[...] keeps only the rows where that Series is True.
SQL NOT IN in pandas
To express NOT IN, negate the boolean mask with ~:
The pattern is simple:
- '
df["col"].isin(values)forIN' - '
~df["col"].isin(values)forNOT IN'
Combine Membership with Other Conditions
You can mix .isin() with other boolean filters using & for element-wise and and | for element-wise or:
Parentheses matter because pandas boolean expressions use bitwise operators for element-wise logic.
Filter More Than One Column
You can also use .isin() across multiple columns independently:
This is still ordinary boolean indexing. .isin() just gives you the masks.
Use External Lists, Sets, or Series
The allowed or excluded values do not have to be hard-coded. They can come from another Series, a configuration file, or user input:
A set can also be convenient when you build the values dynamically, although pandas handles the membership comparison internally once the collection is passed to .isin().
Exact Matching Means Types and Case Matter
.isin() performs exact comparisons. That means:
- '
"1"is different from1' - '
"sales"is different from"Sales"'
If your text data is messy, normalize it before filtering:
This is often better than trying to debug "missing" matches that are really casing issues.
Missing Values Need Explicit Handling
Missing values do not behave like SQL NULL automatically. If the business rule cares about missing rows, handle them explicitly with .isna() or .notna():
That keeps the rule readable instead of hoping missing values fall into the right branch by accident.
The Plain Python in Operator Is Not the Same Thing
This is a common mistake:
That does not produce a row-by-row membership mask. Use .isin() for element-wise membership testing on a Series.
Common Pitfalls
- Using Python's plain
inoperator on a Series instead of.isin(). - Forgetting parentheses when combining
.isin()with other conditions. - Using
notinstead of~for a boolean Series. - Overlooking exact-match behavior when casing or data types differ.
- Ignoring missing values when the filtering rule should account for them explicitly.
Summary
- Use
.isin()for SQL-styleINlogic in pandas. - Negate the result with
~to expressNOT IN. - Combine membership checks with
&and|for more complex filters. - Normalize values when case or dtype differences affect matching.
- Handle missing values explicitly instead of assuming SQL-like null semantics.

