Pandas
DataFrame
SQL
Data Analysis
Python

How to filter Pandas dataframe using 'in' and 'not in' like in SQL

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

SQL uses IN and NOT IN to filter rows against a set of allowed or excluded values. In pandas, the direct equivalent is .isin(), combined with boolean indexing and the ~ operator for negation.

SQL IN in pandas

Suppose you want only rows whose department is in a selected list:

python
1import pandas as pd
2
3df = pd.DataFrame(
4    {
5        "name": ["Alice", "Bob", "Carol", "Dan"],
6        "department": ["Sales", "Engineering", "Sales", "HR"],
7        "salary": [70000, 95000, 72000, 65000],
8    }
9)
10
11allowed = ["Sales", "HR"]
12result = df[df["department"].isin(allowed)]
13
14print(result)

This is the pandas equivalent of:

sql
SELECT *
FROM employees
WHERE department IN ('Sales', 'HR');

.isin() returns a boolean Series, and df[...] keeps only the rows where that Series is True.

SQL NOT IN in pandas

To express NOT IN, negate the boolean mask with ~:

python
1excluded = ["Sales", "HR"]
2result = df[~df["department"].isin(excluded)]
3
4print(result)

The pattern is simple:

  • 'df["col"].isin(values) for IN'
  • '~df["col"].isin(values) for NOT IN'

Combine Membership with Other Conditions

You can mix .isin() with other boolean filters using & for element-wise and and | for element-wise or:

python
1result = df[
2    df["department"].isin(["Sales", "Engineering"])
3    & (df["salary"] >= 72000)
4]
5
6print(result)

Parentheses matter because pandas boolean expressions use bitwise operators for element-wise logic.

Filter More Than One Column

You can also use .isin() across multiple columns independently:

python
1result = df[
2    df["department"].isin(["Sales", "HR"])
3    & df["name"].isin(["Alice", "Dan"])
4]
5
6print(result)

This is still ordinary boolean indexing. .isin() just gives you the masks.

Use External Lists, Sets, or Series

The allowed or excluded values do not have to be hard-coded. They can come from another Series, a configuration file, or user input:

python
active_departments = pd.Series(["Sales", "HR"])
result = df[df["department"].isin(active_departments)]
print(result)

A set can also be convenient when you build the values dynamically, although pandas handles the membership comparison internally once the collection is passed to .isin().

Exact Matching Means Types and Case Matter

.isin() performs exact comparisons. That means:

  • '"1" is different from 1'
  • '"sales" is different from "Sales"'

If your text data is messy, normalize it before filtering:

python
result = df[df["department"].str.lower().isin(["sales", "hr"])]
print(result)

This is often better than trying to debug "missing" matches that are really casing issues.

Missing Values Need Explicit Handling

Missing values do not behave like SQL NULL automatically. If the business rule cares about missing rows, handle them explicitly with .isna() or .notna():

python
result = df[df["department"].isna() | df["department"].isin(["Sales"])]
print(result)

That keeps the rule readable instead of hoping missing values fall into the right branch by accident.

The Plain Python in Operator Is Not the Same Thing

This is a common mistake:

python
# Wrong for element-wise filtering
"Sales" in df["department"]

That does not produce a row-by-row membership mask. Use .isin() for element-wise membership testing on a Series.

Common Pitfalls

  • Using Python's plain in operator on a Series instead of .isin().
  • Forgetting parentheses when combining .isin() with other conditions.
  • Using not instead of ~ for a boolean Series.
  • Overlooking exact-match behavior when casing or data types differ.
  • Ignoring missing values when the filtering rule should account for them explicitly.

Summary

  • Use .isin() for SQL-style IN logic in pandas.
  • Negate the result with ~ to express NOT IN.
  • Combine membership checks with & and | for more complex filters.
  • Normalize values when case or dtype differences affect matching.
  • Handle missing values explicitly instead of assuming SQL-like null semantics.

Course illustration
Course illustration

All Rights Reserved.