How to delete rows from a pandas DataFrame based on a conditional expression

Introduction

In pandas, you usually do not "delete rows" one by one. The idiomatic approach is to build a boolean condition, then keep only the rows that do not match that condition. This is faster, clearer, and easier to combine with other filtering logic than looping through the DataFrame.

The most common pattern: boolean filtering

Suppose you want to remove every row where status == "inactive":

python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["a", "b", "c", "d"],
        "status": ["active", "inactive", "active", "inactive"],
        "score": [91, 42, 77, 15],
    }
)

# Keep every row whose status is not "inactive"
filtered = df[df["status"] != "inactive"]
print(filtered)

This is the cleanest pattern for most tasks. The expression df["status"] != "inactive" creates a boolean mask, and pandas returns only the rows where the mask is True.
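To make the mask concrete, here is a small sketch that reuses the sample data from above and prints the mask before applying it. The variable names mask and kept are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["a", "b", "c", "d"],
        "status": ["active", "inactive", "active", "inactive"],
        "score": [91, 42, 77, 15],
    }
)

# The comparison is evaluated element-wise and yields a boolean Series
mask = df["status"] != "inactive"
print(mask.tolist())  # [True, False, True, False]

# Indexing with the mask keeps only the rows marked True
kept = df[mask]
print(kept["user"].tolist())  # ['a', 'c']
```

The kept rows retain their original index labels (0 and 2 here), which is worth knowing if later code relies on the index.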

If you prefer .loc, the equivalent is:

python
filtered = df.loc[df["status"] != "inactive"]

Both forms are common. .loc becomes especially useful when you are also selecting columns.
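As a brief sketch of that case, .loc accepts the row mask and a list of column labels in a single call. The column selection below is just an example on the same sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["a", "b", "c", "d"],
        "status": ["active", "inactive", "active", "inactive"],
        "score": [91, 42, 77, 15],
    }
)

# First argument: which rows to keep; second argument: which columns to return
filtered = df.loc[df["status"] != "inactive", ["user", "score"]]
print(filtered)
```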

Remove rows with more complex conditions

Conditions can be combined with &, |, and ~:

python
filtered = df[(df["status"] == "active") & (df["score"] >= 80)]
print(filtered)

To remove rows where the score is below 50 or the user is "c":

python
filtered = df[~((df["score"] < 50) | (df["user"] == "c"))]
print(filtered)

The ~ operator means logical negation, so a common mental model is:

  1. Write the condition describing rows to remove.
  2. Negate it with ~.
  3. Use the result to keep everything else.

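Parentheses matter here because pandas reuses Python's bitwise operators (&, |, ~) for element-wise boolean logic, and those operators bind more tightly than comparisons such as < and ==. Without parentheses, Python groups the expression differently from what you intended.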
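The three steps above can be sketched with a named mask, which some readers find easier to follow than one long expression. The name to_remove is an illustrative choice, not a pandas convention.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["a", "b", "c", "d"],
        "status": ["active", "inactive", "active", "inactive"],
        "score": [91, 42, 77, 15],
    }
)

# Step 1: describe the rows to remove (each comparison in its own parentheses)
to_remove = (df["score"] < 50) | (df["user"] == "c")

# Steps 2 and 3: negate the mask and keep everything else
filtered = df[~to_remove]
print(filtered["user"].tolist())  # ['a']
```

Naming the mask also makes it easy to inspect how many rows will be removed, for example with to_remove.sum(), before you commit to the filter.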

Using drop() with matching indexes

Sometimes you really do want drop(), especially when you already have the index labels of the rows to remove. Note that drop() works with labels, not integer positions:

python
rows_to_drop = df[df["score"] < 50].index
result = df.drop(rows_to_drop)
print(result)

This works, but it is usually a little more verbose than direct boolean filtering. It is most useful when the index labels to remove are reused elsewhere or computed in a separate step.
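A minimal sketch of that reuse case follows. It assumes a hypothetical string index (u1 to u4) so that labels are clearly distinct from positions, and uses the same labels both to report the removed rows and to drop them.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["a", "b", "c", "d"],
        "status": ["active", "inactive", "active", "inactive"],
        "score": [91, 42, 77, 15],
    },
    index=["u1", "u2", "u3", "u4"],  # hypothetical labels for illustration
)

# Compute the labels once...
low_score_labels = df[df["score"] < 50].index

# ...then reuse them: first to keep a record of what is removed, then to drop
removed = df.loc[low_score_labels]
result = df.drop(index=low_score_labels)

print(list(removed.index))  # ['u2', 'u4']
print(list(result.index))   # ['u1', 'u3']
```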

Using query() for readable conditions

query() can be pleasant when the condition is easier to read as an expression string:

python
result = df.query("status != 'inactive' and score >= 50")
print(result)

This is largely a style choice. Some teams prefer query() for analyst-friendly code, while others prefer explicit boolean masks because they are easier to debug in pure Python.
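One detail that often tips the decision: query() can reference local Python variables by prefixing them with @. The sketch below assumes a threshold stored in a variable named min_score, which is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["a", "b", "c", "d"],
        "status": ["active", "inactive", "active", "inactive"],
        "score": [91, 42, 77, 15],
    }
)

min_score = 50  # illustrative threshold defined outside the query string

# @min_score pulls the value from the surrounding Python scope
result = df.query("status != 'inactive' and score >= @min_score")
print(result["user"].tolist())  # ['a', 'c']
```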

Be careful with assignment and copies

If you want to replace the original DataFrame, reassign it:

python
df = df[df["status"] != "inactive"]

That is generally clearer than trying to mutate in place. Although drop(..., inplace=True) exists, explicit reassignment is easier to reason about and fits better with method chaining.

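Also remember that filtering creates a new DataFrame. If you need to modify the filtered result later, calling .copy() makes it an explicitly independent object and avoids the SettingWithCopyWarning that older pandas versions can emit when you assign into a filtered result: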

python
df_active = df[df["status"] == "active"].copy()
df_active["passed"] = df_active["score"] >= 60

Handling missing values in conditions

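Missing values deserve special attention because comparisons involving NaN do not behave like ordinary comparisons: NaN != "inactive" evaluates to True and NaN < 50 evaluates to False, so a row with a missing value can be kept or dropped in ways beginners do not expect.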

To remove rows with missing scores:

python
result = df[df["score"].notna()]
print(result)

To remove rows where status is missing or inactive:

python
result = df[df["status"].notna() & (df["status"] != "inactive")]
print(result)

Being explicit with isna() and notna() keeps the logic predictable.
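To see why explicitness helps, the sketch below uses a hypothetical variant of the sample data with one missing status and one missing score, then compares a naive filter with an explicit one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: user "b" has no status, user "c" has no score
df = pd.DataFrame(
    {
        "user": ["a", "b", "c", "d"],
        "status": ["active", None, "inactive", "active"],
        "score": [91.0, 42.0, np.nan, 15.0],
    }
)

# Naive filter: a missing status compares as "not equal", so row "b" is kept
naive = df[df["status"] != "inactive"]
print(naive["user"].tolist())  # ['a', 'b', 'd']

# Explicit filter: require a present status that is also not "inactive"
strict = df[df["status"].notna() & (df["status"] != "inactive")]
print(strict["user"].tolist())  # ['a', 'd']

# A missing score is never "< 50", so negating that mask keeps row "c"
not_low = df[~(df["score"] < 50)]
print(not_low["user"].tolist())  # ['a', 'c']
```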

Common Pitfalls

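The most common mistake is forgetting parentheses when combining conditions. Because & and | bind more tightly than comparison operators, an expression such as df["score"] > 50 & df["score"] < 90 is grouped as df["score"] > (50 & df["score"]) < 90, which raises an error or silently tests something else. Wrap each comparison in its own parentheses: (df["score"] > 50) & (df["score"] < 90).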

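Another issue is using Python's and and or instead of & and |. Python's keywords try to reduce a whole Series to a single True or False, so pandas raises a ValueError saying the truth value of a Series is ambiguous. With pandas Series, you need the element-wise operators.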
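Both pitfalls can be reproduced in a few lines. The sketch below catches the errors so it runs to completion; the flag and variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["a", "b", "c", "d"],
        "status": ["active", "inactive", "active", "inactive"],
        "score": [91, 42, 77, 15],
    }
)

# Pitfall 1: `and` needs one truth value for a whole Series, which pandas refuses
and_error = None
try:
    df[(df["status"] == "active") and (df["score"] >= 80)]
except ValueError as exc:
    and_error = str(exc)
    print("`and` failed:", and_error)

# Pitfall 2: without parentheses this is parsed as
# df["score"] > (50 & df["score"]) < 90, a chained comparison that also fails
precedence_failed = False
try:
    df[df["score"] > 50 & df["score"] < 90]
except ValueError:
    precedence_failed = True
    print("unparenthesized condition failed")

# Correct form: parenthesize each comparison and combine with &
correct = df[(df["score"] > 50) & (df["score"] < 90)]
print(correct["user"].tolist())  # ['c']
```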

Developers also sometimes call drop() with row positions instead of row labels and get confusing results when the index is not a simple RangeIndex. If you mean positions, make that explicit before dropping.
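As an illustration of the label-versus-position confusion, the sketch below assumes a hypothetical reversed integer index, so label 0 belongs to the last row rather than the first. Translating positions into labels through df.index is one way to make the intent explicit.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["a", "b", "c", "d"],
        "score": [91, 42, 77, 15],
    },
    index=[3, 2, 1, 0],  # hypothetical index: labels run opposite to positions
)

# drop() interprets 0 as a label, so this removes the LAST row (user "d")
by_label = df.drop(index=[0])
print(by_label["user"].tolist())  # ['a', 'b', 'c']

# To drop the FIRST row by position, look up its label first
first_position = [0]
by_position = df.drop(index=df.index[first_position])
print(by_position["user"].tolist())  # ['b', 'c', 'd']
```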

Finally, remember that filtering returns a new DataFrame. If later code still references the old variable, it will not magically update unless you reassign it.

Summary

  • The usual way to remove rows in pandas is to keep only rows that do not match a boolean condition.
  • Use boolean masks or .loc for most filtering tasks.
  • Use drop() when you already have the exact index labels to remove.
  • Use query() when the condition reads more naturally as an expression string.
  • Be explicit with parentheses, missing values, and reassignment to avoid subtle bugs.
