Search for does-not-contain on a DataFrame in pandas

Introduction

In pandas, "does not contain" is usually expressed by negating a boolean mask built with Series.str.contains. The important detail is handling missing values. On an object-dtype column, a missing value produces a missing entry in the mask rather than True or False, so negating the mask or indexing with it can raise an error or give unexpected results. This is common in messy real-world datasets.

Basic Pattern With `str.contains`

Suppose you want rows where the city column does not contain "Los":

python

import pandas as pd

df = pd.DataFrame({
    "city": ["Los Angeles", "Boston", "San Diego", "Los Gatos"]
})

mask = ~df["city"].str.contains("Los", na=False)
result = df[mask]

print(result)

The key parts are:

str.contains("Los") creates a boolean Series.
~ negates it.
na=False treats missing values as non-matches, which makes the negation predictable.

Without na=False, missing values can propagate into the mask, and negating or indexing with such a mask can fail or select the wrong rows.

Why `na=False` Matters

Consider this data:

python

import pandas as pd

df = pd.DataFrame({
    "city": ["Los Angeles", None, "Chicago"]
})

If you write:

python

mask = ~df["city"].str.contains("Los")

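the mask may contain a missing entry for the None row rather than a clean True or False. Depending on the pandas version and the column dtype, negating that mask with ~ or indexing with it can raise an error. A safer version is: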

python

mask = ~df["city"].str.contains("Los", na=False)
result = df[mask]

Now missing values are treated as "does not contain" for filtering purposes.
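To make the effect of the na argument concrete, here is a minimal sketch using the same three-row frame. It compares na=False, which keeps rows with a missing city in the "does not contain" result, with na=True, which drops them. The names keep_missing and drop_missing are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Los Angeles", None, "Chicago"]
})

# na=False: a missing city counts as "no match", so after negation the row is kept.
keep_missing = df[~df["city"].str.contains("Los", na=False)]

# na=True: a missing city counts as "match", so after negation the row is dropped.
drop_missing = df[~df["city"].str.contains("Los", na=True)]

print(keep_missing.index.tolist())  # [1, 2]
print(drop_missing.index.tolist())  # [2]
```

Which value to pass depends on whether a missing value should survive the filter; the choice is better made explicitly than left to the default.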

Literal Substrings Versus Regular Expressions

By default, str.contains interprets the pattern as a regular expression. If you want a literal string match, set regex=False:

python

mask = ~df["city"].str.contains("C++", regex=False, na=False)

This matters when the search term includes characters such as ., +, *, or ?, which have special meaning in regex syntax.

If you do want regex behavior, keep the default or set regex=True explicitly:

python

mask = ~df["city"].str.contains(r"^Los\s", na=False)

That excludes values that start with "Los" followed by a whitespace character.
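The difference between literal and regex matching is easiest to see with a search term that contains a metacharacter. The sketch below uses a made-up Series of version strings (an assumption for illustration) and searches for "1.0" three ways: as a regex, as a literal with regex=False, and as a regex escaped with re.escape from the standard library.

```python
import re

import pandas as pd

# Hypothetical sample data, chosen so that "." makes a visible difference.
versions = pd.Series(["v1.0", "v1x0", "v2.0", None])

# As a regex, "." matches any character, so "1.0" also matches "v1x0".
regex_kept = versions[~versions.str.contains("1.0", na=False)]

# As a literal, only the exact substring "1.0" matches.
literal_kept = versions[~versions.str.contains("1.0", regex=False, na=False)]

# re.escape gives the same literal behavior while staying in regex mode.
escaped_kept = versions[~versions.str.contains(re.escape("1.0"), na=False)]

print(regex_kept.index.tolist())    # [2, 3]
print(literal_kept.index.tolist())  # [1, 2, 3]
print(escaped_kept.index.tolist())  # [1, 2, 3]
```

The regex version silently drops "v1x0" as well, which is the kind of wrong exclusion that regex=False avoids.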

Case-Insensitive "Does Not Contain"

Use case=False when letter case should not matter:

python

mask = ~df["city"].str.contains("los", case=False, na=False)
result = df[mask]

This filters out "Los Angeles", "los gatos", and "LOS ALTOS" equally.
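As a quick check of that behavior, the following sketch runs the same filter with and without case=False on a small frame built from the three spellings mentioned above plus one non-matching city. The frame itself is an assumption added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Los Angeles", "los gatos", "LOS ALTOS", "Boston"]
})

# Case-sensitive: only the lowercase spelling contains "los".
sensitive = df[~df["city"].str.contains("los", na=False)]

# Case-insensitive: all three spellings match and are filtered out.
insensitive = df[~df["city"].str.contains("los", case=False, na=False)]

print(sensitive["city"].tolist())    # ['Los Angeles', 'LOS ALTOS', 'Boston']
print(insensitive["city"].tolist())  # ['Boston']
```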

Filtering Multiple Columns

If you need rows where one column does not contain a term and another column also does not contain a different term, combine masks:

python

import pandas as pd

df = pd.DataFrame({
    "title": ["Admin Guide", "User Guide", "Draft Notes"],
    "category": ["internal", "public", "internal"]
})

# Keep rows where neither column contains its excluded term.
mask = (
    ~df["title"].str.contains("Draft", na=False) &
    ~df["category"].str.contains("internal", na=False)
)

result = df[mask]
print(result)

This is often cleaner than trying to write one giant condition inline.
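Naming the positive masks first can make the logic easier to read, because "not A and not B" is equivalent to "not (A or B)". The sketch below, which reuses the frame from this section, shows that equivalence and contrasts it with the weaker condition "not (A and B)". The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Admin Guide", "User Guide", "Draft Notes"],
    "category": ["internal", "public", "internal"]
})

has_draft = df["title"].str.contains("Draft", na=False)
is_internal = df["category"].str.contains("internal", na=False)

# Two equivalent ways to say "contains neither term".
neither = ~has_draft & ~is_internal
neither_alt = ~(has_draft | is_internal)

# A different condition: "does not contain both terms at once".
not_both = ~(has_draft & is_internal)

print(df[neither].index.tolist())   # [1]
print(df[not_both].index.tolist())  # [0, 1]
```

Mixing up these two conditions is an easy mistake when translating a requirement into masks, so it helps to test the filter on a few rows where the answer is known.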

Searching The Entire Row

If the requirement is "the row does not contain this substring anywhere," apply the logic column by column and then reduce:

python

mask = ~df.astype(str).apply(
    lambda col: col.str.contains("secret", case=False, na=False)
).any(axis=1)

result = df[mask]

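That approach is more expensive because it converts every column to strings, but it is useful for broad searches across multiple text fields. Note that astype(str) can turn missing values into literal strings such as "nan" or "None", depending on the pandas version, so a search for those substrings may match missing cells.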
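A self-contained version of the row-wide search might look like the sketch below. The helper name rows_without and the sample frame are hypothetical, and passing regex=False is an added assumption that the search term should be treated literally.

```python
import pandas as pd

# Hypothetical sample data with a text column, a column with a missing
# value, and a numeric column.
df = pd.DataFrame({
    "title": ["Budget", "Secret plan", "Notes"],
    "body": ["public figures", "internal", None],
    "pages": [3, 10, 1],
})


def rows_without(frame, term):
    """Return the rows of frame in which no cell contains term (case-insensitive)."""
    text = frame.astype(str)  # numbers become strings so .str works on every column
    hits = text.apply(
        lambda col: col.str.contains(term, case=False, regex=False, na=False)
    )
    # hits is a boolean frame; a row is excluded if any of its cells matched.
    return frame[~hits.any(axis=1)]


result = rows_without(df, "secret")
print(result.index.tolist())  # [0, 2]
```

The filter is applied to the string copy but the rows are taken from the original frame, so the returned data keeps its original dtypes.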

Common Pitfalls

The biggest mistake is forgetting that str.contains uses regex by default. If your search string has regex metacharacters, your filter may exclude the wrong rows.

Another common issue is not setting na=False. Missing values can make the boolean mask ambiguous or produce surprising results after negation.

Developers also sometimes forget parentheses when combining masks with & and |. These operators bind more tightly than comparisons such as == or >, so each condition should be wrapped in parentheses.

Finally, do not use Python's plain not with a pandas Series. It raises a ValueError because the truth value of a Series is ambiguous. Use ~ for elementwise negation of boolean masks.
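The last two pitfalls can be illustrated with a short sketch. The frame and its visits column are made-up values used only for demonstration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "city": ["Los Angeles", "Boston", "San Diego"],
    "visits": [10, 2, 7],
})

contains_los = df["city"].str.contains("Los", na=False)

# Plain `not` asks for a single truth value for the whole Series and fails.
try:
    not contains_los
    raised = False
except ValueError:
    raised = True

# ~ negates element by element; the parentheses keep the comparison grouped
# so that & combines two boolean masks.
mask = ~contains_los & (df["visits"] > 5)
result = df[mask]

print(raised)                  # True
print(result.index.tolist())   # [2]
```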

Summary

Use ~df["col"].str.contains("text", na=False) for a standard "does not contain" filter.
Add regex=False when the search term should be treated literally.
Add case=False for case-insensitive matching.
Combine masks with & and | for multi-column logic.
Handle missing values explicitly so the negated filter stays predictable.
