
Search for does-not-contain on a DataFrame in pandas

Introduction

In pandas, "does not contain" is usually expressed by negating a boolean mask built with Series.str.contains. The important detail is handling missing values correctly. Without that, negation can behave unexpectedly or raise errors when the column contains NaN, especially in messy real-world datasets.

Basic Pattern With str.contains

Suppose you want rows where the city column does not contain "Los":

python
import pandas as pd

df = pd.DataFrame({
    "city": ["Los Angeles", "Boston", "San Diego", "Los Gatos"]
})

mask = ~df["city"].str.contains("Los", na=False)
result = df[mask]

print(result)

The key parts are:

  • str.contains("Los") creates a boolean Series.
  • ~ negates it.
  • na=False treats missing values as non-matches, which makes the negation predictable.

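Without na=False, missing values can propagate into the mask as NaN or <NA> rather than True or False (the exact result depends on the column's dtype and the pandas version), and negating such a mask is no longer predictable.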

Why na=False Matters

Consider this data:

python
import pandas as pd

df = pd.DataFrame({
    "city": ["Los Angeles", None, "Chicago"]
})

If you write:

python
mask = ~df["city"].str.contains("Los")

the intermediate mask may hold a missing value for the None row rather than a clean True or False. On an object-dtype column, applying ~ to such a mask can raise a TypeError. A safer version is:

python
mask = ~df["city"].str.contains("Los", na=False)
result = df[mask]

Now missing values are treated as "does not contain" for filtering purposes.
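As a minimal sketch of the difference, the snippet below rebuilds the small DataFrame from this section and inspects the mask before and after negation. The na=True variant is an addition here, shown only to illustrate the opposite choice: it treats missing values as matches, so the negated filter drops them.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Los Angeles", None, "Chicago"]})

# With na=False the intermediate mask is plain True/False for every row,
# so negating it with ~ is well defined.
contains_los = df["city"].str.contains("Los", na=False)
mask = ~contains_los
print(contains_los.tolist())  # [True, False, False]
print(mask.tolist())          # [False, True, True]

# Rows kept: the missing value and "Chicago".
result = df[mask]
print(result)

# Opposite choice: na=True counts missing values as matches,
# so the negated mask excludes them as well.
mask_drop_missing = ~df["city"].str.contains("Los", na=True)
print(mask_drop_missing.tolist())  # [False, False, True]
```

Which variant is right depends on whether a missing city should count as "does not contain" or should be left out of the result altogether.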

Literal Substrings Versus Regular Expressions

By default, str.contains interprets the pattern as a regular expression. If you want a literal string match, set regex=False:

python
mask = ~df["city"].str.contains("C++", regex=False, na=False)

This matters when the search term includes characters such as ., +, *, or ?, which have special meaning in regex syntax.
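To make the difference concrete, here is a small sketch using a hypothetical versions Series (not part of the example data above). The dot in "3.5" matches any character when the pattern is treated as a regex, so the regex filter excludes more rows than intended. Escaping the term with re.escape is shown as one possible alternative to regex=False.

```python
import re

import pandas as pd

# Hypothetical data chosen so that "." behaves differently as regex vs literal.
versions = pd.Series(["3.5", "315", "3x5", None])

# Regex (the default): "." matches any character, so "315" and "3x5" also match.
regex_mask = ~versions.str.contains("3.5", na=False)
print(regex_mask.tolist())    # [False, False, False, True]

# Literal: only the exact substring "3.5" counts as a match.
literal_mask = ~versions.str.contains("3.5", regex=False, na=False)
print(literal_mask.tolist())  # [False, True, True, True]

# Escaping the term gives the same result while staying in regex mode.
escaped_mask = ~versions.str.contains(re.escape("3.5"), na=False)
print(escaped_mask.tolist())  # [False, True, True, True]
```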

If you do want regex behavior, keep the default or set regex=True explicitly:

python
mask = ~df["city"].str.contains(r"^Los\s", na=False)

That excludes values that start with "Los" followed by a whitespace character.
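A short sketch of the anchored pattern on made-up city names (the sample values are assumptions for illustration). It also shows Series.str.startswith as a literal alternative, which gives the same result here because the only whitespace involved is a single space.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Los Angeles", "Losantiville", "East Los Angeles", None]
})

# ^ anchors the match at the start; \s requires whitespace right after "Los".
mask = ~df["city"].str.contains(r"^Los\s", na=False)
print(mask.tolist())  # [False, True, True, True]

# Literal prefix check, no regex involved.
prefix_mask = ~df["city"].str.startswith("Los ", na=False)
print(prefix_mask.tolist())  # [False, True, True, True]
```

Only "Los Angeles" is excluded: "Losantiville" has no whitespace after "Los", and "East Los Angeles" does not start with it.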

Case-Insensitive "Does Not Contain"

Use case=False when letter case should not matter:

python
mask = ~df["city"].str.contains("los", case=False, na=False)
result = df[mask]

This filters out "Los Angeles", "los gatos", and "LOS ALTOS" equally.
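The following sketch compares the case-sensitive and case-insensitive filters side by side. The sample cities are assumed for illustration and extend the earlier example with differently cased values.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Los Angeles", "los gatos", "LOS ALTOS", "Boston"]
})

# Case-sensitive: only the exact lowercase "los" counts as a match.
sensitive_mask = ~df["city"].str.contains("los", na=False)
print(sensitive_mask.tolist())    # [True, False, True, True]

# Case-insensitive: every spelling of "los" counts as a match.
insensitive_mask = ~df["city"].str.contains("los", case=False, na=False)
print(insensitive_mask.tolist())  # [False, False, False, True]

result = df[insensitive_mask]
print(result["city"].tolist())    # ['Boston']
```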

Filtering Multiple Columns

If you need rows where one column does not contain one term and another column does not contain a second term, build a mask per column and combine them:

python
import pandas as pd

df = pd.DataFrame({
    "title": ["Admin Guide", "User Guide", "Draft Notes"],
    "category": ["internal", "public", "internal"]
})

# Keep rows whose title lacks "Draft" AND whose category lacks "internal".
mask = (
    ~df["title"].str.contains("Draft", na=False) &
    ~df["category"].str.contains("internal", na=False)
)

result = df[mask]
print(result)

This is often cleaner than trying to write one giant condition inline.
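One way to keep the logic readable is to name the positive masks first and negate afterwards. The sketch below uses the same data; the variable names are illustrative. It also shows that "neither column matches" is not the same as "not both columns match", which is an easy mix-up when negating combined conditions.

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Admin Guide", "User Guide", "Draft Notes"],
    "category": ["internal", "public", "internal"],
})

# Positive masks, one per column.
has_draft = df["title"].str.contains("Draft", na=False)
is_internal = df["category"].str.contains("internal", na=False)

# "Neither matches": the two spellings below are equivalent (De Morgan).
neither = ~has_draft & ~is_internal
neither_alt = ~(has_draft | is_internal)
print(neither.tolist())      # [False, True, False]
print(neither_alt.tolist())  # [False, True, False]

# "Not both match" is a weaker condition and keeps more rows.
not_both = ~(has_draft & is_internal)
print(not_both.tolist())     # [True, True, False]
```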

Searching The Entire Row

If the requirement is "the row does not contain this substring anywhere," apply the logic column by column and then reduce:

python
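# Cast every column to text, test each column, then flag rows with any match.
mask = ~df.astype(str).apply(
    lambda col: col.str.contains("secret", case=False, na=False)
).any(axis=1)

# Keep only the rows where no column matched.
result = df[mask]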

That approach is more expensive, because every column is converted to text and scanned, but it is useful for broad searches across multiple text fields.
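A self-contained sketch of the whole-row search, using an assumed DataFrame with two text columns and one numeric column. The intermediate hits frame is kept in a variable so the per-column results can be inspected before they are reduced to one value per row.

```python
import pandas as pd

# Illustrative data: the column names and values are assumptions.
df = pd.DataFrame({
    "title": ["Admin Guide", "Secret Plans", "Draft Notes"],
    "notes": ["public", "n/a", "contains a secret"],
    "pages": [10, 20, 30],
})

# One boolean column per original column: True where "secret" appears.
hits = df.astype(str).apply(
    lambda col: col.str.contains("secret", case=False, na=False)
)

# A row is excluded if any of its columns matched.
mask = ~hits.any(axis=1)
print(mask.tolist())  # [True, False, False]

result = df[mask]
print(result["title"].tolist())  # ['Admin Guide']
```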

Common Pitfalls

The biggest mistake is forgetting that str.contains uses regex by default. If your search string has regex metacharacters, your filter may exclude the wrong rows, or fail outright when the string is not a valid pattern.

Another common issue is not setting na=False. Missing values can then end up in the mask as NaN or <NA> instead of True or False, and negating that mask may raise an error or give results that depend on how missing entries are handled.

Developers also sometimes forget parentheses when combining masks with & and |. These operators bind more tightly than comparisons such as == and <, so any comparison used as a condition should be wrapped in parentheses.

Finally, do not use Python's plain not with a pandas Series. It asks for a single truth value for the whole Series and raises a ValueError. Use ~ for elementwise negation of boolean masks.
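The last two pitfalls can be sketched as follows. The population column and its values are assumptions added for illustration, so that there is a comparison to parenthesize.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Los Angeles", "Boston", "Los Gatos"],
    "population": [3_900_000, 650_000, 33_000],  # illustrative numbers
})

contains_los = df["city"].str.contains("Los", na=False)

# Plain `not` needs one truth value for the whole Series, which is ambiguous.
not_raised = False
try:
    bad_mask = not contains_los
except ValueError as exc:
    not_raised = True
    print("not failed:", exc)

# ~ negates element by element.
mask = ~contains_los
print(mask.tolist())  # [False, True, False]

# Parenthesize the comparison so it is evaluated before | combines the masks.
combined = ~contains_los | (df["population"] < 100_000)
print(combined.tolist())  # [False, True, True]
```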

Summary

  • Use ~df["col"].str.contains("text", na=False) for a standard "does not contain" filter.
  • Add regex=False when the search term should be treated literally.
  • Add case=False for case-insensitive matching.
  • Combine masks with & and | for multi-column logic.
  • Handle missing values explicitly so the negated filter stays predictable.
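As a rough sketch, the points above could be wrapped in a small helper. The function name exclude_containing and its literal and ignore_case parameters are hypothetical, not part of pandas; the helper only forwards them to Series.str.contains.

```python
import pandas as pd


def exclude_containing(df, column, text, *, literal=True, ignore_case=False):
    """Return the rows of df whose column does not contain text.

    Hypothetical helper: missing values are treated as non-matches,
    so they are kept in the result.
    """
    contains = df[column].str.contains(
        text,
        case=not ignore_case,  # case=False means case-insensitive
        regex=not literal,     # regex=False means literal substring
        na=False,
    )
    return df[~contains]


df = pd.DataFrame({
    "city": ["Los Angeles", "Boston", "San Diego", "Los Gatos", None]
})

result = exclude_containing(df, "city", "los", ignore_case=True)
print(result.index.tolist())  # [1, 2, 4]
```

Defaulting to a literal match is a design choice in this sketch, made so that metacharacters in the search term cannot change the filter by accident.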
