Search for does-not-contain on a DataFrame in pandas
Introduction
In pandas, "does not contain" is usually expressed by negating a boolean mask built with Series.str.contains. The important detail is handling missing values correctly. Without that, negation can behave unexpectedly or raise errors when the column contains NaN, especially in messy real-world datasets.
Basic Pattern With str.contains
Suppose you want rows where the city column does not contain "Los":
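A minimal sketch of that filter, assuming a small made-up DataFrame (the data and the names df and result are illustrative only):

```python
import pandas as pd

# Hypothetical sample data, including one missing value
df = pd.DataFrame(
    {"city": ["Los Angeles", "New York", None, "Los Gatos", "Boston"]}
)

# Keep rows whose city does NOT contain "Los"
result = df[~df["city"].str.contains("Los", na=False)]
print(result)
```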
The key parts are:
- str.contains("Los") creates a boolean Series.
- ~ negates it.
- na=False treats missing values as non-matches, which makes the negation predictable.
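Without na=False, missing values can propagate into the mask, and negating or indexing with that mask may raise an error or give surprising results, depending on the column's dtype and the pandas version.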
Why na=False Matters
Consider a city column in which some values are missing. If you call str.contains("Los") on it without na=False, the result may contain missing boolean values rather than a clean True or False for every row. A safer version passes na=False.
Now missing values are treated as "does not contain" for filtering purposes.
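The difference can be sketched as follows. The data is made up, and dtype=object is forced here as an assumption so the column resembles a typical messy text column; how the first mask represents the missing row depends on the dtype and the pandas version.

```python
import pandas as pd

# Hypothetical data with one missing value; object dtype is an assumption
city = pd.Series(["Los Angeles", None, "Boston", "Los Gatos"], dtype=object)
df = pd.DataFrame({"city": city})

# Without na=False, the missing row may be left as a missing value in the mask
raw_mask = df["city"].str.contains("Los")
print(raw_mask.isna().sum())  # number of rows with no True/False answer

# With na=False, every row gets a real boolean, so ~ is safe to apply
safe_mask = df["city"].str.contains("Los", na=False)
result = df[~safe_mask]
print(result)
```

The row with the missing city is kept in result, because na=False counts it as a non-match.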
Literal Substrings Versus Regular Expressions
By default, str.contains interprets the pattern as a regular expression. If you want a literal string match, set regex=False:
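As a rough illustration, the sketch below searches for the term "St." in a made-up column. With the default regex interpretation the dot matches any character, so "Stanford" is also treated as a match; with regex=False only the literal "St." matches.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"city": ["St. Louis", "Stanford", "Boston"]})

# Default (regex): "." matches any character, so "Stanford" is excluded too
regex_result = df[~df["city"].str.contains("St.", na=False)]

# Literal: only values containing the exact text "St." are excluded
literal_result = df[~df["city"].str.contains("St.", regex=False, na=False)]

print(regex_result["city"].tolist())
print(literal_result["city"].tolist())
```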
This matters when the search term includes characters such as ., +, *, or ?, which have special meaning in regex syntax.
If you do want regex behavior, keep the default or set regex=True explicitly:
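One possible regex filter, sketched on made-up data, anchors the pattern to the start of the string with ^ and requires a space after "Los":

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame(
    {"city": ["Los Angeles", "Los Gatos", "Carlos Paz", "Lost Hills", None]}
)

# ^ anchors the match to the start; the trailing space must follow "Los"
result = df[~df["city"].str.contains(r"^Los ", regex=True, na=False)]
print(result)
```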
That excludes values starting with "Los ".
Case-Insensitive "Does Not Contain"
Use case=False when letter case should not matter:
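A short sketch with made-up values in mixed case:

```python
import pandas as pd

# Hypothetical sample data with varying letter case
df = pd.DataFrame({"city": ["Los Angeles", "los gatos", "LOS ALTOS", "Boston"]})

# case=False makes the match ignore letter case
result = df[~df["city"].str.contains("los", case=False, na=False)]
print(result)
```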
This filters out "Los Angeles", "los gatos", and "LOS ALTOS" equally.
Filtering Multiple Columns
If you need rows where one column does not contain a term and another column also does not contain a different term, combine masks:
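A sketch of that pattern, assuming hypothetical city and state columns and naming each mask before combining them:

```python
import pandas as pd

# Hypothetical sample data; the state column has one missing value
df = pd.DataFrame(
    {
        "city": ["Los Angeles", "New York", "Boston", "Austin"],
        "state": ["CA", "NY", "MA", None],
    }
)

# One named mask per condition
city_ok = ~df["city"].str.contains("Los", na=False)
state_ok = ~df["state"].str.contains("NY", na=False)

# & keeps rows that satisfy both conditions
result = df[city_ok & state_ok]
print(result)
```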
This is often cleaner than trying to write one giant condition inline.
Searching The Entire Row
If the requirement is "the row does not contain this substring anywhere," apply the logic column by column and then reduce:
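One way to sketch this is to apply str.contains to each text column and collapse the per-column results with any(axis=1). The column names and the text_cols list below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with two text columns
df = pd.DataFrame(
    {
        "city": ["Los Angeles", "Boston", "Denver"],
        "notes": ["coastal", "moved from Los Gatos", None],
    }
)

# Columns to search; listed explicitly rather than inferred from dtypes
text_cols = ["city", "notes"]

# True where any of the chosen columns contains the literal substring
contains_any = df[text_cols].apply(
    lambda col: col.str.contains("Los", regex=False, na=False)
).any(axis=1)

# Keep rows where no column contains it
result = df[~contains_any]
print(result)
```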
That approach is more expensive, but it is useful for broad searches across multiple text fields.
Common Pitfalls
The biggest mistake is forgetting that str.contains uses regex by default. If your search string has regex metacharacters, your filter may exclude the wrong rows.
Another common issue is not setting na=False. Missing values can make the boolean mask ambiguous or produce surprising results after negation.
Developers also sometimes forget parentheses when combining masks with & and |. Because these operators bind more tightly than comparisons such as == or >, wrap each condition in its own parentheses.
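A small sketch of a parenthesized inline condition, using a made-up visits column and threshold:

```python
import pandas as pd

# Hypothetical sample data; the visits values are invented for illustration
df = pd.DataFrame(
    {"city": ["Los Angeles", "Boston", "Austin"], "visits": [120, 40, 95]}
)

# Each condition sits in its own parentheses before being joined with &
result = df[
    (~df["city"].str.contains("Los", na=False)) & (df["visits"] > 50)
]
print(result)
```

Without the parentheses around df["visits"] > 50, Python would evaluate the & before the comparison, which is not the intended filter.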
Finally, do not use Python's plain not with a pandas Series. Use ~ for elementwise negation of boolean masks.
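The sketch below, on made-up data, shows the difference: not asks the whole Series for a single truth value and fails, while ~ flips each element.

```python
import pandas as pd

# Hypothetical sample data
mask = pd.Series(["Los Angeles", "Boston"]).str.contains("Los", na=False)

# `not` needs one truth value for the whole Series, which pandas refuses to give
try:
    flipped = not mask
    raised = False
except ValueError:
    raised = True

# ~ negates element by element
negated = ~mask
print(raised, negated.tolist())
```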
Summary
- Use ~df["col"].str.contains("text", na=False) for a standard "does not contain" filter.
- Add regex=False when the search term should be treated literally.
- Add case=False for case-insensitive matching.
- Combine masks with & and | for multi-column logic.
- Handle missing values explicitly so the negated filter stays predictable.

