Ignoring NaNs with str.contains
Introduction
When you filter string data in pandas with str.contains(), NaN values in the Series propagate into the result: the mask holds NaN instead of True/False for those rows, which breaks boolean indexing. The fix is to pass na=False (or na=True) to tell pandas how to treat missing values. Without this parameter, the NaN entries in the mask either raise an error or, depending on the dtype, silently drop rows during filtering.
The Problem
When a Series contains missing values, the mask returned by str.contains() contains NaN as well. Using that mask directly as a boolean filter raises a ValueError:
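A minimal sketch of the failure, assuming an object-dtype Series (the sample values and the names s, mask, and error_message are illustrative, not from the original article):

```python
import numpy as np
import pandas as pd

# dtype=object is set explicitly so the NaN behaviour is the same across pandas versions.
s = pd.Series(["apple pie", np.nan, "banana", "pineapple"], dtype=object)

# Without na=..., the missing value propagates into the mask.
mask = s.str.contains("apple")
print(mask.tolist())  # [True, nan, False, True]

# Indexing with a mask that contains NaN fails.
try:
    s[mask]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The mask has object dtype rather than bool, which is why pandas refuses to use it as an indexer.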
The Fix: na=False
Pass na=False to treat NaN values as "does not contain the pattern":
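For instance, with the same illustrative Series as before (sample data assumed for this sketch):

```python
import numpy as np
import pandas as pd

s = pd.Series(["apple pie", np.nan, "banana", "pineapple"], dtype=object)

# na=False fills the missing position with False, so the mask is purely boolean.
mask = s.str.contains("apple", na=False)
print(mask.tolist())  # [True, False, False, True]

# The mask can now be used for boolean indexing; the NaN row is excluded.
filtered = s[mask]
print(filtered.tolist())  # ['apple pie', 'pineapple']
```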
Setting na=True treats NaN as matching:
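A short sketch of the opposite choice, again on assumed sample data. This can be useful when rows with missing text should be kept for later review rather than discarded:

```python
import numpy as np
import pandas as pd

s = pd.Series(["apple pie", np.nan, "banana", "pineapple"], dtype=object)

# na=True fills the missing position with True, so the NaN row passes the filter.
mask = s.str.contains("apple", na=True)
print(mask.tolist())  # [True, True, False, True]

filtered = s[mask]
print(len(filtered))  # 3 (two matches plus the NaN row)
```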
How na= Works with Other str Methods
The na parameter is available on the pandas string accessor methods that return a boolean result, such as str.startswith, str.endswith, str.match, and str.fullmatch:
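The sketch below applies na=False to several of these methods. The file names are made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series(["report.csv", np.nan, "notes.txt", "report_final.csv"], dtype=object)

starts = s.str.startswith("report", na=False)      # literal prefix test
ends = s.str.endswith(".csv", na=False)            # literal suffix test
matches = s.str.match(r"report", na=False)         # regex anchored at the start of the string
full = s.str.fullmatch(r"report\.csv", na=False)   # regex must match the entire string

print(starts.tolist())   # [True, False, False, True]
print(ends.tolist())     # [True, False, False, True]
print(matches.tolist())  # [True, False, False, True]
print(full.tolist())     # [True, False, False, False]
```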
Using Regex with str.contains
str.contains treats the pattern as a regular expression by default (regex=True):
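A small example of a regex pattern combined with na=False (the pattern and data are assumptions chosen for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(["cat", "dog", np.nan, "catalog", "bird"], dtype=object)

# Match strings that start with "cat" or end with "dog".
mask = s.str.contains(r"^cat|dog$", na=False)
print(mask.tolist())  # [True, True, False, True, False]
```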
To search for literal special characters, disable regex with regex=False:
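The difference shows up as soon as the search text contains a metacharacter. In this sketch (sample prices assumed), the dot in "5.00" matches any character when the pattern is interpreted as a regex:

```python
import re

import numpy as np
import pandas as pd

prices = pd.Series(["$5.00", "5000", np.nan, "EUR 5.00"], dtype=object)

# As a regex, "." matches any character, so "5000" matches too.
as_regex = prices.str.contains("5.00", na=False)
print(as_regex.tolist())  # [True, True, False, True]

# regex=False searches for the literal substring "5.00".
literal = prices.str.contains("5.00", regex=False, na=False)
print(literal.tolist())  # [True, False, False, True]

# Escaping the pattern gives the same result while keeping regex mode on.
escaped = prices.str.contains(re.escape("5.00"), na=False)
print(escaped.tolist())  # [True, False, False, True]
```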
Filtering a DataFrame
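The same mask can be used to select rows of a DataFrame. A minimal sketch, assuming a hypothetical frame with a name column that has one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Alice Smith", np.nan, "Bob Jones", "Carol Smith"],
        "age": [30, 25, 41, 37],
    }
)

# Build the mask from one column, then use it to select whole rows.
mask = df["name"].str.contains("Smith", na=False)
result = df[mask]

print(result["name"].tolist())  # ['Alice Smith', 'Carol Smith']
print(result.index.tolist())    # [0, 3]
```

The row with the missing name is excluded, and the surviving rows keep their original index labels.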
Alternative: Drop NaN First
Instead of using na=False, you can drop NaN values before filtering:
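A sketch of that workaround on a hypothetical frame, alongside the fillna("") variant, both compared against the na=False result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Alice Smith", np.nan, "Bob Jones", "Carol Smith"],
        "age": [30, 25, 41, 37],
    }
)

# Option 1: drop rows with a missing name first, then filter what is left.
non_null = df.dropna(subset=["name"])
via_dropna = non_null[non_null["name"].str.contains("Smith")]

# Option 2: replace missing names with an empty string before matching.
via_fillna = df[df["name"].fillna("").str.contains("Smith")]

# Reference: the na=False version.
via_na_false = df[df["name"].str.contains("Smith", na=False)]

print(via_dropna.equals(via_na_false))  # True
print(via_fillna.equals(via_na_false))  # True
```

Note that the mask in option 1 must be applied to non_null rather than df, because it no longer covers every row of the original frame.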
Using na=False is the most concise and idiomatic approach.
Performance Comparison
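na=False is typically the faster option because it skips the extra dropna() or fillna("") pass, and it produces a mask aligned with the original index, so it can be applied directly to the unfiltered data.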
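One way to check this on your own data is a quick timeit comparison. The sketch below is an assumption about how such a benchmark might look, not the original article's measurement. The function names and data size are illustrative, and timings will vary with pandas version, dtype, and hardware, so no numbers are quoted here:

```python
import timeit

import numpy as np
import pandas as pd

# 50,000 rows, a quarter of them missing.
s = pd.Series(["apple pie", np.nan, "banana", "pineapple"] * 12_500, dtype=object)


def with_na_false():
    return s[s.str.contains("apple", na=False)]


def with_dropna():
    non_null = s.dropna()
    return non_null[non_null.str.contains("apple")]


def with_fillna():
    return s[s.fillna("").str.contains("apple")]


for fn in (with_na_false, with_dropna, with_fillna):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

All three functions return the same rows, so any difference is purely in how much work is done to get there.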
Common Pitfalls
- Forgetting na=False: Without it, NaN values produce NaN in the boolean mask, causing a ValueError or silent data loss when the mask is used for filtering. Pass na=False unless you specifically want NaN rows included.
- Not escaping regex metacharacters: Characters like ., $, (, *, and + have special meaning in regex. Use regex=False for literal string matching, or escape the pattern with re.escape().
- Assuming na=False drops NaN rows from the mask: na=False does not remove anything from the mask. It replaces NaN with False, so the mask keeps the original length and index, and NaN rows are simply treated as non-matching when the mask is applied.
- Case sensitivity: str.contains is case-sensitive by default. Use case=False for case-insensitive matching rather than converting the entire column with str.lower() first.
- Using str.contains on non-string columns: If an object column has mixed types (strings and integers), str.contains does not convert the non-string values. They produce NaN in the mask, just like missing values, so with na=False they are silently treated as non-matching. Convert the column with astype(str) first if those values should be searched as text.
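The last two pitfalls can be illustrated with a short sketch on an assumed mixed-type column (the data and variable names are illustrative):

```python
import numpy as np
import pandas as pd

mixed = pd.Series(["item 10", 10, np.nan, "ITEM 20"], dtype=object)

# The integer 10 is not searched as text: it yields NaN, like the missing value.
raw = mixed.str.contains("10")
print(raw.isna().tolist())  # [False, True, True, False]

# With na=False, both the integer and the missing value count as non-matching.
safe = mixed.str.contains("10", na=False)
print(safe.tolist())  # [True, False, False, False]

# Converting to strings first lets the integer 10 match.
as_text = mixed.astype(str).str.contains("10", na=False)
print(as_text.tolist())  # [True, True, False, False]

# case=False matches "item" regardless of letter case.
any_case = mixed.str.contains("item", case=False, na=False)
print(any_case.tolist())  # [True, False, False, True]
```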
Summary
- Pass na=False to str.contains() to treat NaN values as non-matching (False)
- Pass na=True to treat NaN values as matching (True)
- The na parameter also works on str.startswith, str.endswith, str.match, and str.fullmatch
- Use regex=False when searching for literal strings that contain regex metacharacters
- na=False is more concise and idiomatic than fillna("") or dropna() workarounds, and typically faster

