Check if multiple strings exist in another string
Introduction
Checking whether multiple substrings exist within a larger string is a routine task in software development. It comes up when filtering logs, validating user input, scanning documents for keywords, and building search features. Different languages offer different idiomatic solutions, and choosing the right approach can make the difference between clean, readable code and a tangled mess of nested loops.
This article walks through practical techniques in Python, JavaScript, and Java, then covers performance considerations and common pitfalls.
Using Python's any() and all()
Python's built-in any() and all() functions pair naturally with generator expressions to check for substring presence.
Check if any of the substrings exist
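A minimal sketch of the "any" check. The sample text and keyword list are made up for illustration.

```python
text = "The quick brown fox jumps over the lazy dog"
keywords = ["fox", "cat", "bird"]

# any() consumes the generator lazily and stops at the first keyword found
found_any = any(keyword in text for keyword in keywords)

print(found_any)  # True, because "fox" occurs in the text
```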
Check if all substrings exist
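The "all" variant can be sketched the same way. The sample data is again illustrative, and the `missing` list is an optional extra that is handy when you want to report which keywords were absent.

```python
text = "The quick brown fox jumps over the lazy dog"
required = ["quick", "fox", "dog"]

# all() stops at the first keyword that is NOT found in the text
found_all = all(keyword in text for keyword in required)

# Collect the keywords that are absent (empty here, since all are present)
missing = [keyword for keyword in required if keyword not in text]

print(found_all)  # True
print(missing)    # []
```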
The in operator in Python performs a substring search internally using an optimized algorithm, so for most everyday workloads this approach is both readable and fast.
Using JavaScript's Array.some() and Array.every()
JavaScript provides similar expressiveness through array methods.
Both some() and every() short-circuit. some() stops as soon as one match is found, and every() stops as soon as one mismatch is found. This short-circuit behavior keeps performance reasonable even with long keyword lists.
Using Java Streams
Java 8 and later versions offer streams that mirror the same pattern.
Using Regular Expressions
When you need pattern matching rather than exact substring matching, regex is the right tool. You can combine multiple patterns with the alternation operator |.
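As a rough sketch in Python, the patterns can be joined into one compiled expression. The sample text and patterns are assumptions for illustration. Each pattern is wrapped in a non-capturing group so the alternation cannot change its meaning. If the inputs were literal strings rather than patterns, passing each through re.escape() first would be the safer choice.

```python
import re

text = "Error 404: page not found at 12:30"
patterns = [r"\d{3}", r"warning", r"timeout"]

# Join the patterns with | and wrap each one in (?:...) to keep it isolated
combined = re.compile("|".join(f"(?:{p})" for p in patterns))

match = combined.search(text)
found_any = match is not None

print(found_any)      # True
print(match.group())  # 404
```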
For checking whether all patterns match, you need to test each one individually, because alternation only tells you that at least one matched.
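One way to sketch the per-pattern check in Python, using made-up log text and patterns:

```python
import re

text = "2024-01-15 ERROR disk usage at 91%"
patterns = [r"\d{4}-\d{2}-\d{2}", r"\bERROR\b", r"\d+%"]

# Test each pattern on its own; all() stops at the first one that fails
found_all = all(re.search(pattern, text) for pattern in patterns)

print(found_all)  # True
```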
Scaling Up with the Aho-Corasick Algorithm
When the number of substrings is large (hundreds or thousands), the naive approach of checking each one individually becomes expensive. The Aho-Corasick algorithm builds a finite-state automaton from all the search patterns, then scans the text in a single pass.
This approach processes the text in O(n + m + z) time, where n is the text length, m is the total length of all patterns, and z is the number of matches. It is the standard choice for intrusion detection systems, content filters, and large-scale log analysis.
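To make the idea concrete, here is a compact pure-Python sketch of the algorithm. The function names build_automaton and find_patterns are hypothetical, chosen for this example. Production code would more likely rely on a tested library than on a hand-rolled version like this one.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output sets."""
    goto = [{}]        # goto[state][char] -> next state
    fail = [0]         # fail[state] -> state to fall back to on a mismatch
    output = [set()]   # output[state] -> patterns that end at this state

    # Phase 1: insert every pattern into a trie.
    for pattern in patterns:
        state = 0
        for char in pattern:
            if char not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][char] = len(goto) - 1
            state = goto[state][char]
        output[state].add(pattern)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for char, child in goto[state].items():
            queue.append(child)
            fallback = fail[state]
            while fallback and char not in goto[fallback]:
                fallback = fail[fallback]
            fail[child] = goto[fallback].get(char, 0)
            # Inherit patterns that end at the fallback state (proper suffixes)
            output[child] |= output[fail[child]]
    return goto, fail, output


def find_patterns(text, patterns):
    """Return the set of patterns that occur in text, using one pass."""
    goto, fail, output = build_automaton(patterns)
    found = set()
    state = 0
    for char in text:
        while state and char not in goto[state]:
            state = fail[state]
        state = goto[state].get(char, 0)
        found |= output[state]
    return found


patterns = ["he", "she", "his", "hers"]
found = find_patterns("ushers", patterns)
print(sorted(found))            # ['he', 'hers', 'she']
print(bool(found))              # True: at least one pattern occurs
print(set(patterns) <= found)   # False: "his" does not occur
```

The returned set answers both questions from earlier sections: a non-empty set means at least one pattern matched, and comparing it against the full pattern set tells you whether all of them did.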
Performance Considerations
The right method depends on your scale:
- For a small number of short substrings (under 20), the simple loop with in or includes() is perfectly fine.
- For case-insensitive matching, convert both the text and the keywords to lowercase before comparing, rather than using regex with the re.IGNORECASE flag on each check.
- For very large texts or many patterns, Aho-Corasick or suffix-based data structures outperform repeated linear scans.
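The lowercase-first advice for case-insensitive matching might look like this in Python. The sample data is illustrative. For text beyond plain ASCII, str.casefold() is a more aggressive alternative to lower().

```python
text = "The Quick Brown FOX"
keywords = ["fox", "QUICK", "cat"]

# Normalize the text once, outside the loop, instead of once per keyword
lowered_text = text.lower()
lowered_keywords = [keyword.lower() for keyword in keywords]

matches = [keyword for keyword in lowered_keywords if keyword in lowered_text]

print(matches)  # ['fox', 'quick']
```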
Common Pitfalls
- Forgetting case sensitivity. "Fox" and "fox" are different strings. Normalize both sides before comparing.
- Matching substrings unintentionally. Searching for "cat" will match inside "concatenate". If you need whole-word matching, use regex word boundaries (\bcat\b) or split the text into tokens first.
- Modifying the keyword list during iteration. In languages like Java, modifying a collection while streaming over it throws a ConcurrentModificationException. Build your keyword list before you start checking.
- Ignoring Unicode normalization. Characters like accented letters can have multiple Unicode representations. Two strings that look identical on screen may fail an equality check. Use unicodedata.normalize() in Python or String.normalize() in JavaScript when working with international text.
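Two of these pitfalls, unintended partial matches and Unicode normalization, can be demonstrated with a short Python sketch. The sample strings are made up, and NFC is only one of several normalization forms; which form suits a given application is a judgment call.

```python
import re
import unicodedata

# Pitfall: partial matches
text = "Please concatenate the files"
partial_hit = "cat" in text                          # matches inside "concatenate"
whole_word_hit = bool(re.search(r"\bcat\b", text))   # requires a standalone word

print(partial_hit)     # True
print(whole_word_hit)  # False

# Pitfall: the same visible text with different code points
composed = "caf\u00e9"                        # "é" as a single code point
sentence = "Visit the cafe\u0301 today"       # "e" plus a combining acute accent

raw_hit = composed in sentence

# Normalize both sides to the same form before comparing
normalized_hit = (
    unicodedata.normalize("NFC", composed)
    in unicodedata.normalize("NFC", sentence)
)

print(raw_hit)         # False
print(normalized_hit)  # True
```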
Summary
Checking for multiple substrings is a solved problem with well-known patterns across languages. Use any()/all() in Python, some()/every() in JavaScript, or streams in Java for everyday tasks. Reach for regex when you need pattern flexibility, and consider Aho-Corasick when working at scale. The most common mistakes involve case sensitivity, unintended partial matches, and Unicode normalization, all of which are straightforward to handle once you are aware of them.

