Check if multiple strings exist in another string

Introduction

Checking whether multiple substrings exist within a larger string is a routine task in software development. It comes up when filtering logs, validating user input, scanning documents for keywords, and building search features. Different languages offer different idiomatic solutions, and choosing the right approach can make the difference between clean, readable code and a tangled mess of nested loops.

This article walks through practical techniques in Python, JavaScript, and Java, then covers performance considerations and common pitfalls.

Using Python's any() and all()

Python's built-in any() and all() functions pair naturally with generator expressions to check for substring presence.

Check if any of the substrings exist

```python
text = "The quick brown fox jumps over the lazy dog"
keywords = ["fox", "cat", "elephant"]

if any(word in text for word in keywords):
    print("At least one keyword found")
```

Check if all substrings exist

```python
text = "The quick brown fox jumps over the lazy dog"
required = ["quick", "fox", "dog"]

if all(word in text for word in required):
    print("All required words are present")
```

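In CPython, the in operator hands the substring search to optimized C code. That code uses a Boyer-Moore-Horspool-style search, and since Python 3.10 it switches to the two-way algorithm for some long patterns. In addition, any() and all() stop as soon as the result is known. For everyday workloads this approach is both readable and fast, although each keyword still requires its own scan of the text.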
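Sometimes you need to know which keywords matched, not just whether any did. A list comprehension over the same keyword list handles this. The helper name keyword_report is illustrative and does not come from any library. Unlike any() and all(), it examines every keyword, so it gives up short-circuiting.

```python
def keyword_report(text, keywords):
    """Return (found, missing) lists, preserving the order of `keywords`."""
    found = [word for word in keywords if word in text]
    missing = [word for word in keywords if word not in text]
    return found, missing


text = "The quick brown fox jumps over the lazy dog"
found, missing = keyword_report(text, ["fox", "cat", "elephant"])
print(found)    # ['fox']
print(missing)  # ['cat', 'elephant']
```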

Using JavaScript's Array.some() and Array.every()

JavaScript provides similar expressiveness through array methods.

```javascript
const text = "The quick brown fox jumps over the lazy dog";
const keywords = ["fox", "cat", "elephant"];

// Check if any keyword exists
const hasAny = keywords.some(word => text.includes(word));

// Check if all required words exist
const required = ["quick", "fox", "dog"];
const hasAll = required.every(word => text.includes(word));

console.log("Has any:", hasAny);   // true
console.log("Has all:", hasAll);   // true
```

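Both some() and every() short-circuit. some() stops as soon as one keyword matches, and every() stops as soon as one keyword fails to match. This helps in the typical case. In the worst case, however, each keyword still triggers a full scan of the text: some() finds no match, or every() finds that all keywords match. So the cost still grows with the length of the keyword list.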
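Python's any() and all() short-circuit in the same way when they consume a generator expression. The sketch below makes this visible by logging each keyword that is actually tested. The contains helper and the checks list are illustrative instrumentation only.

```python
text = "The quick brown fox jumps over the lazy dog"
keywords = ["fox", "cat", "elephant"]

checks = []  # records every keyword that was actually tested


def contains(word):
    """Substring test that also logs which keyword was examined."""
    checks.append(word)
    return word in text


has_any = any(contains(word) for word in keywords)
print(has_any, checks)  # True ['fox'] (stopped after the first hit)

checks.clear()
has_all = all(contains(word) for word in keywords)
print(has_all, checks)  # False ['fox', 'cat'] (stopped at the first miss)
```

"elephant" is never tested in either call. If the keyword list had no early hit (for any()) or no early miss (for all()), every keyword would be tested.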

Using Java Streams

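Java 8 introduced streams, which express the same pattern with anyMatch() and allMatch(). The example below uses List.of, which requires Java 9 or later. On Java 8, substitute Arrays.asList.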

```java
import java.util.List;

public class MultiStringCheck {
    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        List<String> keywords = List.of("fox", "cat", "elephant");

        boolean anyMatch = keywords.stream()
            .anyMatch(text::contains);

        boolean allMatch = keywords.stream()
            .allMatch(text::contains);

        System.out.println("Any match: " + anyMatch);  // true
        System.out.println("All match: " + allMatch);  // false
    }
}
```

Using Regular Expressions

When you need pattern matching rather than exact substring matching, regex is the right tool. You can combine multiple patterns with the alternation operator |.

```python
import re

text = "Error 404: page not found. Retry after 5 seconds."
patterns = [r"Error \d+", "not found", "timeout"]

# Build a combined regex that matches any one of the patterns
combined = "|".join(patterns)
matches = re.findall(combined, text)

print(matches)  # ['Error 404', 'not found']
```
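The combined pattern above treats each entry as a regular expression. Your keywords may instead be literal strings that contain metacharacters such as $, . or +. In that case, escape them with re.escape() before joining, and compile the result once if you reuse it. Here is a sketch under those assumptions; the sample text and keywords are made up for illustration.

```python
import re

text = "Price: $5.00 (approx.) for 3+ items"
keywords = ["$5.00", "3+", "c++"]

# re.escape neutralises metacharacters such as $, . and +, so each keyword
# is matched literally. Longer keywords go first so that a shorter keyword
# that is a prefix of a longer one cannot shadow it.
escaped = (re.escape(k) for k in sorted(keywords, key=len, reverse=True))
pattern = re.compile("|".join(escaped))

print(pattern.findall(text))       # ['$5.00', '3+']
print(bool(pattern.search(text)))  # True: at least one keyword is present
```

The length-based sort is a precaution. Python's alternation tries branches left to right and accepts the first one that matches at a given position.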

For checking whether all patterns match, you need to test each one individually, because alternation only tells you that at least one matched.

```python
all_present = all(re.search(p, text) for p in patterns)
print(all_present)  # False: "timeout" does not appear in the text
```

Scaling Up with the Aho-Corasick Algorithm

When the number of substrings is large (hundreds or thousands), the naive approach of checking each one individually becomes expensive. The Aho-Corasick algorithm builds a finite-state automaton from all the search patterns, then scans the text in a single pass.

```python
# pip install pyahocorasick
import ahocorasick

automaton = ahocorasick.Automaton()
keywords = ["error", "warning", "critical", "timeout"]

# Attach an (index, keyword) value to each pattern
for idx, key in enumerate(keywords):
    automaton.add_word(key, (idx, key))

# Finalize the automaton (computes failure links) before searching
automaton.make_automaton()

log_line = "critical failure: timeout after 30s"
found = set()
for end_index, (idx, word) in automaton.iter(log_line):
    found.add(word)

print(sorted(found))  # ['critical', 'timeout']
```

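Building the automaton and scanning the text together take O(n + m + z) time, where n is the text length, m is the total length of all patterns, and z is the number of matches. The text is scanned once, however many patterns there are. Note that pyahocorasick is a third-party package, not part of the standard library. Aho-Corasick is a common choice for intrusion detection systems, content filters, and large-scale log analysis.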
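pyahocorasick keeps its automaton inside a C extension. To show what such an automaton contains, here is a minimal pure-Python sketch of the textbook construction. It inserts every pattern into a trie, computes failure links breadth-first, and merges output sets along those links. One left-to-right pass then reports every occurrence, including overlapping ones. The names build_automaton and search are illustrative. The sketch stores transitions in dictionaries and follows failure links at search time, so it is meant for understanding rather than speed.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and per-state output sets."""
    goto = [{}]       # goto[state][char] -> next state; state 0 is the root
    fail = [0]        # failure link of each state
    output = [set()]  # patterns recognised when reaching each state

    # 1. Insert every pattern into a trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pattern)

    # 2. Breadth-first pass: depth-1 states keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            # Longest proper suffix of nxt's string that is also in the trie.
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state as well.
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output


def search(text, automaton):
    """Yield (start_index, pattern) for every occurrence, in one pass."""
    goto, fail, output = automaton
    state = 0
    for i, ch in enumerate(text):
        # Fall back along failure links until ch has a transition (or root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            yield i - len(pattern) + 1, pattern


keywords = ["error", "warning", "critical", "timeout"]
automaton = build_automaton(keywords)
log_line = "critical failure: timeout after 30s"
print(sorted({word for _, word in search(log_line, automaton)}))
# ['critical', 'timeout']
```

The output sets are merged along failure links. Because of that, overlapping patterns such as "he" and "she" in "ushers" are both reported without rescanning.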

Performance Considerations

The right method depends on your scale:

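  • For a modest number of keywords, the simple loop with in or includes() is perfectly fine. Each keyword costs a separate scan, so total work grows roughly with (number of keywords) × (text length). That only matters once both are large.
  • For case-insensitive matching, normalize the text once and each keyword once, then compare the normalized forms. Use toLowerCase() in JavaScript and Java, or casefold() in Python. This is cheaper than running a case-insensitive regex (such as one compiled with re.IGNORECASE) separately for every keyword.
  • For very large texts or many patterns, Aho-Corasick outperforms repeated linear scans. When the same large text is queried repeatedly, suffix-based structures such as suffix arrays or suffix automata let each query run without rescanning the text.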
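The case-insensitive advice can be sketched in Python with str.casefold(). The Python documentation describes it as a more aggressive lower() intended for caseless matching; for example, it maps "ß" to "ss". The function names contains_any_ci and contains_all_ci are illustrative.

```python
def contains_any_ci(text, keywords):
    """Case-insensitive 'any' check; the text is folded only once."""
    folded_text = text.casefold()
    return any(k.casefold() in folded_text for k in keywords)


def contains_all_ci(text, keywords):
    """Case-insensitive 'all' check; the text is folded only once."""
    folded_text = text.casefold()
    return all(k.casefold() in folded_text for k in keywords)


print(contains_any_ci("The Quick Brown FOX", ["fox", "cat"]))        # True
print(contains_all_ci("Straße nach Berlin", ["STRASSE", "berlin"]))  # True
```

With lower() instead of casefold(), the second check would fail, because "straße".lower() keeps the "ß".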

Common Pitfalls

  1. Forgetting case sensitivity. "Fox" and "fox" are different strings. Normalize both sides before comparing.
  2. Matching substrings unintentionally. Searching for "cat" will match inside "concatenate". If you need whole-word matching, use regex word boundaries (\bcat\b) or split the text into tokens first.
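  3. Modifying the keyword list during iteration. In Java, structurally modifying a non-concurrent collection such as an ArrayList while a stream is traversing it can throw a ConcurrentModificationException or produce incorrect results. Immutable lists created with List.of throw UnsupportedOperationException on any modification. Build your keyword list completely before you start checking.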
  4. Ignoring Unicode normalization. Characters like accented letters can have multiple Unicode representations. Two strings that look identical on screen may fail an equality check. Use unicodedata.normalize() in Python or String.normalize() in JavaScript when working with international text.
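Pitfalls 1, 2 and 4 can be handled together. The sketch below assumes that NFC normalization plus casefold() is the right notion of equality for your data, and it uses \b word boundaries for whole-word matching. normalize and find_whole_words are hypothetical helper names.

```python
import re
import unicodedata


def normalize(s):
    """NFC-normalize, then casefold, so visually identical text compares equal."""
    return unicodedata.normalize("NFC", s).casefold()


def find_whole_words(text, keywords):
    """Return the keywords that occur in `text` as whole words."""
    norm_text = normalize(text)
    return [
        k for k in keywords
        # re.escape keeps the keyword literal; \b enforces word boundaries
        if re.search(r"\b" + re.escape(normalize(k)) + r"\b", norm_text)
    ]


# The "é" in the text is written as "e" plus a combining acute accent,
# while the keyword uses the precomposed character U+00E9.
text = "Concatenate the Cafe\u0301 menu"
print(find_whole_words(text, ["cat", "caf\u00e9", "menu"]))  # ['café', 'menu']
```

"cat" is correctly rejected even though it appears inside "Concatenate". Note that \b only behaves as expected when a keyword starts and ends with word characters. Keywords like "c++" need tokenization or custom lookarounds instead.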

Summary

Checking for multiple substrings is a solved problem with well-known patterns across languages. Use any()/all() in Python, some()/every() in JavaScript, or streams in Java for everyday tasks. Reach for regex when you need pattern flexibility, and consider Aho-Corasick when working at scale. The most common mistakes involve case sensitivity, unintended partial matches, and Unicode normalization, all of which are straightforward to handle once you are aware of them.

