What do 'lazy' and 'greedy' mean in the context of regular expressions?
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Regular expressions (regex) are a powerful tool for matching patterns within text. Two essential concepts in the context of regex are "lazy" and "greedy" quantifiers, which control how much text a regex pattern should attempt to consume. Understanding the difference between these two can greatly affect the performance of regex-based operations and the results they produce.
Greedy Quantifiers
Greedy quantifiers try to match as much text as possible. They essentially "consume" as much of the input as they can, consistent with the rest of the regular expression still allowing a match. The most common greedy quantifiers are:
*(match 0 or more times)+(match 1 or more times)?(match 0 or 1 time){n,}(match at least 'n' times)
Example of Greedy Matching
Consider the regex ".*" applied to the string "abcde". This regex uses .*, which is a greedy quantifier aiming to match any character (.) zero or more times (*). In this case, .* will match the entire string "abcde" because it consumes as much as possible.
In more complex strings, such as "abc <tag>def</tag> ghi", using a greedy quantifier with the regex <.*> to find tags would result in a match of <tag>def</tag> — this happens because the quantifier tries to match the longest string that starts with < and ends with >.
Lazy (Non-greedy) Quantifiers
Lazy quantifiers, in contrast, try to match as little text as possible, they are "reluctant" to consume characters. To make a quantifier lazy, a ? is added after it. The most common lazy quantifiers are:
*?(match 0 or more times, but as few as possible)+?(match 1 or more times, but as few as possible)??(match 0 or 1 time, but prefer 0){n,}?(match at least 'n' times, but as few as possible after that)
Example of Lazy Matching
Using the previous example of the string "<tag>def</tag>", applying the regex "<.*?>" with a lazy quantifier *?, results in matching just "<tag>". It stops consuming characters as soon as it fulfills the pattern — reaching the first > after <.
Use Cases and Performance Considerations
Lazy quantifiers are particularly useful when you want to avoid "over-matching." They are helpful when parsing or scraping data from nested or hierarchical structures (like XML or HTML) or when specific, minimal matches are desired within larger blocks of text.
Greedy quantifiers can lead to issues like catastrophic backtracking, especially in complex regex patterns with many possibilities for matching. In contrast, lazy quantifiers may result in a higher number of steps to validate a match, potentially impacting performance. Selecting between greedy or lazy depends on the specific requirements and the structure of the text being analyzed.
Summary Table
Here is a quick reference on the behavior of both types of quantifiers:
| Quantifier | Type | Description | Example | Against "abcde" | Matches |
* | Greedy | Match 0 or more times | a* | aaabcde | aaa |
*? | Lazy | Match 0 or more times, as few as possible | a*? | aaabcde | empty |
+ | Greedy | Match 1 or more times | a+ | aaabcde | aaa |
+? | Lazy | Match 1 or more times, as few as possible | a+? | aaabcde | a |
? | Greedy | Match 0 or 1 time | a? | aaabcde | a |
?? | Lazy | Match 0 or 1 time, prefer 0 | a?? | aaabcde | empty |
In summary, the choice between using greedy or lazy quantifiers in regular expressions depends on the specific requirement: whether you need to match as much as possible or to keep the match minimal. Understanding their behavior and effect can help in crafting more efficient and purpose-driven regex operations.

