Python non-greedy regexes
Introduction
In Python regular expressions, quantifiers such as * and + are greedy by default, which means they match as much text as they can. Non-greedy, also called lazy, quantifiers stop as early as possible, which is often the difference between extracting one field correctly and swallowing half the string.
Greedy Versus Non-Greedy
The greedy pattern .* consumes as many characters as it can, then backs off only as far as needed for the rest of the pattern to match. To make the same quantifier lazy, add ?, turning it into .*?.
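A minimal sketch of the difference, assuming a made-up sample string that contains the same closing delimiter twice:

```python
import re

# Hypothetical sample text with two bold-tagged words.
text = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the end of the string, then backs off to the LAST </b>.
greedy = re.findall(r"<b>.*</b>", text)

# Lazy: .*? grows one character at a time and stops at the FIRST </b>.
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```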
The greedy version returns one large match. The lazy version returns two smaller matches, which is usually what you want when a delimiter can appear multiple times in the same string.
The Syntax of Lazy Quantifiers
Python supports lazy forms for the common quantifiers:
- *? for zero or more, as little as possible
- +? for one or more, as little as possible
- ?? for zero or one, preferring the shorter match
- {m,n}? for between m and n repetitions, as few as possible, such as r"a{2,5}?" (the braces are required; a bare m,n? is not quantifier syntax)
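The behaviour of each lazy form can be checked with a short sketch. The sample string and variable names here are illustrative only:

```python
import re

sample = "aaaaa"

# Each lazy quantifier takes the fewest repetitions that still allow a match.
star_lazy = re.match(r"a*?", sample).group()        # zero repetitions
plus_lazy = re.match(r"a+?", sample).group()        # exactly one
optional_lazy = re.match(r"a??", sample).group()    # prefers zero
range_lazy = re.match(r"a{2,5}?", sample).group()   # the minimum, two

# The greedy range takes the maximum instead.
range_greedy = re.match(r"a{2,5}", sample).group()

print(repr(star_lazy), repr(plus_lazy), repr(optional_lazy))  # '' 'a' ''
print(range_lazy, range_greedy)                               # aa aaaaa
```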
Here is a quick example with quoted text:
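A minimal sketch, assuming a hypothetical line with two quoted values:

```python
import re

# Made-up input: two key="value" pairs on one line.
line = 'name="alice" role="admin"'

# Lazy: each match ends at the nearest closing quote.
lazy_values = re.findall(r'"(.*?)"', line)

# Greedy, for comparison: one match from the first quote to the last.
greedy_values = re.findall(r'"(.*)"', line)

print(lazy_values)    # ['alice', 'admin']
print(greedy_values)  # ['alice" role="admin']
```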
Without the lazy quantifier, the match would run from the first quote all the way to the last quote in the line.
Non-Greedy Does Not Mean "Safe by Itself"
Lazy matching only changes how much the engine prefers to consume. It does not remove backtracking, and it does not magically make a weak pattern precise. Good regex design still depends on choosing strong delimiters.
For example, if you want the contents of parentheses, a negated character class such as \(([^)]*)\) is better than the lazy dot pattern \((.*?)\).
The character-class version is often clearer because it states what cannot appear before the closing delimiter. That usually reduces surprising matches and excessive backtracking.
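One way to see the difference is a sketch in which the pattern needs extra context after the closing parenthesis. The input string and the trailing semicolon requirement are assumptions chosen to expose the behaviour:

```python
import re

# Hypothetical input: only the second call is followed by a semicolon.
text = "f(a) g(b);"

# Lazy dot: the engine keeps extending .*? until ");" can match,
# so it happily walks across the first closing parenthesis.
lazy = re.search(r"\((.*?)\);", text).group(1)

# Negated class: the capture can never contain ")", so the first
# attempt fails cleanly and the search moves on to "(b);".
strict = re.search(r"\(([^)]*)\);", text).group(1)

print(lazy)    # a) g(b
print(strict)  # b
```

The lazy pattern still found a match, just not the intended one, which is why the negated class is usually the safer default.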
Using Non-Greedy Patterns with Multiline Text
If the text can span multiple lines, remember that . does not match newlines unless you enable re.DOTALL:
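A minimal sketch, assuming a made-up note element that spans two lines:

```python
import re

# Hypothetical input with a newline inside the target span.
text = "<note>first line\nsecond line</note>"

pattern = r"<note>(.*?)</note>"

# Without re.DOTALL, "." stops at the newline, so there is no match at all.
without_flag = re.search(pattern, text)

# With re.DOTALL, "." also matches "\n" and the lazy pattern works as intended.
with_flag = re.search(pattern, text, re.DOTALL)

print(without_flag)             # None
print(repr(with_flag.group(1)))  # 'first line\nsecond line'
```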
This is a common source of confusion. Developers correctly write a lazy pattern, but it still fails because newline handling, not greediness, is the real issue.
When Regex Stops Being the Right Tool
Non-greedy regexes are useful for logs, delimited text, and small parsing tasks. They are a poor fit for deeply nested languages such as HTML, XML, or programming syntax. If the input has recursive structure, use a parser instead of making the pattern more clever.
That is especially true when people try to parse nested tags with .*?. Lazy matching can reduce damage, but it cannot give regex a full tree structure.
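The limitation can be sketched with a hypothetical nested snippet. The DivDepth class below is an illustrative helper built on the standard library's html.parser, not a general-purpose solution:

```python
import re
from html.parser import HTMLParser

# Made-up input: one div nested inside another.
html = "<div>outer <div>inner</div> tail</div>"

# The lazy pattern pairs the OUTER opening tag with the INNER closing tag.
regex_capture = re.search(r"<div>(.*?)</div>", html).group(1)
print(regex_capture)  # outer <div>inner


class DivDepth(HTMLParser):
    """Track how deeply div elements are nested (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1


parser = DivDepth()
parser.feed(html)
parser.close()
print(parser.max_depth)  # 2
```

The parser keeps an explicit depth counter, which is exactly the state a single regular expression cannot maintain.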
Common Pitfalls
- Thinking .*? is always the best solution instead of using a more precise character class.
- Forgetting re.DOTALL when the target text spans multiple lines.
- Assuming non-greedy means no backtracking or no performance risk.
- Using regex to parse nested formats that need a real parser.
- Forgetting that delimiters inside escaped strings may require more careful patterns.
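The last pitfall can be sketched as follows. The sample text and the escape-aware pattern are assumptions for illustration; real formats may have additional escaping rules:

```python
import re

# Hypothetical input: a quoted string that contains backslash-escaped quotes.
text = r'say "he said \"hi\"" now'

# Naive lazy pattern: stops at the first quote, even though it is escaped.
naive = re.search(r'"(.*?)"', text).group(1)

# Escape-aware pattern: consume either a non-quote, non-backslash character
# or a backslash followed by any character, until the real closing quote.
escape_aware = re.search(r'"((?:[^"\\]|\\.)*)"', text).group(1)

print(naive)         # he said \
print(escape_aware)  # he said \"hi\"
```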
Summary
- Python quantifiers are greedy by default.
- Add ? to make a quantifier lazy, such as .*? or +?.
- Lazy matching helps stop at the first valid delimiter, but precise patterns are still better.
- Multiline matches often need re.DOTALL in addition to non-greedy quantifiers.
- Use a parser, not regex alone, when the input has real nested structure.

