Python non-greedy regexes

Introduction

In Python regular expressions, quantifiers such as * and + are greedy by default, which means they match as much text as they can. Non-greedy, also called lazy, quantifiers stop as early as possible, which is often the difference between extracting one field correctly and swallowing half the string.

Greedy Versus Non-Greedy

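The greedy pattern .* first consumes as many characters as it can, then backtracks one character at a time until the rest of the pattern can match. To make the same quantifier lazy, add ?, turning it into .*?, which starts with as few characters as possible and expands only when the rest of the pattern fails to match.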

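```python
import re

text = "<b>first</b><b>second</b>"

# Greedy: .* runs to the last </b> in the string.
greedy = re.findall(r"<b>.*</b>", text)
# Lazy: .*? stops at the first </b> after each <b>.
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)  # ['<b>first</b><b>second</b>']
print(lazy)    # ['<b>first</b>', '<b>second</b>']
```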

The greedy version returns one large match. The lazy version returns two smaller matches, which is usually what you want when a delimiter can appear multiple times in the same string.
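To see exactly where each match starts and stops, one option is to look at match spans with re.finditer. This is a minimal sketch that reuses the sample string from above; the variable names greedy_spans and lazy_spans are chosen here only for illustration.

```python
import re

text = "<b>first</b><b>second</b>"

# Each span is a (start, end) pair of indexes into text.
greedy_spans = [m.span() for m in re.finditer(r"<b>.*</b>", text)]
lazy_spans = [m.span() for m in re.finditer(r"<b>.*?</b>", text)]

print(greedy_spans)  # [(0, 25)]
print(lazy_spans)    # [(0, 12), (12, 25)]
```

The greedy span covers the whole 25-character string, while the lazy spans split it at the first closing tag.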

The Syntax of Lazy Quantifiers

Python supports lazy forms for the common quantifiers:

  • *? for zero or more, as few as possible
  • +? for one or more, as few as possible
  • ?? for zero or one, preferring zero
  • {m,n}? for between m and n repetitions, as few as possible, such as r"a{2,5}?"

Here is a quick example with quoted text:

```python
import re

text = 'name="alice" role="admin"'
# .*? stops at the first closing quote after each opening quote.
matches = re.findall(r'".*?"', text)
print(matches)  # ['"alice"', '"admin"']
```

Without the lazy quantifier, the match would run from the first quote all the way to the last quote in the line, returning the single string "alice" role="admin".
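The lazy forms listed above can also be compared side by side with their greedy counterparts. The sketch below is only illustrative: the sample string and the results dictionary are assumptions made for the demonstration, and each pattern is matched on its own with nothing following the quantifier, so the lazy version is free to take the minimum.

```python
import re

sample = "aaaaaaa"  # seven 'a' characters, chosen only for illustration

# (greedy pattern, lazy pattern) pairs for each quantifier form
pairs = [
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
    (r"a?", r"a??"),
    (r"a{2,5}", r"a{2,5}?"),
]

results = {}
for greedy_pat, lazy_pat in pairs:
    greedy_text = re.match(greedy_pat, sample).group()
    lazy_text = re.match(lazy_pat, sample).group()
    results[lazy_pat] = (greedy_text, lazy_text)
    print(f"{greedy_pat!r:12} -> {greedy_text!r:10} {lazy_pat!r:12} -> {lazy_text!r}")
```

Here a*? and a?? match the empty string, a+? matches a single a, and a{2,5}? matches aa. In a real pattern, whatever follows the quantifier forces the lazy form to expand until the whole pattern can match.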

Non-Greedy Does Not Mean "Safe by Itself"

Lazy matching only changes how much the engine prefers to consume. It does not remove backtracking, and it does not magically make a weak pattern precise. Good regex design still depends on choosing strong delimiters.

For example, if you want the contents of parentheses, this is better:

```python
import re

text = "call(alpha) and call(beta)"
# [^)]* matches any run of characters that are not a closing parenthesis.
matches = re.findall(r"\(([^)]*)\)", text)
print(matches)  # ['alpha', 'beta']
```

than this:

```python
# Reuses text from the previous block; with no capture group, the parentheses are included.
re.findall(r"\(.*?\)", text)
```

The character-class version is often clearer because it states what cannot appear before the closing delimiter. That usually reduces surprising matches and excessive backtracking.
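The difference becomes visible once something must follow the closing delimiter. The sketch below uses a made-up input, and the trailing semicolon requirement is an assumption added for the demonstration. Because . is allowed to match ), the lazy quantifier expands past the first closing parenthesis when the semicolon is missing there, while the negated character class cannot.

```python
import re

text = "f(a) g(b);"  # hypothetical input: only the second call ends with ';'

# Lazy: .*? expands past "a)" because no ';' follows the first ')'.
lazy = re.findall(r"\((.*?)\);", text)

# Negated class: [^)]* can never cross a ')', so the first call is skipped.
strict = re.findall(r"\(([^)]*)\);", text)

print(lazy)    # ['a) g(b']
print(strict)  # ['b']
```

Lazy matching returns the first match that works from the leftmost starting point, which is not necessarily the shortest possible match in the string.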

Using Non-Greedy Patterns with Multiline Text

If the text can span multiple lines, remember that . does not match newlines unless you enable re.DOTALL:

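```python
import re

text = "BEGIN\nline one\nline two\nEND"
# re.DOTALL lets . match newline characters as well.
match = re.search(r"BEGIN(.*?)END", text, re.DOTALL)
print(match.group(1).strip())  # prints "line one" and "line two" on separate lines
```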

This is a common source of confusion. Developers correctly write a lazy pattern, but it still fails because newline handling, not greediness, is the real issue.
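The failure mode is easy to reproduce. This sketch runs the same lazy pattern with and without newline handling; the variable names are illustrative, and the inline (?s) flag and the [\s\S] class are shown as two alternative spellings rather than as recommendations.

```python
import re

text = "BEGIN\nline one\nline two\nEND"

# Lazy, but . cannot cross the newline after BEGIN, so there is no match.
without_flag = re.search(r"BEGIN(.*?)END", text)

# Three ways to let the lazy quantifier span lines.
with_flag = re.search(r"BEGIN(.*?)END", text, re.DOTALL)
inline_flag = re.search(r"(?s)BEGIN(.*?)END", text)
explicit_class = re.search(r"BEGIN([\s\S]*?)END", text)

print(without_flag)  # None
print(with_flag.group(1) == inline_flag.group(1) == explicit_class.group(1))  # True
```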

When Regex Stops Being the Right Tool

Non-greedy regexes are useful for logs, delimited text, and small parsing tasks. They are a poor fit for deeply nested languages such as HTML, XML, or programming syntax. If the input has recursive structure, use a parser instead of making the pattern more clever.

That is especially true when people try to parse nested tags with .*?. Lazy matching can reduce damage, but it cannot give regex a full tree structure.
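As a rough illustration of the gap, the sketch below feeds a nested fragment to a lazy regex and to the standard library's html.parser. The DivDepth class and the sample markup are hypothetical, written only for this comparison. A production parser would need to handle attributes, malformed input, and other tags.

```python
import re
from html.parser import HTMLParser

html = "<div>outer <div>inner</div> tail</div>"

# The lazy regex stops at the first </div>, which closes the inner div.
lazy_inner = re.search(r"<div>(.*?)</div>", html).group(1)
print(lazy_inner)  # outer <div>inner


class DivDepth(HTMLParser):
    """Hypothetical helper: records each text chunk with its div nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        self.chunks.append((self.depth, data))


parser = DivDepth()
parser.feed(html)
parser.close()
print(parser.chunks)  # [(1, 'outer '), (2, 'inner'), (1, ' tail')]
```

The parser keeps track of nesting depth, which a single lazy quantifier cannot express.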

Common Pitfalls

  • Thinking .*? is always the best solution instead of using a more precise character class.
  • Forgetting re.DOTALL when the target text spans multiple lines.
  • Assuming non-greedy means no backtracking or no performance risk.
  • Using regex to parse nested formats that need a real parser.
  • Forgetting that delimiters inside escaped strings may require more careful patterns.
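
The last pitfall can be sketched with a quoted string that contains backslash-escaped quotes. The input and both variable names are assumptions for illustration. The second pattern is one common way to skip escaped characters, not the only one.

```python
import re

# Raw string: the backslashes are literal characters in the input.
text = r'say "he said \"hi\"" now'

# Naive lazy pattern: stops at the first quote, even though it is escaped.
naive = re.findall(r'".*?"', text)

# Escape-aware pattern: each step consumes either a character that is neither
# a quote nor a backslash, or a backslash together with the character after it.
escape_aware = re.findall(r'"(?:[^"\\]|\\.)*"', text)

for fragment in naive:
    print("naive:", fragment)
for fragment in escape_aware:
    print("escape-aware:", fragment)
```

The naive pattern yields two broken fragments, while the escape-aware pattern returns the full quoted string as a single match.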

Summary

  • Python quantifiers are greedy by default.
  • Add ? to make a quantifier lazy, such as .*? or +?.
  • Lazy matching helps stop at the first valid delimiter, but precise patterns are still better.
  • Multiline matches often need re.DOTALL in addition to non-greedy quantifiers.
  • Use a parser, not regex alone, when the input has real nested structure.
