Match multiline text using regular expression
Introduction
Matching multiline text with a regular expression usually fails the first time for one simple reason: by default, the dot character does not match newline characters. On top of that, many regex engines treat line anchors differently depending on whether multiline mode is enabled.
So there are really two separate questions: do you want to match across line breaks, or do you want ^ and $ to match at the start and end of every line inside a larger block of text?
MULTILINE and DOTALL Solve Different Problems
In Python's re module:
- re.MULTILINE changes how ^ and $ behave
- re.DOTALL makes . match newline characters too
That difference is the key to most multiline regex bugs.
In practice, a pattern that uses ^ to match the beginning of each line inside the string needs re.MULTILINE. A pattern that uses . to match across line breaks needs re.DOTALL.
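The sketch below uses a small hypothetical string to show both flags side by side. It first runs each pattern with the default behavior, then runs it again with the flag that changes the relevant metacharacter.

```python
import re

text = "alpha: 1\nbeta: 2\ngamma: 3"

# Default: ^ matches only at the very start of the string.
first_only = re.findall(r"^\w+", text)
print(first_only)  # ['alpha']

# re.MULTILINE: ^ matches at the start of every line.
every_line = re.findall(r"^\w+", text, re.MULTILINE)
print(every_line)  # ['alpha', 'beta', 'gamma']

# Default: . does not match a newline, so "alpha" and "gamma" on different lines never connect.
no_cross = re.search(r"alpha.*gamma", text)
print(no_cross)  # None

# re.DOTALL: . also matches newlines, so the match spans all three lines.
cross = re.search(r"alpha.*gamma", text, re.DOTALL)
print(repr(cross.group(0)))  # 'alpha: 1\nbeta: 2\ngamma'
```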
Extracting a Block Between Markers
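One of the most common multiline tasks is extracting the text between two markers. Because the block usually spans several lines, the pattern needs re.DOTALL so that . can cross line breaks.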
The non-greedy .*? is important here. Without it, a pattern may consume far more text than intended, especially if the markers appear multiple times in the same input.
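Here is a minimal sketch, assuming hypothetical BEGIN and END marker lines that appear twice in the input. It contrasts the non-greedy and greedy forms of the same pattern, both with re.DOTALL enabled.

```python
import re

# Hypothetical input: two blocks delimited by BEGIN/END marker lines.
text = """header
BEGIN
first block
END
noise
BEGIN
second block
END
"""

# Non-greedy .*? stops at the nearest END; re.DOTALL lets . cross newlines.
blocks = re.findall(r"BEGIN\n(.*?)\nEND", text, re.DOTALL)
print(blocks)  # ['first block', 'second block']

# Greedy .* runs to the LAST END and swallows everything in between.
greedy = re.findall(r"BEGIN\n(.*)\nEND", text, re.DOTALL)
print(greedy)  # ['first block\nEND\nnoise\nBEGIN\nsecond block']
```

The greedy version returns a single match that contains both marker pairs, which is exactly the over-consumption described above.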
Matching Line by Line
Sometimes you do not want to span lines at all. You want to search each line inside a larger string. That is when re.MULTILINE is useful.
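With re.MULTILINE, ^ matches the start of each line, not just the start of the entire string. Likewise, $ matches at the end of each line (just before the newline) as well as at the end of the string.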
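As an illustration, the sketch below pulls the message out of every line that starts with ERROR in a hypothetical log string. It then shows the same pattern failing without the flag.

```python
import re

# Hypothetical log text: several lines, some starting with "ERROR".
log = """INFO started
ERROR disk full
INFO retrying
ERROR timeout after 30s
"""

# With re.MULTILINE, ^ and $ anchor to each line, so each ERROR line is a separate match.
errors = re.findall(r"^ERROR (.+)$", log, re.MULTILINE)
print(errors)  # ['disk full', 'timeout after 30s']

# Without the flag, ^ only matches at position 0 (an INFO line), so nothing is found.
no_flag = re.findall(r"^ERROR (.+)$", log)
print(no_flag)  # []
```

Because re.DOTALL is not set, (.+) cannot run past the end of its own line. The two flags stay independent here.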
Regex Is Not Always the Best Tool
Regular expressions are good for well-structured text, but they become fragile when the input is deeply nested or context-sensitive. For example, parsing arbitrary HTML, XML, or programming languages with one multiline regex is usually the wrong approach.
In those cases, a real parser is more reliable. Regex still helps with small extraction tasks, but it should not be stretched beyond what the text format can support cleanly.
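A small hypothetical example shows why nesting breaks flat patterns. Neither the non-greedy nor the greedy form of the same regex can track how deeply the tags are nested.

```python
import re

# Hypothetical nested markup: an inner <div> sits inside an outer <div>.
nested = "<div>outer <div>inner</div> tail</div>"

# Non-greedy stops at the FIRST closing tag, which belongs to the inner
# element, so the outer element's content is cut short.
lazy = re.search(r"<div>(.*?)</div>", nested, re.DOTALL)
print(lazy.group(1))  # outer <div>inner

# Greedy stops at the LAST closing tag, which merges two sibling elements.
siblings = "<div>a</div><div>b</div>"
eager = re.search(r"<div>(.*)</div>", siblings, re.DOTALL)
print(eager.group(1))  # a</div><div>b
```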
It is also worth remembering that other languages name the same ideas differently. Many regex engines use flags such as m for multiline and s for dotall, but the exact defaults and APIs vary. Always check the regex flavor you are using instead of assuming Python, JavaScript, Java, and PCRE behave identically.
Some engines also support inline flags. In Python, (?s) enables dotall behavior inside the pattern itself, which can be useful when you want the regex to carry its own mode information rather than relying on the call site.
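The sketch below compares Python's inline flags with the equivalent flag arguments on a hypothetical string. As far as I know, recent Python versions (3.11 and later) reject global inline flags such as (?s) unless they appear at the very start of the pattern, so the examples put them there. The scoped form (?s:...) applies the flag only inside its group.

```python
import re

text = "start\nmiddle\nend"

# (?s) at the start of the pattern has the same effect as passing re.DOTALL.
inline = re.search(r"(?s)start.*end", text)
flagged = re.search(r"start.*end", text, re.DOTALL)
print(inline.group(0) == flagged.group(0))  # True

# (?m) is the inline equivalent of re.MULTILINE; letters can be combined, e.g. (?ms).
words = re.findall(r"(?m)^\w+$", text)
print(words)  # ['start', 'middle', 'end']

# Scoped flag: only the . inside (?s:...) may match a newline; the final . may not.
scoped_hit = re.search(r"(?s:a.b)-.", "a\nb-x")
scoped_miss = re.search(r"(?s:a.b)-.", "a\nb-\n")
print(scoped_hit is not None, scoped_miss is None)  # True True
```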
Common Pitfalls
- Enabling re.MULTILINE when the real problem is that . does not cross line breaks.
- Forgetting to make .* non-greedy when matching between repeated markers.
- Using regex to parse complex structured formats that need a parser.
- Assuming flags have the same names or defaults in every language and regex engine.
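The first pitfall is easy to reproduce. In this minimal sketch with a hypothetical two-line string, re.MULTILINE has no effect on what . matches, and only re.DOTALL lets the pattern cross the newline.

```python
import re

text = "name: Ada\nLovelace"

# Pitfall: re.MULTILINE only changes ^ and $, so . still refuses the newline.
multiline_only = re.search(r"Ada.Lovelace", text, re.MULTILINE)
print(multiline_only)  # None

# re.DOTALL is the flag that lets . match the newline between the two words.
dotall = re.search(r"Ada.Lovelace", text, re.DOTALL)
print(repr(dotall.group(0)))  # 'Ada\nLovelace'
```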
Summary
- re.MULTILINE changes ^ and $ so they work on each line.
- re.DOTALL lets . match newline characters.
- Use non-greedy patterns such as .*? when extracting blocks between markers.
- Choose line-oriented matching and cross-line matching intentionally because they solve different problems.
- For complex nested formats, prefer a proper parser over a giant multiline regex.

