In Python, how do I split a string and keep the separators?
Introduction
If you want to split a string in Python and keep the separators, the usual answer is to use re.split with a capturing group. Ordinary str.split() discards delimiters, so you need a regex-based approach when the separators themselves matter.
The Standard Pattern with re.split
Python’s re.split keeps the matched delimiter when the delimiter pattern is wrapped in a capturing group: the captured text appears in the result list between the pieces it separated.
Example:
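A minimal sketch of the pattern, assuming a made-up sample string with comma and semicolon separators (the variable names are illustrative only). The printed list is shown in the trailing comment:

```python
import re

# Hypothetical sample input: fields separated by "," or ";"
text = "alpha,beta;gamma"

# The parentheses form a capturing group, so each matched
# separator is returned as its own list element.
parts = re.split(r"(,|;)", text)

print(parts)  # ['alpha', ',', 'beta', ';', 'gamma']
```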
That is the most direct answer for many cases.
Why the Capturing Group Matters
Without parentheses, the regex tells Python where to split but not what to preserve. With parentheses, the separator becomes part of the result list.
Compare these two:
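As a sketch, the two patterns can be placed side by side on the same input. The sample string and variable names are assumptions chosen for illustration:

```python
import re

text = "a,b;c"  # hypothetical sample input

# 1) No capturing group: separators are used only as split points.
without_group = re.split(r",|;", text)
print(without_group)  # ['a', 'b', 'c']

# 2) Capturing group: separators are included in the result.
with_group = re.split(r"(,|;)", text)
print(with_group)  # ['a', ',', 'b', ';', 'c']
```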
The first version drops delimiters. The second keeps them.
Splitting on Whitespace or More Complex Delimiters
You are not limited to single-character separators. Any regular expression works as the delimiter, as long as the part you want to keep is inside the capturing group.
For example, split on runs of whitespace and keep them:
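One way this might look, assuming a sample string that mixes double spaces and a tab:

```python
import re

text = "one  two\tthree"  # hypothetical input: two spaces, then a tab

# \s+ matches a whole run of whitespace; the group keeps each run intact.
tokens = re.split(r"(\s+)", text)

print(tokens)  # ['one', '  ', 'two', '\t', 'three']
```

Because every piece is kept, joining the tokens with an empty string reproduces the original text exactly.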
Or keep punctuation marks:
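A possible sketch for punctuation, assuming only ".", "!" and "?" count as separators. Note that the spaces after each mark stay attached to the following chunk:

```python
import re

text = "Wait! Really? Yes"  # hypothetical sample sentence

# The character class matches exactly one punctuation mark at a time.
pieces = re.split(r"([.!?])", text)

print(pieces)  # ['Wait', '!', ' Really', '?', ' Yes']
```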
This makes the technique useful for tokenization, simple parsers, and formatting tools.
Avoiding Empty Strings
Depending on the pattern and the position of delimiters, re.split may produce empty strings at the beginning, end, or between adjacent separators.
Example:
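To illustrate, here is a sketch with an assumed input that starts and ends with a comma and contains two commas in a row:

```python
import re

text = ",a,,b,"  # hypothetical input with leading, trailing, and adjacent commas

parts = re.split(r"(,)", text)

# Empty strings appear wherever there is no text between two split points.
print(parts)  # ['', ',', 'a', ',', '', ',', 'b', ',', '']
```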
If you do not want empties, filter them out:
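A list comprehension is one simple way to do this. The sketch below reuses an assumed comma-heavy input:

```python
import re

text = ",a,,b,"  # hypothetical input

# "if p" drops empty strings but keeps every separator and text chunk.
parts = [p for p in re.split(r"(,)", text) if p]

print(parts)  # [',', 'a', ',', ',', 'b', ',']
```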
Whether that is correct depends on your application. In some tokenizers, empty fields are meaningful.
Alternative: re.findall
Sometimes re.findall is a cleaner mental model because you explicitly match either text chunks or separators.
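A minimal sketch of that idea, assuming commas and semicolons are the separators. The pattern has two alternatives: a run of non-separator characters, or a single separator:

```python
import re

text = "a,b;;c"  # hypothetical input with two adjacent semicolons

# [^,;]+  one or more characters that are not separators (a text chunk)
# [,;]    exactly one separator
tokens = re.findall(r"[^,;]+|[,;]", text)

print(tokens)  # ['a', ',', 'b', ';', ';', 'c']
```

The pattern contains no capturing groups, so re.findall returns the full text of each match, and neither alternative can match an empty string.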
This can be easier to control when you want to avoid empty strings entirely.
When str.split Is Still Better
If you do not need separators, use str.split. It is simpler and faster for plain delimiter-based splitting.
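For comparison, a short sketch of the plain method on assumed sample strings:

```python
csv_line = "a,b,c"            # hypothetical input
sentence = "one  two\tthree"  # hypothetical input

# Fixed delimiter: separators are discarded.
fields = csv_line.split(",")
print(fields)  # ['a', 'b', 'c']

# No argument: split on any run of whitespace.
words = sentence.split()
print(words)  # ['one', 'two', 'three']
```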
The regex approach is worth it only when:
- separators need to be preserved
- delimiters are pattern-based rather than fixed text
- you are building a tokenizer or lightweight parser
Do not reach for regex when the simple method already fits.
If performance matters on very large text inputs, benchmark the exact pattern you plan to use. A simple pattern is usually fast enough, but with complex delimiter rules the choice of approach, such as re.split versus re.findall, can make a measurable difference.
Common Pitfalls
The biggest mistake is forgetting the capturing parentheses in re.split. Without them, the separators vanish.
Another issue is writing a regex that accidentally matches more than the intended separator, especially with special regex characters such as . or |.
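One way to guard against this is to pass literal separators through re.escape before building the pattern. The sketch below assumes a literal dot and a literal pipe as separators, and the variable names are illustrative:

```python
import re

text = "a.b"  # hypothetical input with a literal dot as separator

# Unescaped "." matches ANY character, so every character becomes a "separator".
unescaped = re.split(r"(.)", text)
print(unescaped)  # ['', 'a', '', '.', '', 'b', '']

# re.escape turns the separator into a literal match.
escaped = re.split("(" + re.escape(".") + ")", text)
print(escaped)  # ['a', '.', 'b']

# The same approach works for "|", which otherwise means alternation.
piped = re.split("(" + re.escape("|") + ")", "a|b")
print(piped)  # ['a', '|', 'b']
```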
People also often get confused by empty strings in the output. Those are a natural result of certain split patterns and need to be handled intentionally.
Finally, do not use a character class when you meant to preserve multi-character delimiters as whole tokens. Regex grouping choices change the output shape.
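The difference can be sketched with an assumed two-character delimiter such as "==". A character class matches one character at a time, while alternation matches each delimiter as a whole:

```python
import re

text = "x==1"  # hypothetical input using "==" as a delimiter

# Character class: each "=" is matched separately, splitting the delimiter.
by_class = re.split(r"([=!])", text)
print(by_class)  # ['x', '=', '', '=', '1']

# Alternation: "==" and "!=" are each kept as a single token.
by_alternation = re.split(r"(==|!=)", text)
print(by_alternation)  # ['x', '==', '1']
```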
Summary
- Use re.split with a capturing group to split and keep separators.
- Example: re.split(r'(,|;)', text).
- Use re.findall when you want a more tokenizer-like approach.
- Expect empty strings in some cases and filter them only if that matches your logic.
- Use plain str.split when separators do not need to be preserved.

