
In Python, how do I split a string and keep the separators?


Introduction

If you want to split a string in Python and keep the separators, the usual answer is to use re.split with a capturing group. Ordinary str.split() discards delimiters, so you need a regex-based approach when the separators themselves matter.

The Standard Pattern with re.split

re.split adds the text matched by a capturing group to the result list, so wrapping the delimiter pattern in parentheses keeps each delimiter as its own element.

Example:

python
import re

text = "one,two;three"
parts = re.split(r'(,|;)', text)
print(parts)

Output:

python
['one', ',', 'two', ';', 'three']

That is the most direct answer for many cases.
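As a quick sanity check, here is a minimal sketch of how the result is laid out when the whole pattern sits inside a single capturing group: text chunks land on even indexes, separators on odd indexes, and joining everything restores the input. The variable names chunks and separators are illustrative only.

```python
import re

text = "one,two;three"
parts = re.split(r'(,|;)', text)

# With one capturing group around the whole pattern, chunks and
# separators alternate: even indexes are text, odd indexes are separators.
chunks = parts[0::2]
separators = parts[1::2]
print(chunks)      # ['one', 'two', 'three']
print(separators)  # [',', ';']

# Nothing was discarded, so joining the parts rebuilds the original string.
print("".join(parts) == text)  # True
```

The round trip is handy in tests: if joining the parts does not give back the input, the pattern is probably missing its capturing group.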

Why the Capturing Group Matters

Without parentheses, the regex tells Python where to split but not what to preserve. With parentheses, the separator becomes part of the result list.

Compare these two:

python
import re

text = "a,b;c"
print(re.split(r',|;', text))    # ['a', 'b', 'c']
print(re.split(r'(,|;)', text))  # ['a', ',', 'b', ';', 'c']

The first version drops delimiters. The second keeps them.

Splitting on Whitespace or More Complex Delimiters

You are not limited to single-character separators. Any regex can work.

For example, split on runs of whitespace and keep them:

python
import re

text = "one   two\tthree"
parts = re.split(r'(\s+)', text)
print(parts)  # ['one', '   ', 'two', '\t', 'three']

Or keep punctuation marks:

python
import re

text = "Hello, world! Really?"
parts = re.split(r'([,!?])', text)
print(parts)  # ['Hello', ',', ' world', '!', ' Really', '?', '']

This makes the technique useful for tokenization, simple parsers, and formatting tools.
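For formatting tasks it is often more convenient to keep each separator attached to the chunk before it. The sketch below shows one way to do that on top of re.split. The helper name split_keep_attached is hypothetical, and it assumes the pattern you pass in contains no capturing groups of its own.

```python
import re
from itertools import zip_longest


def split_keep_attached(text, pattern):
    """Hypothetical helper: split on `pattern` and glue each separator
    onto the chunk that precedes it.

    Assumes `pattern` has no capturing groups of its own.
    """
    # Wrap the pattern in one capturing group so separators are kept.
    parts = re.split(f'({pattern})', text)
    chunks, seps = parts[0::2], parts[1::2]
    # There is always one more chunk than separators, so pad with "".
    joined = (chunk + sep for chunk, sep in zip_longest(chunks, seps, fillvalue=""))
    # Drop the empty piece produced when the text ends with a separator.
    return [piece for piece in joined if piece]


print(split_keep_attached("Hello, world! Really?", r'[,!?]'))
# ['Hello,', ' world!', ' Really?']
```

Dropping empty pieces is a design choice that suits sentence-like input. Remove the final filter if empty fields carry meaning in your data.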

Avoiding Empty Strings

Depending on the pattern and the position of delimiters, re.split may produce empty strings at the beginning, end, or between adjacent separators.

Example:

python
import re

text = ",a,,b,"
parts = re.split(r'(,)', text)
print(parts)  # ['', ',', 'a', ',', '', ',', 'b', ',', '']

If you do not want empties, filter them out:

python
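# Drop the empty strings left by leading, trailing, and adjacent separators
cleaned = [p for p in parts if p != ""]
print(cleaned)  # [',', 'a', ',', ',', 'b', ',']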

Whether that is correct depends on your application. In some tokenizers, empty fields are meaningful.

Alternative: re.findall

Sometimes re.findall is a cleaner mental model because you explicitly match either text chunks or separators.

python
import re

text = "one,two;three"
parts = re.findall(r'[^,;]+|[,;]', text)
print(parts)  # ['one', ',', 'two', ';', 'three']

This can be easier to control when you want to avoid empty strings entirely.
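To illustrate the point about empty strings, here is a small sketch that runs the same matching idea over an input with leading, trailing, and adjacent commas. Because re.findall only returns what the pattern actually matched, no empty strings appear.

```python
import re

text = ",a,,b,"

# Match either a run of non-comma characters or a single comma.
tokens = re.findall(r'[^,]+|,', text)
print(tokens)  # [',', 'a', ',', ',', 'b', ',']

# Every character is covered by one of the two alternatives,
# so the tokens still join back into the original string.
print("".join(tokens) == text)  # True
```

The trade-off is that the positions of empty fields are no longer visible in the output, so this fits tokenizing better than parsing delimited records.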

When str.split Is Still Better

If you do not need separators, use str.split. It is simpler and faster for plain delimiter-based splitting.

The regex approach is worth it only when:

  • separators need to be preserved
  • delimiters are pattern-based rather than fixed text
  • you are building a tokenizer or lightweight parser

Do not reach for regex when the simple method already fits.

If performance matters on very large text inputs, benchmark the exact pattern you plan to use. A simple capturing pattern is usually fast enough, but with complex delimiter rules the choice between re.split, re.findall, and plain string methods can make a measurable difference.
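A rough way to run such a benchmark is the standard library's timeit module. The sketch below uses a synthetic input and an arbitrary repeat count, both of which are assumptions; substitute your real data and pattern before drawing conclusions.

```python
import re
import timeit

# Synthetic input (an assumption): 10,000 comma-separated fields.
text = ",".join(["field"] * 10_000)
pattern = re.compile(r'(,)')

# Time 100 runs of each approach; the absolute numbers depend on the machine.
plain = timeit.timeit(lambda: text.split(","), number=100)
regex = timeit.timeit(lambda: pattern.split(text), number=100)

print(f"str.split: {plain:.4f}s  re.split with capture: {regex:.4f}s")
```

The two calls do not return the same thing, since only the regex version keeps the separators, so treat the comparison as a cost estimate for preserving them.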

Common Pitfalls

The biggest mistake is forgetting the capturing parentheses in re.split. Without them, the separators vanish.

Another issue is writing a regex that accidentally matches more than the intended separator. Characters such as . and | have special meaning in a pattern, so escape them (for example with re.escape) when you mean them literally.
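One way to sidestep the special-character problem is to build the pattern from literal delimiters with re.escape. This is a minimal sketch; the delimiters list is a made-up example.

```python
import re

# Hypothetical delimiter list; both characters are regex metacharacters.
delimiters = [".", "|"]

# Escape each delimiter, join with alternation, wrap in one capturing group.
pattern = "(" + "|".join(re.escape(d) for d in delimiters) + ")"

print(re.split(pattern, "a.b|c"))  # ['a', '.', 'b', '|', 'c']

# For contrast, an unescaped '.' treats every character as a separator.
print(re.split(r'(.)', "a.b"))  # ['', 'a', '', '.', '', 'b', '']
```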

People also often get confused by empty strings in the output. Those are a natural result of certain split patterns and need to be handled intentionally.

Finally, do not use a character class when you meant to preserve multi-character delimiters as whole tokens. Regex grouping choices change the output shape.
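The difference is easiest to see side by side. In this sketch the two-character delimiter => is an arbitrary example: the character class matches one character at a time and breaks the delimiter apart, while the group keeps it whole.

```python
import re

text = "a=>b=>c"

# Character class: '=' and '>' are matched separately, and the empty
# string between them shows up in the result.
by_class = re.split(r'([=>])', text)
print(by_class)  # ['a', '=', '', '>', 'b', '=', '', '>', 'c']

# Group: the whole '=>' sequence is one separator.
by_group = re.split(r'(=>)', text)
print(by_group)  # ['a', '=>', 'b', '=>', 'c']
```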

Summary

  • Use re.split with a capturing group to split and keep separators.
  • Example: re.split(r'(,|;)', text).
  • Use re.findall when you want a more tokenizer-like approach.
  • Expect empty strings in some cases and filter them only if that matches your logic.
  • Use plain str.split when separators do not need to be preserved.
