How do I iterate over the words of a string?

Introduction

In Python, iterating over the words of a string usually starts with split(), but that is only the simplest case. The right approach depends on what you mean by "word": whitespace-separated chunks, delimiter-separated fields, or punctuation-free tokens.

Once that definition is clear, the implementation is straightforward. Use split() for ordinary text, regular expressions when punctuation matters, and iterator-based techniques when the input is large.

Use split() for Normal Whitespace-Separated Text

For ordinary sentences, str.split() called with no arguments is the most direct answer because it breaks on any run of whitespace.

python
sentence = "Hello, how are you doing today?"

for word in sentence.split():
    print(word)

This treats spaces, tabs, and newlines as separators. It also collapses repeated whitespace automatically:

python
text = "one   two\tthree\nfour"

for word in text.split():
    print(word)

That makes split() a good default whenever the input is human-readable text and punctuation can stay attached to the surrounding word.
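A minimal sketch of two behaviours worth checking before relying on split(): the no-argument form collapses runs of whitespace, an explicit split(" ") does not, and punctuation stays attached either way. The variable names are illustrative only.

```python
text = "one   two\tthree\nfour"

# No argument: any run of whitespace counts as one separator.
collapsed = text.split()

# Explicit single space: every space is its own separator, and tabs
# and newlines are not treated as separators at all.
by_space = text.split(" ")

# Punctuation is left attached to the neighbouring word.
sentence_words = "Hello, how are you doing today?".split()

print(collapsed)
print(by_space)
print(sentence_words)
```

The empty strings in the second result are the usual sign that split(" ") was used where split() was intended.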

Split on a Known Delimiter for Structured Input

Sometimes the string is not natural language at all. If the input is a comma-separated or pipe-separated line, the "words" are really fields.

python
data = "apple, banana, cherry"

for item in data.split(","):
    print(item.strip())

In this case, the delimiter is part of the data format, so using split(",") is clearer than applying a more general tokenizer. The extra strip() removes surrounding spaces without changing the core logic.

This is an important distinction: word iteration is not always about language. Sometimes it is just token iteration over a simple format.
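As a sketch, the same idea can be wrapped in a small generator. Here iter_fields is a hypothetical helper name, not a standard-library function, and skipping empty fields is an assumption that may not suit every format.

```python
def iter_fields(line, delimiter=","):
    """Yield each delimiter-separated field with surrounding spaces removed."""
    for raw in line.split(delimiter):
        field = raw.strip()
        if field:  # assumption: empty fields (e.g. from "a,,b") are skipped
            yield field


fields = list(iter_fields("apple, banana, cherry"))
pipe_fields = list(iter_fields("x | y || z", delimiter="|"))

print(fields)
print(pipe_fields)
```

If fields can contain the delimiter inside quotes, the standard csv module is likely a better fit than a plain split().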

Use Regular Expressions When Punctuation Should Not Count

If you want words without commas, periods, or question marks, split() is usually too crude. Regular expressions let you describe the tokens you actually want to keep.

python
import re

sentence = "Hello, how are you? I'm fine!"

for match in re.finditer(r"[A-Za-z']+", sentence):
    print(match.group())

This example keeps letters and apostrophes together, so "I'm" stays one token. That is often more useful for natural language processing than splitting on spaces and then cleaning punctuation afterward.
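One limitation worth noting is that [A-Za-z'] only covers ASCII letters. The sketch below compares it with one possible Unicode-aware pattern. The second pattern is an assumption about what "word" should mean, not the only reasonable choice.

```python
import re

# ASCII-only pattern: letters A-Z, a-z, and apostrophes.
ASCII_WORD = re.compile(r"[A-Za-z']+")

# Assumed alternative: [^\W\d_] matches a word character that is neither
# a decimal digit nor an underscore (essentially a letter), and an
# apostrophe is accepted only between letters.
UNICODE_WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

sentence = "Café? I'm fine!"

ascii_words = ASCII_WORD.findall(sentence)
unicode_words = UNICODE_WORD.findall(sentence)

print(ascii_words)    # the accented letter is dropped from the first token
print(unicode_words)
```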

You can also describe the separators instead:

python
import re

sentence = "red, blue; green / yellow"
# \s inside the character class matches any whitespace character
parts = re.split(r"[,;/\s]+", sentence)

for part in parts:
    if part:
        print(part)

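The if part: check skips the empty strings that re.split() returns when the text begins or ends with a separator. Choose re.finditer() when it is easier to define a valid word. Choose re.split() when it is easier to define the separators.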
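The two styles can be compared side by side on the same input. This is a small sketch under the assumption that a word is a run of ASCII letters. The variable names are illustrative.

```python
import re

sentence = "red, blue; green / yellow"

# Describe the separators and drop any empty pieces...
by_separators = [p for p in re.split(r"[,;/\s]+", sentence) if p]

# ...or describe the words themselves.
by_words = [m.group() for m in re.finditer(r"[A-Za-z]+", sentence)]

# A leading separator makes re.split() return an empty first element.
leading = re.split(r"[,;/\s]+", ", red")

print(by_separators)
print(by_words)
print(leading)
```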

Iterate Lazily for Large Text

If you do not want to build a whole list of tokens up front, regular-expression iterators already give you a lazy approach.

python
import re

text = "alpha beta gamma delta"

for match in re.finditer(r"\w+", text):
    print(match.group())

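This is useful in pipelines where the string is large or where you want to process tokens one at a time. Note that \w+ also matches digits and underscores. If the input spans multiple lines, a nested loop is also a simple option, although splitlines() and split() each build a list, so it trades laziness for readability: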

python
text = "first line\nsecond line\nthird line"

for line in text.splitlines():
    for word in line.split():
        print(word)

That style keeps the code easy to read while matching the actual structure of the input.
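When the text comes from a file rather than a string already in memory, the nested loop can be turned into a generator so that only one line is split at a time. This is a sketch: iter_words is a hypothetical helper name, and io.StringIO stands in for an open text file.

```python
import io


def iter_words(lines):
    """Yield whitespace-separated words from any iterable of lines."""
    for line in lines:
        yield from line.split()


# io.StringIO stands in for an open text file here.
source = io.StringIO("first line\nsecond line\nthird line\n")

words = iter_words(source)
first = next(words)   # only the first line has been split so far
rest = list(words)    # consume the remaining words

print(first)
print(rest)
```

Because iter_words accepts any iterable of strings, the same function should work with an open file object or a list of lines.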

Common Pitfalls

The biggest mistake is assuming split() removes punctuation. It does not. "hello," and "hello" are different strings, so they will not compare equal unless you strip the punctuation or tokenize more carefully.
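One lightweight way to strip punctuation after a whitespace split is str.strip() with string.punctuation. This is a sketch, assuming ASCII punctuation only.

```python
import string

sentence = "Hello, how are you? I'm fine!"

# split() alone leaves punctuation attached.
raw = sentence.split()

# strip() removes punctuation from both ends of each token only,
# so the apostrophe inside "I'm" is kept.
cleaned = [word.strip(string.punctuation) for word in raw]

print(raw)
print(cleaned)
```

A token made up entirely of punctuation becomes an empty string under this approach, so it may need filtering afterward.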

Another common issue is choosing a delimiter-specific split for input that does not follow one stable format. Mixed punctuation and whitespace usually require a more deliberate tokenizer.

It is also easy to forget that the correct definition of a word depends on the task. For one program, snake_case may be a single token. For another, numbers or apostrophes may need to be excluded.
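To make that concrete, the sketch below runs three patterns over the same input. Which result is correct depends entirely on the task. The sample text is made up for illustration.

```python
import re

text = "snake_case v2 don't"

# \w+ keeps underscores and digits but splits at the apostrophe.
word_chars = re.findall(r"\w+", text)

# Letters only: underscores, digits, and apostrophes all end a token.
letters_only = re.findall(r"[A-Za-z]+", text)

# Letters plus apostrophes: contractions stay whole, digits are dropped.
letters_apostrophe = re.findall(r"[A-Za-z']+", text)

print(word_chars)
print(letters_only)
print(letters_apostrophe)
```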

Finally, do not make the solution more complex than the data requires. If plain whitespace splitting already matches the problem, regular expressions are unnecessary noise.

Summary

  • Use str.split() for ordinary whitespace-separated text.
  • Use delimiter-specific split() calls for structured input such as comma-separated fields.
  • Use re.finditer() or re.split() when punctuation handling matters.
  • Prefer iterator-based processing when the input is large.
  • Define what counts as a word before choosing the implementation.
