Apply pandas function to column to create multiple new columns?

Introduction

Creating multiple columns from one pandas column is common in cleanup and feature engineering. The key decision is whether your transformation can use vectorized pandas operations or whether you truly need Python-level apply.

Prefer vectorized operations when possible

If the transformation is based on string splitting, date extraction, numeric parsing, or other built-in pandas operations, use those first. They are usually faster and clearer than a custom Python function.

python
import pandas as pd

df = pd.DataFrame(
    {
        "full_name": ["Ada Lovelace", "Grace Hopper", "Alan Turing"]
    }
)

# n=1 splits on the first space only; expand=True returns one column per piece.
df[["first_name", "last_name"]] = df["full_name"].str.split(" ", n=1, expand=True)

print(df)

The expand=True option is the important part here. It tells pandas to turn the split result into multiple columns directly. If a built-in method can already produce a DataFrame, that is usually the cleanest path.
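
The same idea extends to the other built-in cases mentioned above. The sketch below, using made-up column names and sample data, reads date parts through the .dt accessor and captures two fields with str.extract, which returns one column per capture group.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame(
    {
        "order_date": ["2024-01-15", "2024-02-29", "2024-12-01"],
        "sku": ["AB-100", "CD-250", "EF-075"],
    }
)

# Date extraction: parse once, then read components through the .dt accessor.
parsed = pd.to_datetime(df["order_date"], format="%Y-%m-%d")
df["year"] = parsed.dt.year
df["month"] = parsed.dt.month

# Regex extraction: named groups become the column names of the result.
parts = df["sku"].str.extract(r"(?P<prefix>[A-Z]+)-(?P<number>\d+)")
df["prefix"] = parts["prefix"]
df["number"] = parts["number"].astype(int)

print(df)
```

Neither step calls a Python function per element, so both stay on the vectorized path.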

Using apply for custom multi-value output

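When the logic is custom, return a labeled Series from your function. Series.apply collects the results into a DataFrame whose column names come from the Series labels. When you assign that result to a list of target columns, list them in the same order as the labels.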

python
import pandas as pd

df = pd.DataFrame({"duration": ["02:15", "00:45", "10:05"]})

def parse_duration(text: str) -> pd.Series:
    # Expects text in MM:SS form.
    minutes_text, seconds_text = text.split(":")
    minutes = int(minutes_text)
    seconds = int(seconds_text)
    total_seconds = minutes * 60 + seconds

    # The labels become the column names of the expanded result.
    return pd.Series(
        {
            "minutes": minutes,
            "seconds": seconds,
            "total_seconds": total_seconds,
        }
    )

df[["minutes", "seconds", "total_seconds"]] = df["duration"].apply(parse_duration)

print(df)

This pattern is readable and explicit. It works well when one input value expands into a small fixed set of outputs.
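
Because the returned Series is labeled, the DataFrame that apply produces already carries the column names. As a sketch of an alternative to listing the target columns by hand, the result can be attached with DataFrame.join. The function is a condensed copy of parse_duration from above, repeated so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({"duration": ["02:15", "00:45", "10:05"]})

def parse_duration(text: str) -> pd.Series:
    minutes_text, seconds_text = text.split(":")
    minutes = int(minutes_text)
    seconds = int(seconds_text)
    return pd.Series(
        {"minutes": minutes, "seconds": seconds, "total_seconds": minutes * 60 + seconds}
    )

# apply returns a DataFrame whose columns are the Series labels.
expanded = df["duration"].apply(parse_duration)

# join aligns on the row index and keeps the labels as column names.
result = df.join(expanded)

print(result)
```

This variant leaves df untouched and returns a new frame, which can be convenient inside a longer chain of steps.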

Returning tuples or lists

If you prefer a lighter return type, your function can return a tuple or list, and you can build a new DataFrame from the results.

python
import pandas as pd

df = pd.DataFrame({"point": ["1,2", "5,8", "13,21"]})

def split_point(text: str) -> tuple[int, int]:
    left, right = text.split(",")
    return int(left), int(right)

# tolist() yields a list of tuples; index=df.index keeps rows aligned with df.
df[["x", "y"]] = pd.DataFrame(df["point"].apply(split_point).tolist(), index=df.index)

print(df)

This is useful when the function already returns positional values naturally. The explicit index=df.index keeps the new frame aligned with the original rows.
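
To see why the index matters, consider a frame whose index is no longer 0, 1, 2, for example after filtering. The sketch below uses hypothetical data and DataFrame.join to combine the frames. Without index=df.index the temporary frame gets a fresh default index, and because pandas aligns on index labels, no row matches and the new columns come out as missing values.

```python
import pandas as pd

def split_point(text: str) -> tuple[int, int]:
    left, right = text.split(",")
    return int(left), int(right)

# Hypothetical frame with a non-default index, as you might get after filtering.
df = pd.DataFrame({"point": ["1,2", "5,8", "13,21"]}, index=[10, 20, 30])

pairs = df["point"].apply(split_point).tolist()

# Without index=..., the new frame is indexed 0, 1, 2 and matches no row of df.
misaligned = pd.DataFrame(pairs, columns=["x", "y"])
bad = df.join(misaligned)

# With index=df.index, every row lines up with its source value.
aligned = pd.DataFrame(pairs, columns=["x", "y"], index=df.index)
good = df.join(aligned)

print(bad)
print(good)
```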

Performance and maintainability

Series.apply is convenient, but it calls your Python function once per element. On large data sets that is slower than vectorized pandas operations, which process the whole column at once. So the practical rule is:

  • use built-in vectorized methods first
  • use apply when the logic is truly custom
  • keep the output shape fixed and predictable

If you find yourself writing a very complex apply function, it can be better to preprocess the source data earlier or split the logic into clearer stages.
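
As one illustration of splitting logic into stages, the duration parsing shown earlier can be expressed with built-in operations only: split the text, convert the pieces, then derive the total. This is a sketch that assumes every value has the form MM:SS; malformed values would need separate handling.

```python
import pandas as pd

df = pd.DataFrame({"duration": ["02:15", "00:45", "10:05"]})

# Stage 1: split the text into two string columns (labeled 0 and 1).
parts = df["duration"].str.split(":", expand=True)

# Stage 2: convert each piece to integers.
df["minutes"] = parts[0].astype(int)
df["seconds"] = parts[1].astype(int)

# Stage 3: derive the combined value with column arithmetic.
df["total_seconds"] = df["minutes"] * 60 + df["seconds"]

print(df)
```

Each stage can be inspected on its own, which tends to make debugging easier than stepping through one large function.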

Another readable pattern is to keep the transformation next to the assignment with assign. That works well when the new columns belong to one logical cleanup step and you want the pipeline to stay chainable. The underlying idea is the same: produce a predictable, column-shaped result and merge it back deliberately rather than relying on implicit expansion.
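
A minimal sketch of that assign pattern, reusing the name-splitting example. The helper name split_name is an illustrative choice, not something pandas provides.

```python
import pandas as pd

df = pd.DataFrame(
    {"full_name": ["Ada Lovelace", "Grace Hopper", "Alan Turing"]}
)

def split_name(frame: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: returns a copy of frame with two extra columns.
    parts = frame["full_name"].str.split(" ", n=1, expand=True)
    return frame.assign(first_name=parts[0], last_name=parts[1])

# pipe keeps the step chainable; df itself is left unchanged.
result = df.pipe(split_name)

print(result)
```

Since assign returns a new DataFrame, further steps can be chained after the pipe call without mutating the original.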

Common Pitfalls

  • Reaching for apply when a vectorized method such as str.split already does the job.
  • Returning inconsistent output lengths from the custom function.
  • Forgetting to preserve index alignment when converting tuple results into a DataFrame.
  • Returning unlabeled values when named columns would make the transformation clearer.
  • Using apply on large data, where one Python call per element makes it noticeably slower than a vectorized method.

Summary

  • Use vectorized pandas methods first when they can already produce multiple columns.
  • For custom logic, return a Series with named outputs or a fixed-length tuple.
  • Assign the resulting DataFrame back to the target columns explicitly.
  • Keep the output shape stable so pandas can align the data correctly.
  • apply is flexible, but built-in vectorized methods are usually faster and simpler.
