Apply pandas function to column to create multiple new columns?

Introduction

Creating multiple columns from one pandas column is common in cleanup and feature engineering. The key decision is whether your transformation can use vectorized pandas operations or whether you truly need Python-level apply.

Prefer vectorized operations when possible

If the transformation is based on string splitting, date extraction, numeric parsing, or other built-in pandas operations, use those first. They are usually faster and clearer than a custom Python function.

python

import pandas as pd

df = pd.DataFrame(
    {
        "full_name": ["Ada Lovelace", "Grace Hopper", "Alan Turing"]
    }
)

# n=1 splits on the first space only; expand=True returns one column per part.
df[["first_name", "last_name"]] = df["full_name"].str.split(" ", n=1, expand=True)

print(df)

The expand=True option is the important part here. It tells pandas to turn the split result into multiple columns directly. If a built-in method can already produce a DataFrame, that is usually the cleanest path.
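The same approach covers the date extraction case mentioned earlier. The following is a minimal sketch, assuming a hypothetical order_date column of ISO-formatted strings (the column name and values are illustrative, not from the example above). It parses the column once and reads several components through the vectorized .dt accessor.

```python
import pandas as pd

# Hypothetical sample data: ISO-formatted date strings.
df = pd.DataFrame({"order_date": ["2024-01-15", "2023-11-30", "2025-07-04"]})

# Parse once, then derive several columns from the parsed values.
parsed = pd.to_datetime(df["order_date"], format="%Y-%m-%d")
df["year"] = parsed.dt.year
df["month"] = parsed.dt.month
df["weekday"] = parsed.dt.day_name()

print(df)
```

No Python-level function is called per value here, which is why this style tends to scale better than apply.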

Using `apply` for custom multi-value output

When the logic is custom, return a labeled Series from your function. Series.apply then collects those results into a DataFrame whose column names are the labels you returned.

python

import pandas as pd

df = pd.DataFrame({"duration": ["02:15", "00:45", "10:05"]})

def parse_duration(text: str) -> pd.Series:
    # Expects "MM:SS"; anything else raises ValueError.
    minutes_text, seconds_text = text.split(":")
    minutes = int(minutes_text)
    seconds = int(seconds_text)
    total_seconds = minutes * 60 + seconds

    # The keys become the column names of the resulting DataFrame.
    return pd.Series(
        {
            "minutes": minutes,
            "seconds": seconds,
            "total_seconds": total_seconds,
        }
    )

df[["minutes", "seconds", "total_seconds"]] = df["duration"].apply(parse_duration)

print(df)

This pattern is readable and explicit. It works well when one input value expands into a small fixed set of outputs.
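Because the returned Series is labeled, the column names do not have to be repeated at the assignment site. As one possible variation (a sketch, not the only way to do it), the result of apply can be attached with DataFrame.join, which matches rows on the index and takes the column names from the Series labels.

```python
import pandas as pd

df = pd.DataFrame({"duration": ["02:15", "00:45", "10:05"]})

def parse_duration(text: str) -> pd.Series:
    minutes_text, seconds_text = text.split(":")
    minutes = int(minutes_text)
    seconds = int(seconds_text)
    return pd.Series(
        {
            "minutes": minutes,
            "seconds": seconds,
            "total_seconds": minutes * 60 + seconds,
        }
    )

# apply returns a DataFrame here; join adds its columns alongside the original.
expanded = df["duration"].apply(parse_duration)
result = df.join(expanded)

print(result)
```

This keeps the original frame untouched and leaves the function as the single place where the output names are defined.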

Returning tuples or lists

If you prefer a lighter return type, your function can return a tuple or list, and you can build a new DataFrame from the results.

python

import pandas as pd

df = pd.DataFrame({"point": ["1,2", "5,8", "13,21"]})

def split_point(text: str) -> tuple[int, int]:
    left, right = text.split(",")
    return int(left), int(right)

# tolist() gives a list of (x, y) tuples; index=df.index keeps the rows aligned.
df[["x", "y"]] = pd.DataFrame(df["point"].apply(split_point).tolist(), index=df.index)

print(df)

This is useful when the function already returns positional values naturally. The explicit index=df.index keeps the new frame aligned with the original rows.
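A shorter idiom for the same tuple-returning function is to transpose the results with zip. This is a sketch that assumes the column is non-empty, since unpacking an empty zip into two names would fail.

```python
import pandas as pd

df = pd.DataFrame({"point": ["1,2", "5,8", "13,21"]})

def split_point(text: str) -> tuple[int, int]:
    left, right = text.split(",")
    return int(left), int(right)

# zip(*...) turns the sequence of (x, y) pairs into one tuple per output column.
df["x"], df["y"] = zip(*df["point"].apply(split_point))

print(df)
```

The values are assigned positionally, so this form relies on the results being in the same order as the rows of df, which holds when apply is called on a column of df itself.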

Performance and maintainability

Series.apply is convenient, but it calls your Python function once per element. That is slower than vectorized pandas operations on large data sets. So the practical rule is:

use built-in vectorized methods first
use apply when the logic is truly custom
keep the output shape fixed and predictable

If you find yourself writing a very complex apply function, it can be better to preprocess the source data earlier or split the logic into clearer stages.
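As an illustration of replacing a custom function with built-in operations, the duration parsing from the apply section can also be written with str.extract and named groups. This is a sketch under the assumption that every value matches the "MM:SS" pattern; unmatched values would produce missing values and make the integer conversion fail.

```python
import pandas as pd

df = pd.DataFrame({"duration": ["02:15", "00:45", "10:05"]})

# Named groups become the column names of the extracted DataFrame.
parts = (
    df["duration"]
    .str.extract(r"^(?P<minutes>\d+):(?P<seconds>\d+)$")
    .astype(int)
)
parts["total_seconds"] = parts["minutes"] * 60 + parts["seconds"]

df = df.join(parts)

print(df)
```

The parsing and the arithmetic are separate stages here, which also makes each step easier to inspect on its own.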

Another readable pattern is to keep the transformation next to the assignment with assign. That works well when the new columns belong to one logical cleanup step and you want the pipeline to stay chainable. The underlying idea is the same: produce a predictable, column-shaped result and merge it back deliberately rather than relying on implicit expansion.
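A minimal sketch of the assign pattern, reusing the duration data from earlier. The second assign call is needed because total_seconds depends on columns created in the first one; the exact split into steps is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({"duration": ["02:15", "00:45", "10:05"]})

result = (
    df
    .assign(
        # Each lambda receives the frame as it exists at that point in the chain.
        minutes=lambda d: d["duration"].str.split(":").str[0].astype(int),
        seconds=lambda d: d["duration"].str.split(":").str[1].astype(int),
    )
    .assign(total_seconds=lambda d: d["minutes"] * 60 + d["seconds"])
)

print(result)
```

assign returns a new DataFrame and leaves df unchanged, so the step can sit in the middle of a longer method chain.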

Common Pitfalls

Reaching for apply when a vectorized method such as str.split already does the job.
Returning inconsistent output lengths from the custom function.
Forgetting to preserve index alignment when converting tuple results into a DataFrame.
Returning unlabeled values when named columns would make the transformation clearer.
Using row-wise apply on large data and then wondering why performance is poor.
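The index alignment pitfall is easiest to see on a filtered frame, where the index no longer starts at 0. The sketch below uses join to make the row matching explicit; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"point": ["1,2", "5,8", "13,21"]})
subset = df[df["point"] != "1,2"].copy()  # remaining index labels: 1 and 2

pairs = [tuple(int(p) for p in text.split(",")) for text in subset["point"]]

# Without index=..., the new frame gets a fresh index starting at 0.
misaligned = pd.DataFrame(pairs, columns=["x", "y"])
aligned = pd.DataFrame(pairs, columns=["x", "y"], index=subset.index)

bad = subset.join(misaligned)   # rows are matched on the wrong labels
good = subset.join(aligned)     # rows are matched on the original labels

print(bad)
print(good)
```

In the misaligned case the row for "5,8" picks up the values parsed from "13,21", and the last row gets missing values, without any error being raised.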

Summary

Use vectorized pandas methods first when they can already produce multiple columns.
For custom logic, return a Series with named outputs or a fixed-length tuple.
Assign the resulting DataFrame back to the target columns explicitly.
Keep the output shape stable so pandas can align the data correctly.
apply is flexible, but built-in vectorized methods are usually faster and simpler.
