pandas
dataframe
python
data manipulation
combine columns

Combine two columns of text in pandas dataframe

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Combining two text columns in a pandas DataFrame is one of the most common cleanup and feature-engineering tasks in Python data work. The right method depends on whether you need simple concatenation, safe handling of missing values, or more custom row-level formatting.

The Simplest Case: Use Vectorized String Concatenation

If both columns already contain strings or string-like values, the most direct option is to concatenate them with +.

python
1import pandas as pd
2
3df = pd.DataFrame({
4    "first_name": ["Ada", "Grace", "Linus"],
5    "last_name": ["Lovelace", "Hopper", "Torvalds"],
6})
7
8df["full_name"] = df["first_name"] + " " + df["last_name"]
9print(df)

This is easy to read and works well for clean data. The downside is that missing values can turn the result into NaN rather than a partial string.

Safer Concatenation with str.cat

For text columns in real datasets, Series.str.cat is often the better default because it gives explicit control over separators and missing values.

python
1import pandas as pd
2
3df = pd.DataFrame({
4    "city": ["Toronto", "Montreal", None],
5    "country": ["Canada", "Canada", "Canada"],
6})
7
8df["location"] = df["city"].str.cat(df["country"], sep=", ", na_rep="")
9print(df)

This method is useful because you can decide what to do when one column is missing. With na_rep, you avoid unexpected NaN propagation in the output column.

Clean Missing Values Before Combining

Sometimes the best move is to normalize the inputs first and only then combine them. That keeps the final expression simple.

python
1import pandas as pd
2
3df = pd.DataFrame({
4    "title": ["Dr.", None, "Prof."],
5    "name": ["Ng", "Smith", "Jones"],
6})
7
8left = df["title"].fillna("")
9right = df["name"].fillna("")
10
11df["display_name"] = (left + " " + right).str.strip()
12print(df)

The .str.strip() call is important here. It removes extra spaces that appear when one side is missing.

Combine More Than Two Columns

If you need to join several text columns, using agg with join is often cleaner than chaining many + operations.

python
1import pandas as pd
2
3df = pd.DataFrame({
4    "street": ["123 Main St", "200 King St"],
5    "city": ["Toronto", "Ottawa"],
6    "province": ["ON", "ON"],
7})
8
9cols = ["street", "city", "province"]
10df["address"] = df[cols].fillna("").agg(", ".join, axis=1)
11print(df)

For longer combinations, this is easier to maintain and less error-prone than repeated manual concatenation.

When apply Is the Right Tool

apply is slower than vectorized string operations, so it should not be your first choice for simple concatenation. But it becomes reasonable when the formatting rules are conditional and depend on several columns.

python
1import pandas as pd
2
3df = pd.DataFrame({
4    "first": ["Ada", "Grace"],
5    "middle": [None, "B."],
6    "last": ["Lovelace", "Hopper"],
7})
8
9
10def format_name(row):
11    parts = [row["first"], row["middle"], row["last"]]
12    parts = [p for p in parts if pd.notna(p) and p != ""]
13    return " ".join(parts)
14
15
16df["full_name"] = df.apply(format_name, axis=1)
17print(df)

Here apply is justified because the row logic is more than a fixed separator.

Performance Guidance

For large dataframes, prefer vectorized methods such as +, str.cat, or agg over apply(axis=1). apply runs Python code row by row and becomes noticeably slower when the dataset grows.

A practical rule is:

  • use + for clean simple text columns
  • use str.cat when missing values or separators matter
  • use agg for several columns
  • use apply only when row-level formatting rules truly require custom logic

Common Pitfalls

Forgetting about missing values is the most common problem. A clean-looking expression can produce unexpected NaN results or extra spaces.

Using apply for trivial concatenation is another mistake because it is slower and more verbose than vectorized alternatives.

Ignoring data types can also cause confusion. If one column is numeric, convert it intentionally before combining it with text.

Finally, if you need a human-readable output column, remember to clean separators and whitespace after filling missing data.

Summary

  • use vectorized pandas operations for combining text columns whenever possible
  • '+ is fine for simple clean data'
  • 'str.cat is better when separators and missing values need explicit control'
  • 'agg works well when joining several columns'
  • reserve apply for cases where formatting logic depends on the content of each row

Course illustration
Course illustration

All Rights Reserved.