Pandas
Dataframe
Data Manipulation
Python
Text Processing

Combine two columns of text in pandas dataframe

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Combining two text columns in pandas is usually just string concatenation, but the details matter when missing values or non-string data are involved. The cleanest solution depends on whether you want a quick join, explicit separators, or robust handling of NaN.

Use str.cat() for the Most Control

str.cat() is a good default because it supports separators and explicit missing-value handling.

python
1import pandas as pd
2
3df = pd.DataFrame({
4    "first_name": ["John", "Jane", "Alice", "Bob"],
5    "last_name": ["Doe", "Doe", "Cooper", "Smith"],
6})
7
8df["full_name"] = df["first_name"].str.cat(df["last_name"], sep=" ")
9print(df)

This reads clearly and avoids manually inserting separators into the expression.

Use + for Simple Cases

If both columns are already strings and you know there are no missing values, the + operator is concise:

python
df["full_name"] = df["first_name"] + " " + df["last_name"]

That is fine for clean data, but it becomes fragile when NaN values appear.

For example:

python
1df = pd.DataFrame({
2    "first_name": ["John", None],
3    "last_name": ["Doe", "Smith"],
4})
5
6df["full_name"] = df["first_name"].fillna("").str.cat(
7    df["last_name"].fillna(""),
8    sep=" "
9).str.strip()

Using fillna("") and str.strip() helps produce readable output even when some pieces are missing.

Convert Non-String Columns Explicitly

If one column is numeric or mixed type, convert it intentionally rather than relying on implicit behavior:

python
1df = pd.DataFrame({
2    "city": ["Toronto", "Paris"],
3    "zip_code": [10001, 75001],
4})
5
6df["label"] = df["city"] + " " + df["zip_code"].astype(str)
7print(df)

This avoids type errors and makes the conversion policy visible in the code.

Decide How Missing Values Should Look

The best result for missing values depends on the use case. For a display label, you may want to suppress extra spaces:

python
1df["full_name"] = (
2    df["first_name"].fillna("")
3    .str.cat(df["last_name"].fillna(""), sep=" ")
4    .str.replace(r"\s+", " ", regex=True)
5    .str.strip()
6)

That produces cleaner user-facing text than a raw concatenation that leaves doubled spaces or missing results.

Think About Performance and Readability Together

For most everyday data work, the difference between + and str.cat() is less important than writing something the next person will understand quickly. If the transformation is part of a reusable pipeline, the more explicit version is often the better choice even if the short version also works.

That is especially true when you later add a separator rule, cleanup step, or null-handling policy. Starting with a clearer expression makes those changes easier.

If you need to combine more than two columns, the same idea scales:

python
1df["label"] = (
2    df["city"].fillna("")
3    .str.cat(df["state"].fillna(""), sep=", ")
4    .str.cat(df["country"].fillna(""), sep=" / ")
5    .str.replace(r"\s+", " ", regex=True)
6    .str.strip(" ,/")
7)

That is often clearer than building a long chain of + operations when the formatting rules become more specific.

Common Pitfalls

The biggest mistake is forgetting that NaN can break a simple string concatenation. What looks fine on sample data may fail or produce missing output on real data.

Another common issue is leaving extra separators behind when one column is empty. That is why fillna("") and str.strip() are often worth adding in data-cleaning code.

People also forget about dtype consistency. If a column contains numbers, dates, or mixed objects, convert it explicitly before combining it with text.

Finally, if this new combined column is for search, display, or export, decide whether whitespace normalization should happen at the same time. Small cleanup steps here can save repeated fixes later.

Summary

  • Use str.cat() when you want clear separators and better control over missing values.
  • Use + only for simple cases where the inputs are already clean strings.
  • Handle NaN explicitly if you do not want the combined result to become missing.
  • Convert non-string columns with astype(str) when needed.
  • Clean up separators and whitespace if the combined column will be user-facing.

Course illustration
Course illustration

All Rights Reserved.