Combine two columns of text in pandas dataframe
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Combining two text columns in pandas is usually just string concatenation, but the details matter when missing values or non-string data are involved. The cleanest solution depends on whether you want a quick join, explicit separators, or robust handling of NaN.
Use str.cat() for the Most Control
str.cat() is a good default because it supports separators and explicit missing-value handling.
This reads clearly and avoids manually inserting separators into the expression.
Use + for Simple Cases
If both columns are already strings and you know there are no missing values, the + operator is concise:
That is fine for clean data, but it becomes fragile when NaN values appear.
For example:
Using fillna("") and str.strip() helps produce readable output even when some pieces are missing.
Convert Non-String Columns Explicitly
If one column is numeric or mixed type, convert it intentionally rather than relying on implicit behavior:
This avoids type errors and makes the conversion policy visible in the code.
Decide How Missing Values Should Look
The best result for missing values depends on the use case. For a display label, you may want to suppress extra spaces:
That produces cleaner user-facing text than a raw concatenation that leaves doubled spaces or missing results.
Think About Performance and Readability Together
For most everyday data work, the difference between + and str.cat() is less important than writing something the next person will understand quickly. If the transformation is part of a reusable pipeline, the more explicit version is often the better choice even if the short version also works.
That is especially true when you later add a separator rule, cleanup step, or null-handling policy. Starting with a clearer expression makes those changes easier.
If you need to combine more than two columns, the same idea scales:
That is often clearer than building a long chain of + operations when the formatting rules become more specific.
Common Pitfalls
The biggest mistake is forgetting that NaN can break a simple string concatenation. What looks fine on sample data may fail or produce missing output on real data.
Another common issue is leaving extra separators behind when one column is empty. That is why fillna("") and str.strip() are often worth adding in data-cleaning code.
People also forget about dtype consistency. If a column contains numbers, dates, or mixed objects, convert it explicitly before combining it with text.
Finally, if this new combined column is for search, display, or export, decide whether whitespace normalization should happen at the same time. Small cleanup steps here can save repeated fixes later.
Summary
- Use
str.cat()when you want clear separators and better control over missing values. - Use
+only for simple cases where the inputs are already clean strings. - Handle
NaNexplicitly if you do not want the combined result to become missing. - Convert non-string columns with
astype(str)when needed. - Clean up separators and whitespace if the combined column will be user-facing.

