Add new columns to pandas dataframe based on other dataframe

Python

Pandas

DataFrame

Data Manipulation

Data Analysis

Add new columns to pandas dataframe based on other dataframe

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

Adding columns to one pandas DataFrame from another is usually a join problem, not a loop problem. The best approach depends on how the two tables line up: by index, by one key column, or by several keys.

Pandas gives you several vectorized options for this, and using them correctly is much faster and safer than iterating row by row. The most common tools are merge, join, and map.

Use `merge` When You Have Key Columns

If the two DataFrames are related by a shared key such as user_id, use merge. It is the clearest option when you want to bring over one or more columns from a lookup table.

python

1import pandas as pd
2
3orders = pd.DataFrame({
4    "user_id": [1, 2, 3],
5    "order_total": [120, 80, 150],
6})
7
8users = pd.DataFrame({
9    "user_id": [1, 2, 3],
10    "region": ["NA", "EU", "APAC"],
11    "vip": [True, False, True],
12})
13
14result = orders.merge(users[["user_id", "region", "vip"]], on="user_id", how="left")
15print(result)

This adds region and vip to the orders DataFrame while preserving every row from orders because the join is a left join.

Use `map` For A Single Lookup Column

If you only need one new column and the relationship is one-to-one, map is often simpler than a full merge.

python

region_map = users.set_index("user_id")["region"]
orders["region"] = orders["user_id"].map(region_map)
print(orders)

This is concise and usually very readable. It works best when the lookup DataFrame has unique keys for the mapped column.

Use Index Alignment When The Index Already Matches

Sometimes the two DataFrames already share the same index. In that case, direct assignment or join is enough.

python

1left = pd.DataFrame({"sales": [10, 20, 30]}, index=["a", "b", "c"])
2right = pd.DataFrame({"target": [12, 18, 35]}, index=["a", "b", "c"])
3
4left["target"] = right["target"]
5left["gap"] = left["sales"] - left["target"]
6print(left)

Pandas aligns by index labels, not by physical row position. That is powerful, but it also means mismatched indexes can silently produce missing values if you are not paying attention.

Derive New Columns After The Join

Once the external columns are attached, you can create additional derived fields with vectorized expressions.

python

result["is_high_value"] = result["order_total"] >= 100
result["priority_score"] = result["order_total"] * result["vip"].astype(int)
print(result)

This is a common pattern in feature engineering and reporting workflows. First enrich the table from another DataFrame, then compute the fields that depend on both local and imported columns.

Validate Join Assumptions

Before adding columns from another DataFrame, check whether the relationship is actually one-to-one, one-to-many, or many-to-many. If the lookup table contains duplicate keys, a merge can create more rows than you expected.

Pandas lets you make that assumption explicit:

python

1result = orders.merge(
2    users[["user_id", "region"]],
3    on="user_id",
4    how="left",
5    validate="one_to_one"
6)

Using validate is a small step that can save a lot of debugging time when a supposedly unique key turns out not to be unique.

Common Pitfalls

One common mistake is using a Python loop and iterrows() for what is really a join. That is slower and harder to reason about than vectorized pandas operations. Another is forgetting that pandas aligns by index labels, which can produce unexpected missing values during direct assignment. Duplicate keys are another frequent source of bugs because a merge can suddenly multiply rows instead of only adding columns. Finally, watch your data types. A string key in one DataFrame and an integer key in the other will not match, even if the values look similar when printed.

Summary

Use merge when you need to add columns based on shared key columns.
Use map when you only need one lookup column from a one-to-one mapping.
Direct assignment works when both DataFrames already share the same index.
Derive additional columns after the join with vectorized expressions.
Validate key uniqueness and data types so the added columns match the rows you expect.

Add new columns to pandas dataframe based on other dataframe

Master System Design with Codemia

Introduction

Use merge When You Have Key Columns

Use map For A Single Lookup Column

Use Index Alignment When The Index Already Matches

Derive New Columns After The Join

Validate Join Assumptions

Common Pitfalls

Summary

Use `merge` When You Have Key Columns

Use `map` For A Single Lookup Column