How to add a new column to an existing DataFrame

Introduction

Adding a column to a pandas DataFrame is one of the most common data-manipulation tasks, but the best method depends on where the values come from. You might be assigning a constant, computing from existing columns, aligning by index, or adding the result of a function, and pandas supports each case slightly differently.

The Simplest Form: Direct Assignment

The most common way to add a column is straightforward item assignment.

python

import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

# One value per row, in row order.
df["city"] = ["Toronto", "Montreal", "Vancouver"]
print(df)

This works when the list length matches the number of rows.

You can also assign a scalar, and pandas will broadcast it to every row.

python

df["country"] = "Canada"
print(df)

That is often the cleanest solution for default values or labels.

Computing a Column from Existing Data

Many new columns are derived rather than typed in manually.

python

df["age_in_5_years"] = df["age"] + 5
print(df)

Because pandas performs vectorized operations column-wise, this is fast and readable.

A conditional column can be built with a where-style selection (for example numpy.where) or with apply. The example below uses apply.

python

df["age_group"] = df["age"].apply(lambda x: "under_30" if x < 30 else "30_plus")
print(df)

This is useful when the new column depends on per-row logic.
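For a simple two-way condition like the one above, a vectorized alternative is usually preferable. The sketch below assumes NumPy is installed and uses numpy.where to produce the same age_group column without calling a Python function per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

# numpy.where evaluates the condition on the whole column at once
# and picks the first value where it is True, the second otherwise.
df["age_group"] = np.where(df["age"] < 30, "under_30", "30_plus")
print(df)
```

Keep apply for logic that cannot be expressed as a column-wise condition.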

Using `assign()` for Chainable Code

If you prefer method chaining, assign() is often more readable than repeated bracket assignment.

python

result = (
    df
    .assign(age_in_10_years=lambda d: d["age"] + 10)
    .assign(is_senior=lambda d: d["age"] >= 65)
)

print(result)

assign() returns a new DataFrame, which fits nicely into transformation pipelines.
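A small sketch of that behavior, using the same sample data: the original DataFrame keeps its columns, and a single assign() call can take several keyword arguments, where a later one may refer to a column created by an earlier one. The column name age_in_20_years is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

result = df.assign(
    age_in_10_years=lambda d: d["age"] + 10,
    # This lambda sees the column created by the previous keyword argument.
    age_in_20_years=lambda d: d["age_in_10_years"] + 10,
)

print(list(df.columns))      # the original is unchanged
print(list(result.columns))  # the new columns exist only on the result
```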

Aligning by Index from Another Series or DataFrame

Sometimes the new column comes from another object that already has an index.

python

scores = pd.Series([88, 92, 95], index=[0, 1, 2], name="score")
df["score"] = scores
print(df)

Pandas aligns the Series to the DataFrame by index label, not by row position. That can be very convenient, but it also means mismatched indexes can introduce missing values.
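To make the alignment visible, the sketch below shifts the Series index by one so that it only partly overlaps the DataFrame index. The shifted index values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})  # default index: 0, 1, 2

# Index 3 does not exist in df, and df's index 0 has no match in the Series.
scores = pd.Series([88, 92, 95], index=[1, 2, 3], name="score")

df["score"] = scores
print(df)
```

Row 0 gets NaN, rows 1 and 2 get 88 and 92, and the value labeled 3 is dropped because df has no such row.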

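You can also use join() when adding several aligned columns from another DataFrame. The incoming column names must not overlap with existing ones unless you pass lsuffix or rsuffix; otherwise join() raises a ValueError.

python

# df already has a "city" column from the first example,
# so the extra columns use names that do not clash with it.
extra = pd.DataFrame({
    "province": ["Ontario", "Quebec", "British Columbia"],
    "salary": [70000, 80000, 90000],
}, index=[0, 1, 2])

merged = df.join(extra)
print(merged)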

Row-Wise Functions with `apply`

If the new value depends on more than one column, row-wise logic can help.

python

def describe_person(row):
    # row is a Series holding one row's values, keyed by column name.
    return f"{row['name']} ({row['age']})"


df["label"] = df.apply(describe_person, axis=1)
print(df)

This is flexible, but it is usually slower than vectorized operations. Use it when you need the flexibility, not as the default for simple arithmetic.
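For comparison, the same label can be built without apply. This is a sketch of one vectorized alternative: it converts the numeric column to strings and concatenates whole columns.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

# String concatenation works column-wise once age is converted to str.
df["label"] = df["name"] + " (" + df["age"].astype(str) + ")"
print(df)
```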

Inserting at a Specific Position

If the physical column order matters, use insert().

python

df.insert(1, "status", ["active", "active", "inactive"])
print(df)

This places the new column at a specific index instead of appending it to the end.

That is useful when you are preparing reports or matching an expected schema.
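When the target position is defined relative to another column rather than as a fixed number, one option is to look it up with columns.get_loc(). The sketch below assumes column names are unique, so get_loc() returns a single integer.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["Toronto", "Montreal", "Vancouver"],
})

# Place "status" immediately after "name", wherever "name" happens to be.
position = df.columns.get_loc("name") + 1
df.insert(position, "status", ["active", "active", "inactive"])

print(list(df.columns))
```

Note that insert() modifies the DataFrame in place and raises a ValueError if the column already exists, unless allow_duplicates=True is passed.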

A Good Mental Model

Choose the method based on the source of the new values:

constant value means direct assignment
vectorized expression means direct assignment or assign()
aligned external data means assigning a Series, or join() for several columns from another DataFrame
custom row logic means apply(axis=1)
specific column order means insert()

Once you frame it that way, the API choices feel much less arbitrary.

Common Pitfalls

A common mistake is assigning a list whose length does not match the number of rows. Pandas raises a ValueError reporting the length mismatch.
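A minimal reproduction, catching the exception so the script keeps running. The exact wording of the message can vary between pandas versions, so the sketch only prints it.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

error_message = None
try:
    # Two values for three rows: pandas refuses the assignment.
    df["city"] = ["Toronto", "Montreal"]
except ValueError as exc:
    error_message = str(exc)
    print(f"ValueError: {error_message}")
```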

Another issue is forgetting index alignment when assigning from a Series. If the indexes differ, pandas fills unmatched rows with NaN rather than silently using row order.
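If row order is what you actually intend, one way to opt out of alignment is to strip the index before assigning, for example with to_numpy(). The sketch below contrasts the two behaviors; the index values 10, 11, 12 are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

scores = pd.Series([88, 92, 95], index=[10, 11, 12])

# Aligned by label: no labels match, so every row is NaN.
df["score_aligned"] = scores

# Plain array: no index to align on, so values are assigned by position.
df["score_positional"] = scores.to_numpy()

print(df)
```

Positional assignment is only safe when you are sure both objects are in the same row order.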

Developers also sometimes overuse apply(axis=1) for work that could be expressed as a vectorized operation. That makes the code slower and often less clear.

Finally, watch out for chained indexing in more complex pipelines. If you add a column to a slice of another DataFrame, pandas may emit a SettingWithCopyWarning, and the change may not reach the original object. Be explicit about which object you are modifying.
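One common way to be explicit is to take a copy of the filtered rows before adding columns to it. The sketch below assumes you want an independent subset; the names over_30 and is_over_30 are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

# .copy() makes it unambiguous that over_30 is its own DataFrame.
over_30 = df[df["age"] >= 30].copy()
over_30["is_over_30"] = True

print(over_30)
print(list(df.columns))  # the original DataFrame is untouched
```

If the goal is instead to modify the original, assign on df itself, for example with df["new_col"] = ... or df.loc[mask, "new_col"] = ....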

Summary

Use df["new_col"] = ... for the most common column additions.
Scalars broadcast, and vectorized expressions are the fastest for derived columns.
assign() is convenient for chainable transformations.
Alignment by index matters when adding data from another Series or DataFrame.
Use insert() when column position is important.
