How to add a new column to an existing DataFrame

Introduction

Adding a column to a pandas DataFrame is one of the most common data-manipulation tasks, but the best method depends on where the values come from. You might be assigning a constant, computing from existing columns, aligning by index, or adding the result of a function, and pandas supports each case slightly differently.

The Simplest Form: Direct Assignment

The most common way to add a column is straightforward item assignment.

python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

df["city"] = ["Toronto", "Montreal", "Vancouver"]
print(df)

This works when the list length matches the number of rows.

You can also assign a scalar, and pandas will broadcast it to every row.

python
df["country"] = "Canada"
print(df)

That is often the cleanest solution for default values or labels.

Computing a Column from Existing Data

Many new columns are derived rather than typed in manually.

python
df["age_in_5_years"] = df["age"] + 5
print(df)

Because pandas performs vectorized operations column-wise, this is fast and readable.

Conditional columns can be built with apply, as in the example below, or with a vectorized function such as numpy.where.

python
df["age_group"] = df["age"].apply(lambda x: "under_30" if x < 30 else "30_plus")
print(df)

This is useful when the new column depends on per-row logic.
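
For a simple two-way condition like this one, a vectorized form is usually possible. The following is a minimal sketch using numpy.where; the frame name people is a stand-in chosen for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame with the same columns used in this article.
people = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

# np.where(condition, value_if_true, value_if_false) evaluates the whole
# column at once instead of calling a Python function for each value.
people["age_group"] = np.where(people["age"] < 30, "under_30", "30_plus")
print(people)
```

The apply version is still a reasonable choice when the logic has many branches or calls other Python code.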

Using assign() for Chainable Code

If you prefer method chaining, assign() is often more readable than repeated bracket assignment.

python
result = (
    df
    .assign(age_in_10_years=lambda d: d["age"] + 10)
    .assign(is_senior=lambda d: d["age"] >= 65)
)

print(result)

assign() returns a new DataFrame, which fits nicely into transformation pipelines.
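
To make the "returns a new DataFrame" point concrete, here is a small sketch. The frame name people and the column over_40_then are illustrative assumptions. It also relies on the documented behavior that keywords in a single assign() call are evaluated in order, so a later one can use a column created by an earlier one.

```python
import pandas as pd

# Hypothetical stand-in frame for illustration.
people = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

# Both columns are created in one call; the second lambda reads the
# column produced by the first.
result = people.assign(
    age_in_10_years=lambda d: d["age"] + 10,
    over_40_then=lambda d: d["age_in_10_years"] > 40,
)

print(list(people.columns))  # the original frame keeps only its own columns
print(list(result.columns))  # the returned frame carries the new ones
```

Because the original is left untouched, the result has to be bound to a name (or chained further) for the new columns to be kept.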

Aligning by Index from Another Series or DataFrame

Sometimes the new column comes from another object that already has an index.

python
scores = pd.Series([88, 92, 95], index=[0, 1, 2], name="score")
df["score"] = scores
print(df)

Pandas aligns a Series by its index labels, not by row position. That can be very convenient, but it also means mismatched indexes can introduce missing values.
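
The effect of a mismatched index is easiest to see side by side. This sketch uses a hypothetical frame people and a made-up Series partial_scores whose labels only partly overlap with the frame. Converting the Series with to_numpy() is one way to opt out of alignment, assuming the values really are in row order.

```python
import pandas as pd

# Hypothetical stand-in frame with the default index 0, 1, 2.
people = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

# Only labels 1 and 2 exist in the frame; label 3 has no matching row.
partial_scores = pd.Series([92, 95, 70], index=[1, 2, 3])

# Label-based alignment: row 0 has no match and becomes NaN,
# and the value at label 3 is dropped.
people["score"] = partial_scores

# Position-based assignment: the array carries no index, so values are
# written in row order.
people["score_by_position"] = partial_scores.to_numpy()

print(people)
```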

You can also use join() when adding several aligned columns from another DataFrame.

python
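# df already has a "city" column, and join() raises an error on overlapping
# column names, so the extra frame only holds columns that are new to df.
extra = pd.DataFrame({
    "department": ["Engineering", "Sales", "Marketing"],
    "salary": [70000, 80000, 90000],
}, index=[0, 1, 2])

merged = df.join(extra)
print(merged)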

Row-Wise Functions with apply

If the new value depends on more than one column, row-wise logic can help.

python
def describe_person(row):
    return f"{row['name']} ({row['age']})"


df["label"] = df.apply(describe_person, axis=1)
print(df)

This is flexible, but it is usually slower than vectorized operations. Use it when you need the flexibility, not as the default for simple arithmetic.
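
For comparison, a label like this one can also be built without apply. The sketch below uses string concatenation on whole columns and a hypothetical frame named people; it is one possible vectorized equivalent, not the only one.

```python
import pandas as pd

# Hypothetical stand-in frame for illustration.
people = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

# astype(str) converts the integer column so it can be concatenated
# with the string column element by element.
people["label"] = people["name"] + " (" + people["age"].astype(str) + ")"
print(people)
```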

Inserting at a Specific Position

If the physical column order matters, use insert().

python
df.insert(1, "status", ["active", "active", "inactive"])
print(df)

This places the new column at a specific index instead of appending it to the end.

That is useful when you are preparing reports or matching an expected schema.
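
Hard-coded positions can drift as a frame changes, so it may help to compute the position from an existing column name. This is a sketch under that assumption, with people as a hypothetical frame; note that insert() modifies the DataFrame in place and returns None.

```python
import pandas as pd

# Hypothetical stand-in frame for illustration.
people = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["Toronto", "Montreal", "Vancouver"],
})

# Look up where "name" sits and place the new column directly after it.
position = people.columns.get_loc("name") + 1
people.insert(position, "status", ["active", "active", "inactive"])

print(list(people.columns))
```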

A Good Mental Model

Choose the method based on the source of the new values:

  • constant value means direct assignment
  • vectorized expression means direct assignment or assign()
  • aligned external data means assigning a Series or using join()
  • custom row logic means apply(axis=1)
  • specific column order means insert()

Once you frame it that way, the API choices feel much less arbitrary.

Common Pitfalls

A common mistake is assigning a list whose length does not match the number of rows. Pandas raises a ValueError that reports the length mismatch.
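
A short sketch of that failure, using a hypothetical frame people and catching the exception only so the message can be printed:

```python
import pandas as pd

# Hypothetical stand-in frame with three rows.
people = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

error_message = None
try:
    # Two values for three rows: pandas refuses the assignment.
    people["city"] = ["Toronto", "Montreal"]
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```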

Another issue is forgetting index alignment when assigning from a Series. If the indexes differ, pandas fills unmatched rows with NaN rather than silently using row order.

Developers also sometimes overuse apply(axis=1) for work that could be expressed as a vectorized operation. That makes the code slower and often less clear.

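Finally, watch out for chained indexing in more complex pipelines. If you add a column to a slice of another DataFrame, pandas may emit a SettingWithCopyWarning, and it can be unclear whether the original is affected. Be explicit about the object you are modifying, for example by calling .copy() on the slice first.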
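
One way to be explicit is to copy the filtered rows before adding to them. The names people and adults_30_plus below are illustrative assumptions.

```python
import pandas as pd

# Hypothetical stand-in frame for illustration.
people = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

# .copy() makes the filtered rows an independent DataFrame, so the new
# column clearly belongs to the subset and not to the original.
adults_30_plus = people[people["age"] >= 30].copy()
adults_30_plus["age_group"] = "30_plus"

print(adults_30_plus)
print(list(people.columns))  # the original frame has no "age_group" column
```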

Summary

  • Use df["new_col"] = ... for the most common column additions.
  • Scalars broadcast, and vectorized expressions are the fastest for derived columns.
  • assign() is convenient for chainable transformations.
  • Alignment by index matters when adding data from another Series or DataFrame.
  • Use insert() when column position is important.