How to add a new column to an existing DataFrame

Introduction

Adding a column to a pandas DataFrame is one of the most common data-manipulation tasks in Python. The simplest form is direct assignment, but the best method depends on whether you want a fixed value, a computed expression, a specific column position, or a transformation that returns a new DataFrame instead of mutating the original one.

Core Sections

Direct assignment is the default tool

The most common way to add a column is direct assignment with bracket syntax.

python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
})

# Assign a list with one value per row; the column is appended at the end.
df["C"] = [7, 8, 9]
print(df)

This is concise and works for:

  • a list or array of matching length
  • a scalar value that should be broadcast to every row
  • a Series aligned by index

For many cases, this is the only method you need.
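
The other two input types from the list above can be sketched as follows. The column names `source` and `from_series` and the sample values are illustrative only. Note that the Series is matched on index labels, not on position.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# A scalar is broadcast to every row.
df["source"] = "batch_1"

# A Series is aligned on index labels, not on position:
# the value labelled 0 goes to row 0, even though it is listed last.
s = pd.Series([30, 20, 10], index=[2, 1, 0])
df["from_series"] = s

print(df)
```

If positional assignment is what you want, passing `s.to_numpy()` or a plain list sidesteps index alignment.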

Computed columns should prefer vectorized expressions

When the new column comes from existing columns, pandas expressions are usually clearer and faster than row-wise apply.

python
import pandas as pd

df = pd.DataFrame({
    "price": [10, 20, 30],
    "quantity": [2, 1, 4],
})

# Element-wise multiplication over whole columns, with no explicit loop.
df["total"] = df["price"] * df["quantity"]
print(df)

This is vectorized, which is the pandas-friendly way to do column math. It is usually better than iterating row by row.

Use .insert() when column position matters

If you need the new column at a specific location rather than at the end, use .insert().

python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
})

# Arguments: zero-based position, column name, values.
df.insert(1, "middle", [10, 20, 30])
print(df)

This places the new column at zero-based position 1, between A and B. It is useful when column order matters for reports, exports, or interactive inspection.
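
Two behaviors of .insert() are worth seeing in isolation: it modifies the DataFrame in place and returns None, and by default it refuses a column name that already exists. The following is a minimal sketch; the `id` column and the `duplicate_rejected` flag are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# insert() mutates df and returns None, so do not reassign its result.
result = df.insert(0, "id", [101, 102, 103])
print(result)
print(list(df.columns))

# Inserting a label that already exists raises ValueError by default.
try:
    df.insert(1, "A", [0, 0, 0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(duplicate_rejected)
```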

Use .assign() when you want a new DataFrame

.assign() returns a new DataFrame instead of mutating the original one in place. That is useful when you want a more functional style or want to keep the original object untouched.

python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
})

# The lambda receives the DataFrame being assigned to.
new_df = df.assign(sum_col=lambda x: x["A"] + x["B"])
print(new_df)
print(df)  # the original has no sum_col column

This is especially nice in method chains where you are transforming data step by step.
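
As a sketch of what such a chain might look like, the example below derives one column and then a second column that depends on the first. The `orders` data, the column names, and the threshold of 100 are made up for illustration.

```python
import pandas as pd

orders = pd.DataFrame({
    "price": [10.0, 20.0, 30.0],
    "quantity": [2, 1, 4],
})

result = (
    orders
    # Step 1: derive a total from existing columns.
    .assign(total=lambda d: d["price"] * d["quantity"])
    # Step 2: the lambda sees the output of step 1, so "total" is available.
    .assign(is_large=lambda d: d["total"] >= 100)
)

print(result)
print(list(orders.columns))  # the original DataFrame keeps its two columns
```

Using a lambda in each step matters here: it is evaluated against the intermediate DataFrame, so later steps can refer to columns created earlier in the chain.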

Conditional columns with where or np.where

A new column often depends on a condition. In those cases, use vectorized condition handling rather than a Python loop.

python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "score": [45, 72, 88, 59],
})

# np.where(condition, value_if_true, value_if_false), evaluated per row.
df["passed"] = np.where(df["score"] >= 60, "yes", "no")
print(df)

This is a common and efficient pattern for category flags, thresholds, and derived status columns.
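
Two related tools cover cases that np.where handles less neatly. Series.where keeps values where a condition holds and replaces the rest, and np.select handles more than two outcomes. The sketch below assumes illustrative thresholds (80 and 60) and made-up column names `capped` and `band`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [45, 72, 88, 59]})

# Series.where keeps values where the condition is True
# and substitutes the second argument everywhere else.
df["capped"] = df["score"].where(df["score"] <= 80, 80)

# np.select checks the conditions in order and takes the first match;
# rows matching none of them get the default.
conditions = [df["score"] >= 80, df["score"] >= 60]
choices = ["high", "pass"]
df["band"] = np.select(conditions, choices, default="fail")

print(df)
```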

Avoid apply(axis=1) unless you really need it

Many tutorials jump to apply for computed columns, but row-wise apply(axis=1) is slower and usually unnecessary for simple arithmetic or conditions.

Use it only when the column logic genuinely requires row-level Python code that is hard to express with vectorized operations.
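
To make the comparison concrete, here is the same column computed both ways. This is only a sketch on a three-row frame; the speed difference shows up on larger data, and the column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 20, 30], "quantity": [2, 1, 4]})

# Row-wise: pandas calls the Python function once per row.
df["total_apply"] = df.apply(
    lambda row: row["price"] * row["quantity"], axis=1
)

# Vectorized: one operation over whole columns.
df["total_vec"] = df["price"] * df["quantity"]

# Both approaches produce the same values.
same_values = (df["total_apply"] == df["total_vec"]).all()
print(same_values)
```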

Common Pitfalls

  • Assigning a list whose length differs from the number of rows raises a ValueError, because pandas expects one value per existing row.
  • Reaching for apply(axis=1) too early often makes the code slower and more complex than a vectorized expression.
  • Using .assign() while expecting the original DataFrame to change leads to confusion because .assign() returns a new object.
  • Forgetting that Series assignment aligns by index can produce NaN values when indexes do not match.
  • Inserting columns by position with .insert() without checking the target index can disrupt report or export layout unexpectedly.
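
The length and index-alignment pitfalls above can be reproduced in a few lines. This is a small sketch with made-up data; `length_error` and `all_missing` are illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# Wrong length: two values for three rows raises ValueError.
try:
    df["B"] = [10, 20]
    length_error = None
except ValueError as exc:
    length_error = str(exc)
print(length_error)

# Mismatched index: labels 10, 11, 12 do not exist in df,
# so alignment finds no matches and every row becomes NaN.
s = pd.Series([7, 8, 9], index=[10, 11, 12])
df["C"] = s
all_missing = df["C"].isna().all()
print(all_missing)

# If positional assignment was intended, drop the index first.
df["C"] = s.to_numpy()
print(df)
```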

Summary

  • Direct assignment with df["col"] = ... is the standard way to add a column.
  • Use vectorized expressions for computed columns whenever possible.
  • Use .insert() when the placement of the new column matters.
  • Use .assign() when you want a transformed copy instead of mutating the original DataFrame.
  • Prefer vectorized conditional logic over row-wise apply for performance and clarity.
