Append a Column to a pandas DataFrame

Introduction

Adding a new column to a pandas DataFrame is one of the most common data manipulation tasks. pandas provides several ways to do this depending on whether you are assigning a scalar, a list, a Series, or computing a column from existing data. The simplest approach is direct assignment with bracket notation (df['new_col'] = values), but methods like assign(), insert(), and concat() offer more control over column placement and chaining.

Direct Assignment (Most Common)

python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Carol'], 'age': [30, 25, 35]})

# Add a column with a list
df['city'] = ['NYC', 'LA', 'Chicago']

# Add a column with a scalar (broadcasts to all rows)
df['country'] = 'US'

# Add a column computed from existing columns
df['birth_year'] = 2025 - df['age']

print(df)
#     name  age     city country  birth_year
# 0  Alice   30      NYC      US        1995
# 1    Bob   25       LA      US        2000
# 2  Carol   35  Chicago      US        1990

Direct assignment with df['col'] = values is the most common method. The new column is appended at the end. If the column already exists, it is overwritten.
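Two behaviors of direct assignment are worth seeing in a small sketch: a list whose length differs from the number of rows raises a ValueError, and assigning to an existing name replaces that column in its current position. The data and column names below are illustrative only.

```python
import pandas as pd

# Illustrative data, not from a real dataset
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Carol'], 'age': [30, 25, 35]})

# A list must supply exactly one value per row
try:
    df['city'] = ['NYC', 'LA']  # 2 values for 3 rows
    length_error = None
except ValueError as exc:
    length_error = exc

print(type(length_error).__name__)  # ValueError

# Assigning to an existing name replaces the column and keeps its position
df['age'] = df['age'] + 1
print(df.columns.tolist())  # ['name', 'age']
print(df['age'].tolist())   # [31, 26, 36]
```

Scalars are the exception to the length rule, since a single value is broadcast to every row.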

Using assign() for Chaining

python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

# assign() returns a new DataFrame; it does not modify the original
result = df.assign(
    z=lambda d: d['x'] + d['y'],
    ratio=lambda d: d['x'] / d['y']
)

print(result)
#    x  y  z     ratio
# 0  1  4  5  0.250000
# 1  2  5  7  0.400000
# 2  3  6  9  0.500000

# Original is unchanged
print(df.columns.tolist())  # ['x', 'y']

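assign() returns a new DataFrame with the added columns and leaves the original unchanged. Callables passed as keyword arguments receive the DataFrame being built, so a later keyword can use a column created by an earlier one in the same call. This makes assign() well suited to method chaining without side effects.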
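As a minimal sketch of the chaining idea, the example below builds one column from another inside a single assign() call and then filters the result. The column names total and double and the threshold in query() are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

result = (
    df
    .assign(
        total=lambda d: d['x'] + d['y'],   # computed first
        double=lambda d: d['total'] * 2,   # can see 'total' from the line above
    )
    .query('total > 5')                    # keep only rows where total exceeds 5
)

print(result['total'].tolist())   # [7, 9]
print(result['double'].tolist())  # [14, 18]
print(df.columns.tolist())        # ['x', 'y'], the original is untouched
```

Because every step returns a new object, the whole pipeline can be read top to bottom without tracking intermediate mutations.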

Using insert() for Position Control

python
import pandas as pd

df = pd.DataFrame({'first': ['Alice', 'Bob'], 'last': ['Smith', 'Jones']})

# Insert 'middle' at position 1 (between first and last)
df.insert(loc=1, column='middle', value=['M', 'R'])

print(df)
#    first middle   last
# 0  Alice      M  Smith
# 1    Bob      R  Jones

insert() modifies the DataFrame in place and lets you specify the exact column position with the loc parameter. It raises a ValueError if the column name already exists unless you pass allow_duplicates=True.
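The duplicate-name behavior described above can be sketched as follows. The age column and its values are invented for the example, and loc=len(df.columns) is used to show that the upper bound of loc appends at the end.

```python
import pandas as pd

df = pd.DataFrame({'first': ['Alice', 'Bob'], 'last': ['Smith', 'Jones']})

# loc can range from 0 to len(df.columns); the upper bound appends at the end
df.insert(loc=len(df.columns), column='age', value=[30, 25])
print(df.columns.tolist())  # ['first', 'last', 'age']

# Inserting a name that already exists raises ValueError by default
try:
    df.insert(loc=0, column='age', value=[0, 0])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

print(type(duplicate_error).__name__)  # ValueError

# allow_duplicates=True permits a second column with the same name
df.insert(loc=0, column='age', value=[0, 0], allow_duplicates=True)
print(df.columns.tolist())  # ['age', 'first', 'last', 'age']
```

Duplicate column names are rarely what you want: df['age'] then returns a DataFrame with both columns rather than a single Series.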

Using concat() for Multiple Columns

python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

new_cols = pd.DataFrame({
    'b': [4, 5, 6],
    'c': [7, 8, 9]
})

result = pd.concat([df, new_cols], axis=1)
print(result)
#    a  b  c
# 0  1  4  7
# 1  2  5  8
# 2  3  6  9

pd.concat() with axis=1 joins DataFrames side by side. This is useful when you have multiple columns to add at once from a separate DataFrame. The indexes must align: with the default outer join, a label that appears in only one frame produces a row with NaN in the other frame's columns.
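To make the alignment caveat concrete, here is a small sketch in which the second frame has a deliberately different index (the labels 10, 11, 12 are arbitrary). Resetting that index before concatenating is one way to line the rows up by position.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})                              # index 0, 1, 2
new_cols = pd.DataFrame({'b': [4, 5, 6]}, index=[10, 11, 12])    # no labels in common

# Outer join on the index: six rows, each with one NaN
misaligned = pd.concat([df, new_cols], axis=1)
print(misaligned.shape)                     # (6, 2)
print(int(misaligned.isna().sum().sum()))   # 6

# Dropping the mismatched index lines the rows up by position
aligned = pd.concat([df, new_cols.reset_index(drop=True)], axis=1)
print(aligned.shape)                        # (3, 2)
print(aligned['b'].tolist())                # [4, 5, 6]
```

This works here because df already has the default 0..n-1 index; if it did not, both frames would need their indexes reset.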

Adding a Column from a Series with Index Alignment

python
import pandas as pd

df = pd.DataFrame({'val': [10, 20, 30]}, index=['a', 'b', 'c'])
s = pd.Series([100, 200, 300], index=['b', 'c', 'd'])

df['new'] = s
print(df)
#    val    new
# a   10    NaN
# b   20  100.0
# c   30  200.0

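When you assign a Series, pandas aligns on the index. Rows in the DataFrame without a matching label in the Series get NaN, and Series entries whose labels are not in the DataFrame index are dropped. Because NaN is a float, the new column is upcast to float64, which is why the values print as 100.0 and 200.0.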
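If positional rather than label-based assignment is what you want, one option is to strip the index by converting the Series to a NumPy array first. The sketch below contrasts the two; the column names by_label and by_position are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'val': [10, 20, 30]}, index=['a', 'b', 'c'])
s = pd.Series([100, 200, 300], index=['b', 'c', 'd'])

# Default: values are matched by index label
df['by_label'] = s

# to_numpy() discards the index, so values are placed row by row
df['by_position'] = s.to_numpy()

print(df['by_label'].isna().tolist())  # [True, False, False]
print(df['by_position'].tolist())      # [100, 200, 300]
```

The positional form requires the Series to have exactly as many values as the DataFrame has rows.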

Conditional Column with np.where

python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [85, 42, 73, 91, 55]})

df['passed'] = np.where(df['score'] >= 60, 'Yes', 'No')
print(df)
#    score passed
# 0     85    Yes
# 1     42     No
# 2     73    Yes
# 3     91    Yes
# 4     55     No

np.where(condition, true_value, false_value) is a vectorized way to create columns based on conditions. For multiple conditions, use np.select().
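For the multi-condition case, a minimal np.select() sketch might look like the following. The grade thresholds and letters are hypothetical; conditions are checked in order and the first match wins, with default used when none match.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [85, 42, 73, 91, 55]})

# Hypothetical grade bands, evaluated top to bottom
conditions = [
    df['score'] >= 90,
    df['score'] >= 70,
    df['score'] >= 60,
]
choices = ['A', 'B', 'C']

# default is a string so that it matches the type of the choices
df['grade'] = np.select(conditions, choices, default='F')

print(df['grade'].tolist())  # ['B', 'F', 'B', 'A', 'F']
```

Ordering the conditions from most to least restrictive avoids having to spell out both bounds of each band.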

Using apply() for Complex Logic

python
import pandas as pd

df = pd.DataFrame({'text': ['hello world', 'foo bar baz', 'hi']})

df['word_count'] = df['text'].apply(lambda x: len(x.split()))
print(df)
#           text  word_count
# 0  hello world           2
# 1  foo bar baz           3
# 2           hi           1

Series.apply() runs a function on each element, and DataFrame.apply(axis=1) runs it on each row. Both are flexible but slower than vectorized operations, so use them when no vectorized alternative exists.
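It is worth checking whether a built-in accessor already covers the logic before reaching for apply(). As a sketch, the word count above can also be written with the .str accessor; the column names here are illustrative, and no benchmark is implied.

```python
import pandas as pd

df = pd.DataFrame({'text': ['hello world', 'foo bar baz', 'hi']})

# apply() with a Python lambda, as in the example above
df['count_apply'] = df['text'].apply(lambda x: len(x.split()))

# The same result using pandas string methods:
# .str.split() yields a list per row, .str.len() measures each list
df['count_str'] = df['text'].str.split().str.len()

print(df['count_apply'].tolist())  # [2, 3, 1]
print(df['count_str'].tolist())    # [2, 3, 1]
```

The accessor version also propagates missing values instead of raising on them, which the lambda would not do for a NaN entry.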

Common Pitfalls

  • SettingWithCopyWarning: Chained indexing such as df[df['x'] > 0]['new'] = 1 triggers this warning because the assignment may land on a temporary copy and leave df unchanged. Use a single df.loc[df['x'] > 0, 'new'] = 1 instead.
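  • Index mismatch with Series: Assigning a Series with a different index produces NaN for non-matching rows. For positional alignment, assign the underlying values with s.to_numpy(); calling .reset_index(drop=True) on the Series helps only when the DataFrame itself has a default 0..n-1 index.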
  • Using append() for columns: DataFrame.append() adds rows, not columns. It was deprecated in pandas 1.4 and removed in 2.0. Use pd.concat() with axis=1 for columns.
  • Overwriting existing columns silently: Direct assignment overwrites an existing column without warning. Check if 'col' in df.columns first if you want to avoid accidental overwrites.
  • Performance with apply(): apply() is a Python-level loop and can be 10-100x slower than vectorized operations. Prefer np.where(), np.select(), or arithmetic on Series for performance-sensitive code.
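Two of the pitfalls above can be sketched in code: conditional assignment through a single .loc call, and a guard against silent overwrites. The helper add_column_safely is hypothetical and written only for this example; it is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'x': [-1, 2, 3]})

# Start from a default value, then set the matching rows in one .loc call
df['new'] = 0
df.loc[df['x'] > 0, 'new'] = 1
print(df['new'].tolist())  # [0, 1, 1]


def add_column_safely(frame, name, values):
    """Hypothetical helper: add a column, refusing to overwrite an existing one."""
    if name in frame.columns:
        raise ValueError(f"column {name!r} already exists")
    frame[name] = values


add_column_safely(df, 'label', ['a', 'b', 'c'])

try:
    add_column_safely(df, 'label', ['x', 'y', 'z'])
    overwrite_error = None
except ValueError as exc:
    overwrite_error = exc

print(df['label'].tolist())  # ['a', 'b', 'c'], the second call did not overwrite
```

Whether a guard like this belongs in a project depends on how often columns are meant to be recomputed in place.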

Summary

  • Use df['col'] = values for simple column addition (appended at end)
  • Use df.insert(loc, 'col', values) to control column position
  • Use df.assign(col=...) for functional chaining without mutating the original
  • Use pd.concat([df, new_df], axis=1) to add multiple columns from another DataFrame
  • Series assignment aligns on index; assign s.to_numpy() when you want positional alignment
  • Use np.where() for conditional columns and apply() only when no vectorized option exists
