Append column to pandas dataframe
Introduction
Adding a new column to a pandas DataFrame is one of the most common data manipulation tasks. pandas provides several ways to do this depending on whether you are assigning a scalar, a list, a Series, or computing a column from existing data. The simplest approach is direct assignment with bracket notation (df['new_col'] = values), but methods like assign(), insert(), and concat() offer more control over column placement and chaining.
Direct Assignment (Most Common)
Direct assignment with df['col'] = values is the most common method. The new column is appended at the end. If the column already exists, it is overwritten.
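As a minimal sketch of direct assignment, the example below adds a scalar, a list, and a computed column, then overwrites one of them. The DataFrame contents and column names are made up for illustration.

```python
import pandas as pd

# Illustrative data; the names and values are assumptions, not from the article.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "score": [82, 91, 77]})

# Scalar: the value is broadcast to every row.
df["passed"] = True

# List: the length must match the number of rows.
df["grade"] = ["B", "A", "C"]

# Computed from an existing column.
df["score_doubled"] = df["score"] * 2

# Assigning to an existing name overwrites the column in place.
df["passed"] = df["score"] >= 80

print(df)
```

New columns land at the end in the order they were assigned, while the overwritten column keeps its original position.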
Using assign() for Chaining
assign() creates a copy with the new columns. It supports lambda functions that reference the DataFrame being built, making it ideal for method chaining without side effects.
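A short sketch of assign() in a chain might look like the following. The column names are hypothetical; the second lambda relies on the first keyword having already been added, which is how assign() evaluates its keyword arguments in order.

```python
import pandas as pd

# Illustrative data; column names are assumptions for this sketch.
df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 5]})

# Each lambda receives the DataFrame as built so far, so "big"
# can refer to the "total" column created just before it.
result = df.assign(
    total=lambda d: d["price"] * d["qty"],
    big=lambda d: d["total"] > 50,
)

print(result)
print(list(df.columns))  # the original DataFrame is unchanged
```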
Using insert() for Position Control
insert() modifies the DataFrame in place and lets you specify the exact column position with the loc parameter. It raises a ValueError if the column name already exists unless you pass allow_duplicates=True.
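The sketch below places a column between two existing ones and then shows the error raised for a duplicate name. The data is illustrative only.

```python
import pandas as pd

# Illustrative data for this sketch.
df = pd.DataFrame({"a": [1, 2], "c": [5, 6]})

# Insert "b" at position 1 (between "a" and "c").
# insert() returns None and changes df itself.
df.insert(1, "b", [3, 4])

# Inserting a name that already exists raises ValueError by default.
duplicate_error = None
try:
    df.insert(0, "a", [0, 0])
except ValueError as exc:
    duplicate_error = exc

print(list(df.columns))
print(duplicate_error)
```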
Using concat() for Multiple Columns
pd.concat() with axis=1 joins DataFrames side by side. This is useful when you have multiple columns to add at once from a separate DataFrame. The indexes must align: with the default outer join, labels that appear in only one DataFrame produce extra rows filled with NaN.
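To make the alignment behavior concrete, here is a small sketch with one aligned and one misaligned concat. The frames and their labels are invented for the example.

```python
import pandas as pd

# Illustrative frames; both use the default index 0, 1, 2.
df = pd.DataFrame({"id": [1, 2, 3]})
extra = pd.DataFrame({"city": ["Oslo", "Rome", "Lima"], "zone": ["N", "S", "S"]})

# Indexes match, so the columns line up row for row.
combined = pd.concat([df, extra], axis=1)

# Same data, but labelled 1, 2, 3 instead of 0, 1, 2.
shifted = pd.DataFrame(
    {"city": ["Oslo", "Rome", "Lima"], "zone": ["N", "S", "S"]},
    index=[1, 2, 3],
)

# The result covers labels 0..3, with NaN wherever one side has no row.
misaligned = pd.concat([df, shifted], axis=1)

print(combined)
print(misaligned)
```

If the rows are meant to match by position, resetting the index on both frames before concatenating avoids the extra rows.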
Adding a Column from a Series with Index Alignment
When you assign a Series, pandas aligns on the index. Rows in the DataFrame without a matching index in the Series get NaN. Rows in the Series not in the DataFrame are dropped.
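The following sketch shows label-based alignment and, for comparison, positional assignment after resetting the Series index. The values and index labels are assumptions chosen to make the mismatch visible.

```python
import pandas as pd

# DataFrame with the default index 0, 1, 2.
df = pd.DataFrame({"x": [10, 20, 30]})

# Series labelled 1, 2, 3: only labels 1 and 2 overlap with df.
s = pd.Series([1.5, 2.5, 3.5], index=[1, 2, 3])

# Aligned on labels: row 0 has no match (NaN), label 3 is dropped.
df["y"] = s

# Positional: resetting the Series index to 0, 1, 2 makes it line up
# with df's default index.
df["y_pos"] = s.reset_index(drop=True)

print(df)
```

Resetting the Series index only gives positional alignment here because df itself uses the default 0..n-1 index.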
Conditional Column with np.where
np.where(condition, true_value, false_value) is a vectorized way to create columns based on conditions. For multiple conditions, use np.select().
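A brief sketch of both functions follows. The thresholds and labels are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative scores; thresholds below are arbitrary.
df = pd.DataFrame({"score": [95, 72, 48, 60]})

# Two-way choice: one condition, one value for each outcome.
df["result"] = np.where(df["score"] >= 60, "pass", "fail")

# Multi-way choice: conditions are checked in order and the first
# match wins; rows matching none get the default.
conditions = [df["score"] >= 90, df["score"] >= 60]
choices = ["high", "medium"]
df["band"] = np.select(conditions, choices, default="low")

print(df)
```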
Using apply() for Complex Logic
apply() runs a function on each element of a Series, or on each row of a DataFrame when called with axis=1. It is flexible but slower than vectorized operations. Use it when no vectorized alternative exists.
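As a rough sketch, the example below uses apply() row-wise on a DataFrame and element-wise on a single column. The label function is a hypothetical helper written only for this illustration.

```python
import pandas as pd

# Illustrative data for this sketch.
df = pd.DataFrame(
    {"first": ["Ada", "Alan"], "last": ["Lovelace", "Turing"], "age": [36, 41]}
)


def label(row):
    """Hypothetical helper: build a tag from several columns of one row."""
    initials = row["first"][0] + row["last"][0]
    return f"{initials}-{row['age']}"


# Row-wise: axis=1 passes each row to the function as a Series.
df["label"] = df.apply(label, axis=1)

# Element-wise: Series.apply calls the function on each value.
df["first_len"] = df["first"].apply(len)

print(df)
```

The element-wise case here could also be written with the vectorized df["first"].str.len(), which is usually the better choice on large data.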
Common Pitfalls
- SettingWithCopyWarning: Assigning a column to a DataFrame slice (df[df['x'] > 0]['new'] = 1) triggers this warning because the slice may be a copy. Use df.loc[df['x'] > 0, 'new'] = 1 instead.
- Index mismatch with Series: Assigning a Series with a different index produces NaN for non-matching rows. Reset the index with .reset_index(drop=True) if you want positional alignment.
- Using append() for columns: DataFrame.append() adds rows, not columns. It was deprecated in pandas 1.4 and removed in 2.0. Use pd.concat() with axis=1 for columns.
- Overwriting existing columns silently: Direct assignment overwrites an existing column without warning. Check if 'col' in df.columns first if you want to avoid accidental overwrites.
- Performance with apply(): apply() is a Python-level loop and can be 10-100x slower than vectorized operations. Prefer np.where(), np.select(), or arithmetic on Series for performance-sensitive code.
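Two of these pitfalls can be sketched in a few lines: conditional assignment through .loc instead of a chained slice, and a guard against silently overwriting a column. The data and the overwrite_blocked flag are illustrative assumptions.

```python
import pandas as pd

# Illustrative data for this sketch.
df = pd.DataFrame({"x": [-1, 2, 3]})

# Single .loc call: selects the rows and the column in one step,
# so the assignment reaches df itself. Rows that do not match stay NaN.
df.loc[df["x"] > 0, "new"] = 1

# Guard against accidental overwrite before a direct assignment.
overwrite_blocked = False
if "new" in df.columns:
    overwrite_blocked = True  # column exists, so leave it alone
else:
    df["new"] = 0

print(df)
print(overwrite_blocked)
```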
Summary
- Use df['col'] = values for simple column addition (appended at end)
- Use df.insert(loc, 'col', values) to control column position
- Use df.assign(col=...) for functional chaining without mutating the original
- Use pd.concat([df, new_df], axis=1) to add multiple columns from another DataFrame
- Series assignment aligns on index; use reset_index(drop=True) for positional alignment
- Use np.where() for conditional columns and apply() only when no vectorized option exists

