Concatenate a list of pandas dataframes together
Introduction
Combining multiple DataFrames into one is one of the most frequent operations in data analysis with pandas. Whether you are merging monthly reports, stacking results from parallel computations, or assembling data loaded from multiple CSV files, pd.concat() is the tool you reach for. Understanding its parameters -- especially axis, join, and ignore_index -- will help you avoid subtle bugs with misaligned indices and missing columns.
Basic Concatenation with pd.concat
The simplest case is stacking DataFrames that share the same columns vertically (row-wise):
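A minimal sketch of this case, assuming three small hypothetical DataFrames (the frame names, column names, and values are illustrative, not taken from a real dataset):

```python
import pandas as pd

# Hypothetical sample frames with identical columns (2, 2, and 1 rows).
df1 = pd.DataFrame({"name": ["Alice", "Bob"], "score": [85, 90]})
df2 = pd.DataFrame({"name": ["Carol", "Dan"], "score": [78, 92]})
df3 = pd.DataFrame({"name": ["Eve"], "score": [88]})

frames = [df1, df2, df3]

# axis=0 is the default, so the frames are stacked row-wise.
result = pd.concat(frames)

print(len(result))            # 5
print(result.index.tolist())  # [0, 1, 0, 1, 0]
```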
Notice that the original indices (0, 1, 0, 1, 0) are preserved. This is often not what you want.
Resetting the Index with ignore_index
When stacking rows, duplicate index values are confusing and can cause bugs in downstream code that assumes unique indices. Use ignore_index=True to generate a fresh sequential index:
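As a sketch, reusing the same kind of hypothetical sample frames (names and values are illustrative assumptions):

```python
import pandas as pd

# Hypothetical sample frames; each one starts its own index at 0.
df1 = pd.DataFrame({"name": ["Alice", "Bob"], "score": [85, 90]})
df2 = pd.DataFrame({"name": ["Carol", "Dan"], "score": [78, 92]})
df3 = pd.DataFrame({"name": ["Eve"], "score": [88]})

# ignore_index=True discards the original labels and numbers the rows 0..n-1.
result = pd.concat([df1, df2, df3], ignore_index=True)

print(result.index.tolist())   # [0, 1, 2, 3, 4]
print(result.index.is_unique)  # True
```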
This is the most common pattern and should be your default when concatenating rows.
Row-wise vs Column-wise Concatenation
The axis parameter controls the direction. The default axis=0 stacks rows. Setting axis=1 concatenates columns side by side:
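A small sketch of the column-wise case, assuming two hypothetical frames that share the default index 0, 1, 2:

```python
import pandas as pd

# Hypothetical frames with the same row index but different columns.
names = pd.DataFrame({"name": ["Alice", "Bob", "Carol"]})
scores = pd.DataFrame({"score": [85, 90, 78]})

# axis=1 places the frames side by side, matching rows by index label.
result = pd.concat([names, scores], axis=1)

print(result.shape)          # (3, 2)
print(list(result.columns))  # ['name', 'score']
```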
Column-wise concatenation aligns on the row index, not on row position, so the DataFrames should share the same index values. Any index label missing from one DataFrame produces NaN in that DataFrame's columns.
Handling Different Columns with join
When DataFrames have different columns, pd.concat uses an outer join by default, filling missing values with NaN:
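To illustrate, here is a sketch with two hypothetical frames that share only the name column (the score and grade columns are invented for the example):

```python
import pandas as pd

# Hypothetical frames: only "name" exists in both.
df_a = pd.DataFrame({"name": ["Alice"], "score": [85]})
df_b = pd.DataFrame({"name": ["Bob"], "grade": ["B"]})

# join="outer" is the default: the result has the union of all columns.
result = pd.concat([df_a, df_b], ignore_index=True)

print(sorted(result.columns))           # ['grade', 'name', 'score']
print(result["score"].isna().tolist())  # [False, True]
print(result["grade"].isna().tolist())  # [True, False]
```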
If you only want the columns that exist in every DataFrame, use join='inner':
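A sketch of the inner join, assuming two hypothetical frames whose only shared column is name:

```python
import pandas as pd

# Hypothetical frames: only "name" exists in both.
df_a = pd.DataFrame({"name": ["Alice"], "score": [85]})
df_b = pd.DataFrame({"name": ["Bob"], "grade": ["B"]})

# join="inner" keeps only the columns present in every input frame.
result = pd.concat([df_a, df_b], join="inner", ignore_index=True)

print(list(result.columns))      # ['name']
print(result["name"].tolist())   # ['Alice', 'Bob']
```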
Choose inner when you want only the shared columns, so that concatenation itself introduces no NaN values, and outer when you want to preserve all data even if some columns are absent from some DataFrames.
Using the keys Parameter for Hierarchical Indexing
The keys parameter creates a MultiIndex that identifies which original DataFrame each row came from. This is useful when you need to trace data back to its source:
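A sketch of the idea, assuming two hypothetical monthly frames (the product and sales columns and the "February" key are illustrative):

```python
import pandas as pd

# Hypothetical monthly frames with identical columns.
jan = pd.DataFrame({"product": ["A", "B"], "sales": [100, 150]})
feb = pd.DataFrame({"product": ["A", "B"], "sales": [120, 130]})

# Each key becomes the outer level of a two-level MultiIndex.
result = pd.concat([jan, feb], keys=["January", "February"])

print(result.index.nlevels)   # 2
print(result.index.tolist())
# [('January', 0), ('January', 1), ('February', 0), ('February', 1)]

# Selecting on the outer level returns the rows from that source frame.
print(result.loc["January"]["sales"].tolist())  # [100, 150]
```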
You can then select a specific group with result.loc["January"].
Performance: Concat Once vs Append in a Loop
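A critical performance lesson: never grow a DataFrame row by row inside a loop. Each append, whether through the old DataFrame.append method (removed in pandas 2.0) or a pd.concat call per iteration, copies all the data accumulated so far, making the operation O(n^2) overall.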
The correct pattern is to collect all DataFrames into a list first and then call pd.concat once. This runs in linear time because pandas allocates the output array once and copies each input DataFrame into it.
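The collect-then-concat pattern can be sketched as follows. The load_chunk function is a hypothetical stand-in for whatever produces each piece (reading a CSV file, running one batch of a computation, and so on):

```python
import pandas as pd


def load_chunk(i):
    """Hypothetical loader: returns a two-row frame for batch i."""
    return pd.DataFrame({"batch": [i, i], "value": [i * 10, i * 10 + 1]})


# Anti-pattern (avoid): calling pd.concat([result, chunk]) on every
# iteration re-copies everything accumulated so far.

# Preferred: appending to a Python list is cheap and copies no frame data.
frames = [load_chunk(i) for i in range(5)]

# A single concat builds the final frame in one pass.
result = pd.concat(frames, ignore_index=True)

print(len(result))                # 10
print(result["batch"].tolist())   # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```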
Comparison with merge and join
While pd.concat stacks DataFrames along an axis, pd.merge and DataFrame.join combine DataFrames based on shared column values or indices, similar to SQL joins:
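For contrast, here is a sketch of key-based combination, assuming hypothetical customers and orders tables linked by a customer_id column:

```python
import pandas as pd

# Hypothetical related tables sharing the key column "customer_id".
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Alice", "Bob", "Carol"]}
)
orders = pd.DataFrame({"customer_id": [1, 1, 3], "amount": [50, 20, 75]})

# pd.merge matches rows on the values of a shared column.
merged = pd.merge(orders, customers, on="customer_id", how="left")
print(merged["name"].tolist())  # ['Alice', 'Alice', 'Carol']

# DataFrame.join matches the caller's column against the other frame's index.
joined = orders.join(customers.set_index("customer_id"), on="customer_id")
print(joined["name"].tolist())  # ['Alice', 'Alice', 'Carol']
```

Both calls attach the customer name to each order by key, which stacking with pd.concat cannot do.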
Use pd.concat when you are combining DataFrames that represent the same kind of data (same columns, different rows). Use merge or join when you are combining DataFrames that represent related but different data and need to match rows by a key.
Common Pitfalls
- Appending in a loop instead of collecting and concatenating once makes your code orders of magnitude slower on large datasets.
- Forgetting ignore_index=True leaves duplicate index values, so a label-based lookup with .loc can return several rows where you expected one.
- Assuming column order is preserved across DataFrames with different column sets. Always check .columns after concatenation.
- Concatenating DataFrames with mismatched dtypes in the same column silently upcasts (int to float when NaN is introduced). Inspect dtypes with result.dtypes after concatenation.
- Using axis=1 with misaligned indices produces a DataFrame full of NaN values because pandas aligns on the index, not on position.
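Two of these pitfalls, dtype upcasting and misaligned indices, are easy to reproduce. The following is a sketch using small hypothetical frames; resetting the index before concatenating is one possible fix when positional alignment is what you intend:

```python
import pandas as pd

# Pitfall: an integer column becomes float once NaN is introduced.
ints = pd.DataFrame({"id": [1, 2], "count": [10, 20]})
no_count = pd.DataFrame({"id": [3]})  # hypothetical frame lacking "count"
stacked = pd.concat([ints, no_count], ignore_index=True)
print(stacked["count"].dtype)  # float64

# Pitfall: axis=1 aligns on index labels, not on row position.
left = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"b": [3, 4]}, index=[2, 3])
side = pd.concat([left, right], axis=1)
print(side.shape)                    # (4, 2)
print(int(side.isna().sum().sum()))  # 4

# Possible fix: reset both indices so rows line up by position.
fixed = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)
print(fixed.shape)  # (2, 2)
```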
Summary
- Use pd.concat(list_of_dfs, ignore_index=True) as the standard pattern for stacking DataFrames row-wise.
- Set axis=1 for column-wise concatenation, and ensure indices align.
- Control missing-column behavior with join='inner' or join='outer' (default).
- Use the keys parameter to create a MultiIndex that tracks which source DataFrame each row came from.
- Always collect DataFrames into a list and call pd.concat once instead of appending inside a loop.
- Use pd.merge or DataFrame.join when you need to combine DataFrames by matching on key columns rather than stacking.

