Pandas
Dataframe
Python
Data manipulation
Row appending

Create a Pandas Dataframe by appending one row at a time

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Creating a pandas DataFrame one row at a time is tempting because it mirrors how data often arrives in loops or streams. The problem is that pandas is designed for bulk construction and vectorized operations, so repeated row appends become slow much sooner than most people expect.

Why Row-by-Row Growth Is Slow

A DataFrame is column-oriented. Adding one row is not like appending to a Python list; pandas may need to allocate new internal arrays and copy existing data. If you repeat that work for every row, the total cost can grow roughly quadratically.

This pattern works, but it does not scale:

python
1import pandas as pd
2
3df = pd.DataFrame(columns=["name", "age"])
4
5for i in range(5):
6    df.loc[len(df)] = [f"user-{i}", 20 + i]
7
8print(df)

For five rows, nobody notices. For fifty thousand rows, it becomes the bottleneck.

Better Pattern: Collect Rows First

The usual fix is simple: accumulate row data in a regular Python list and build the DataFrame once at the end.

python
1import pandas as pd
2
3rows = []
4
5for i in range(5):
6    rows.append({
7        "name": f"user-{i}",
8        "age": 20 + i,
9    })
10
11df = pd.DataFrame(rows)
12print(df)

This keeps the loop cheap and lets pandas create the table efficiently in one batch.

What About DataFrame.append

Older answers often show df.append(...). That was always an inefficient way to grow a table in a loop, and modern pandas versions removed it. So if you are searching for "the new append," the better question is actually "how do I stop growing the DataFrame one row at a time?"

That means code like this should be treated as historical:

python
# old pattern, not recommended
# df = df.append({"name": "alice", "age": 30}, ignore_index=True)

The replacement is usually list accumulation followed by one DataFrame construction, not another row-append API.

When Data Arrives Continuously

Sometimes you do not know all rows in advance. In that case, build chunks and concatenate occasionally instead of mutating the main table for every single event.

python
1import pandas as pd
2
3chunks = []
4buffer = []
5
6for i in range(12):
7    buffer.append({"id": i, "value": i * 10})
8
9    if len(buffer) == 5:
10        chunks.append(pd.DataFrame(buffer))
11        buffer.clear()
12
13if buffer:
14    chunks.append(pd.DataFrame(buffer))
15
16df = pd.concat(chunks, ignore_index=True)
17print(df)

This reduces copying and keeps the code suitable for medium-sized streaming jobs.

Tiny Tables Are a Valid Exception

There are situations where direct row insertion is fine:

  • Interactive notebook exploration
  • Building a tiny report table
  • Debugging scripts where clarity matters more than speed

For example:

python
1import pandas as pd
2
3df = pd.DataFrame(columns=["event", "count"])
4df.loc[len(df)] = ["started", 1]
5df.loc[len(df)] = ["finished", 1]
6
7print(df)

That is completely reasonable when the table is small and the code path is not performance-sensitive.

Dtype Stability Matters Too

Bulk construction is not only faster. It also makes dtype handling more predictable. Row-by-row insertion can leave columns as generic object dtype longer than necessary, especially if early rows contain strings or missing values.

You can normalize dtypes after building the table:

python
1import pandas as pd
2
3rows = [
4    {"ts": "2026-03-11", "amount": "12.5"},
5    {"ts": "2026-03-12", "amount": "9.0"},
6]
7
8df = pd.DataFrame(rows)
9df["ts"] = pd.to_datetime(df["ts"])
10df["amount"] = pd.to_numeric(df["amount"])
11
12print(df.dtypes)

That is easier to reason about when the whole dataset is visible at once.

Common Pitfalls

The biggest mistake is assuming a pandas DataFrame grows like a Python list. It does not, and repeated row appends become expensive quickly.

Another issue is copying old examples that still use DataFrame.append. Even in environments where that method exists, it encourages the wrong construction strategy for anything beyond toy data.

Developers also forget about index handling. If you combine chunks later, use ignore_index=True unless you intentionally want the original row indices to remain.

Finally, row-by-row insertion can hide dtype problems until much later in the pipeline. Building in bulk makes it easier to inspect and clean column types immediately.

Summary

  • Repeated row-by-row DataFrame growth is usually slow because pandas copies data internally.
  • The best general pattern is to collect rows in a list and build the table once.
  • For streaming inputs, accumulate chunks and use pd.concat occasionally.
  • Treat old DataFrame.append examples as outdated guidance.
  • Bulk construction improves both performance and dtype predictability.

Course illustration
Course illustration

All Rights Reserved.