How to add pandas data to an existing csv file?

Introduction

Appending new rows from pandas to an existing CSV is easy in the happy path, but reliable appends need more than mode="a". You also need to control headers, column order, schema drift, and write ownership so the file stays readable after repeated runs.

Use to_csv in Append Mode

For a file that already exists and already has the correct header, pandas can append rows like this:

```python
import pandas as pd

new_rows = pd.DataFrame([
    {"id": 4, "name": "Dina", "score": 88},
    {"id": 5, "name": "Evan", "score": 91},
])

new_rows.to_csv("users.csv", mode="a", index=False, header=False)
```

That works only if the target file already exists with a header and the batch's columns match the file's column order exactly. If the file does not exist, pandas creates it without a header row. If the columns are in a different order, the values land under the wrong headers.

Write Headers Only When Creating the File

A safer pattern checks whether the target file already exists. If it does not, write the header once. If it does, append rows without the header.

```python
from pathlib import Path
import pandas as pd

path = Path("users.csv")

batch = pd.DataFrame([
    {"id": 6, "name": "Farah", "score": 95},
    {"id": 7, "name": "Gabe", "score": 84},
])

batch.to_csv(
    path,
    mode="a",
    index=False,
    header=not path.exists(),
)
```

This avoids one of the most common CSV append bugs: a header row appearing in the middle of the data after a later run.
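As a quick check of that behavior, the sketch below runs the same append twice against a throwaway file and counts header rows. The helper name `append_batch` and the file name `users_demo.csv` are made up for illustration, and the header comparison assumes plain, unquoted column names.

```python
import tempfile
from pathlib import Path

import pandas as pd


def append_batch(path: Path, batch: pd.DataFrame) -> None:
    """Append a batch, writing the header only if the file does not exist yet."""
    batch.to_csv(path, mode="a", index=False, header=not path.exists())


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "users_demo.csv"  # hypothetical file name

    # First run: the file is missing, so the header is written.
    append_batch(demo_path, pd.DataFrame([{"id": 1, "name": "Ana", "score": 80}]))
    # Second run: the file exists, so only the data row is appended.
    append_batch(demo_path, pd.DataFrame([{"id": 2, "name": "Ben", "score": 75}]))

    lines = demo_path.read_text().splitlines()
    header_count = lines.count("id,name,score")
    result = pd.read_csv(demo_path)

print(header_count)  # 1
print(len(result))   # 2
```

Note that an existing but empty file still counts as "exists" here, so it would receive rows without a header.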

If several workers can create the file at roughly the same time, even the existence check can race. In that case, use a single writer process or a more durable storage layer than direct CSV appends.

Keep the Schema Explicit

Appending by column name is safer than trusting the incoming dataframe order. Reindex the frame to the expected schema before writing.

```python
import pandas as pd

expected_cols = ["id", "name", "score"]

incoming = pd.DataFrame([
    {"name": "Hana", "id": 8, "score": 90},
    {"name": "Ivan", "id": 9, "score": 87},
])

# Select columns by name in the file's order. A missing column comes back
# as all-NaN, and any column not in expected_cols is dropped.
aligned = incoming.reindex(columns=expected_cols)

# Reject the batch if a column was missing or any value is null.
if aligned.isnull().any().any():
    raise ValueError("Append batch is missing required values")

aligned.to_csv("users.csv", mode="a", index=False, header=False)
```

Without this step, a changed column order or missing field can silently corrupt downstream analytics. CSV is plain text. It will not protect you from schema mistakes on its own.
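To make the failure mode concrete, here is a minimal sketch that renders the same one-row batch to CSV text with and without reindexing. It writes to a string rather than a file, and the variable names are illustrative only.

```python
import pandas as pd

expected_cols = ["id", "name", "score"]

# A well-formed row whose columns arrive in a different order.
incoming = pd.DataFrame([{"name": "Hana", "id": 8, "score": 90}])

# Written as-is, values follow the frame's column order, not the file's header.
unaligned_row = incoming.to_csv(index=False, header=False).strip()

# Reindexed first, values line up with id,name,score.
aligned_row = (
    incoming.reindex(columns=expected_cols)
    .to_csv(index=False, header=False)
    .strip()
)

# A batch that lacks a column gets NaN for it after reindexing,
# which the null check in the section above would reject.
missing_field = incoming.drop(columns=["score"]).reindex(columns=expected_cols)
has_nulls = bool(missing_field.isnull().any().any())

print(unaligned_row)  # Hana,8,90
print(aligned_row)    # 8,Hana,90
print(has_nulls)      # True
```

The unaligned row would be appended without any error, which is why the mismatch tends to surface only later, in whatever reads the file.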

Wrap the Behavior in One Reusable Function

If multiple jobs append to the same file pattern, centralize the logic so every caller follows the same rules.

```python
from pathlib import Path
import pandas as pd


def append_csv(path: str, frame: pd.DataFrame, columns: list[str]) -> None:
    """Append frame to the CSV at path using a fixed column order."""
    target = Path(path)

    # Nothing to write; also avoids creating a header-only file.
    if frame.empty:
        return

    # Align by column name; missing columns become NaN and fail the check below.
    aligned = frame.reindex(columns=columns)

    if aligned.isnull().any().any():
        raise ValueError("Input frame does not match required schema")

    aligned.to_csv(
        target,
        mode="a",
        index=False,
        header=not target.exists(),  # header only when creating the file
        encoding="utf-8",
        lineterminator="\n",  # pandas 1.5+; older versions use line_terminator
    )


append_csv(
    "users.csv",
    pd.DataFrame([{"id": 10, "name": "Jill", "score": 93}]),
    ["id", "name", "score"],
)
```

Now the rules for header handling, encoding, newline style, and schema validation live in one place instead of being reimplemented slightly differently in every script.

Know When Append Is the Wrong Operation

CSV append is good for "add new rows to the end." It is not good for:

  • updating existing rows
  • enforcing uniqueness
  • supporting many concurrent writers
  • transactional guarantees

If you need upserts, read the file into pandas, merge, and rewrite the whole file atomically through a temporary file. If you need high write concurrency, use SQLite, PostgreSQL, or another database instead of trying to make CSV act like one.
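A minimal sketch of that upsert-and-rewrite pattern follows. The function name `upsert_csv`, the `key` parameter, and the demo file are assumptions for illustration. It uses `concat` plus `drop_duplicates` in place of a true merge, and relies on `os.replace`, which swaps the file in atomically when the temporary file is on the same filesystem as the target.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd


def upsert_csv(path: Path, updates: pd.DataFrame, key: str) -> None:
    """Merge updates into the CSV by key, then replace the file in one step."""
    existing = pd.read_csv(path)
    combined = pd.concat([existing, updates], ignore_index=True)
    # Keep the last occurrence of each key so incoming rows win over old ones.
    combined = combined.drop_duplicates(subset=key, keep="last").sort_values(key)

    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    os.close(fd)
    try:
        combined.to_csv(tmp_name, index=False)
        os.replace(tmp_name, path)
    except Exception:
        # Leave the original file untouched and clean up the temp file.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "users_demo.csv"  # hypothetical file name
    pd.DataFrame([
        {"id": 1, "name": "Ana", "score": 80},
        {"id": 2, "name": "Ben", "score": 75},
    ]).to_csv(demo, index=False)

    upsert_csv(
        demo,
        pd.DataFrame([
            {"id": 2, "name": "Ben", "score": 99},   # update
            {"id": 3, "name": "Cara", "score": 70},  # insert
        ]),
        key="id",
    )
    result = pd.read_csv(demo)

print(result["id"].tolist())     # [1, 2, 3]
print(result["score"].tolist())  # [80, 99, 70]
```

Readers see either the old file or the new one, never a half-written mix. This does not coordinate multiple writers, though: two upserts running at once can still overwrite each other's changes.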

Even for append-only pipelines, a quick validation step is worth it. Count rows before and after, confirm the header appears only once, and verify the columns still match the expected schema.
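One way those three checks might look is sketched below. The function name `validate_append` and its parameters are hypothetical, and the header check assumes unquoted column names and no multi-line quoted fields.

```python
import tempfile
from pathlib import Path

import pandas as pd


def validate_append(
    path: Path, columns: list[str], rows_before: int, rows_added: int
) -> None:
    """Raise ValueError if the file no longer looks like a clean append."""
    header_line = ",".join(columns)
    lines = path.read_text(encoding="utf-8").splitlines()

    # The header should appear exactly once in the raw text.
    if lines.count(header_line) != 1:
        raise ValueError("Header row is missing or repeated")

    frame = pd.read_csv(path)
    if list(frame.columns) != columns:
        raise ValueError("Columns do not match the expected schema")
    if len(frame) != rows_before + rows_added:
        raise ValueError("Row count does not match the append")


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "users_demo.csv"  # hypothetical file name
    cols = ["id", "name", "score"]
    pd.DataFrame([{"id": 1, "name": "Ana", "score": 80}]).to_csv(demo, index=False)

    rows_before = len(pd.read_csv(demo))
    batch = pd.DataFrame([{"id": 2, "name": "Ben", "score": 75}])
    batch.to_csv(demo, mode="a", index=False, header=False)
    validate_append(demo, cols, rows_before, len(batch))  # passes silently

    # A careless append with header=True is caught by the header check.
    batch.to_csv(demo, mode="a", index=False, header=True)
    try:
        validate_append(demo, cols, rows_before + 1, len(batch))
        caught = False
    except ValueError:
        caught = True

print(caught)  # True
```

Reading the whole file back is fine for small files. For large ones, a cheaper check such as counting lines may be a better fit.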

Common Pitfalls

The most common mistake is appending with header=True, which is also the to_csv default. Every run after the first then inserts a header row into the middle of the data.

Another common issue is ignoring schema alignment and assuming dictionary key order or dataframe column order will always match the target file. Developers also often forget that CSV appends are not transactional. Two processes writing at once can interleave output and damage the file.

Summary

  • Use to_csv(..., mode="a") for true append-only writes.
  • Write the header only when the file is being created.
  • Reindex incoming data to a fixed schema before appending.
  • Centralize append behavior in one helper instead of duplicating it across scripts.
  • Use a database or full-file rewrite when the job needs updates, concurrency, or stronger guarantees.
