pandas
data manipulation
dataframe
missing data
Python

Add missing dates to pandas dataframe

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

When a pandas DataFrame has gaps in its date sequence, the usual fix is to create the full date range you expect and reindex the data onto it. That gives you explicit rows for missing dates, after which you can decide whether to leave values as missing, fill them, or compute them from neighboring observations.

Reindex against a complete date range

The basic pattern is to convert the date column to datetime, set it as the index, create a full range, and call reindex.

python
1import pandas as pd
2
3sales = pd.DataFrame(
4    {
5        "date": ["2024-01-01", "2024-01-03", "2024-01-06"],
6        "orders": [10, 15, 7],
7    }
8)
9
10sales["date"] = pd.to_datetime(sales["date"])
11sales = sales.set_index("date")
12
13full_index = pd.date_range(sales.index.min(), sales.index.max(), freq="D")
14completed = sales.reindex(full_index)
15print(completed)

Missing dates now exist as rows with NaN values.

Fill missing values only after deciding what the gap means

Adding dates and filling values are separate decisions. Sometimes missing dates mean zero activity. Sometimes they mean data was not collected.

python
completed["orders"] = completed["orders"].fillna(0)
print(completed)

This is appropriate for cases such as daily order counts where a missing date should be interpreted as zero orders. It is not appropriate when the missing value means unknown.

Use asfreq when the index is already a DatetimeIndex

If the DataFrame already uses a datetime index, asfreq is a concise alternative.

python
1import pandas as pd
2
3series = pd.DataFrame(
4    {"orders": [10, 15, 7]},
5    index=pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-06"]),
6)
7
8completed = series.asfreq("D")
9print(completed)

This is especially clean in time-series pipelines where the index is already set up correctly.

Add missing dates per group when multiple entities exist

Real datasets often contain several independent series, such as one store per date. In that case, fill gaps within each group rather than across the entire table.

python
1import pandas as pd
2
3sales = pd.DataFrame(
4    {
5        "store": ["A", "A", "B", "B"],
6        "date": ["2024-01-01", "2024-01-03", "2024-01-01", "2024-01-02"],
7        "orders": [10, 15, 5, 8],
8    }
9)
10
11sales["date"] = pd.to_datetime(sales["date"])
12
13frames = []
14for store, group in sales.groupby("store"):
15    group = group.set_index("date").sort_index()
16    idx = pd.date_range(group.index.min(), group.index.max(), freq="D")
17    filled = group.reindex(idx)
18    filled["store"] = store
19    frames.append(filled.reset_index(names="date"))
20
21result = pd.concat(frames, ignore_index=True)
22print(result)

That keeps each store's timeline independent.

Sort first and validate the frequency assumption

Before filling dates, sort the data and confirm the intended frequency. Daily, hourly, business-day, and monthly data each need a different frequency string. Adding calendar days to business-day data can create rows you never wanted.

A clean fill starts with an explicit decision about the expected cadence, not with trial and error.

Preserve the date as a column when needed

After reindex, the generated dates live in the index. If downstream code expects a normal column, reset the index explicitly.

python
completed = completed.reset_index(names="date")
print(completed.head())

That small step often prevents confusion when joining the result back to non-time-series tables.

Common Pitfalls

  • Filling missing dates before converting the date column to actual datetime values.
  • Treating missing dates as zero automatically when the real meaning is unknown data.
  • Using the wrong frequency such as calendar days for business-day series.
  • Reindexing the full table at once when each entity should have its own date range.
  • Forgetting to sort the index before resampling or reindexing.

Summary

  • Create a complete date_range and reindex the DataFrame onto it.
  • Leave the inserted rows as missing until you decide how gaps should be interpreted.
  • Use asfreq when the DataFrame already has a proper DatetimeIndex.
  • Reindex within groups for multi-entity time series.
  • Be explicit about the intended date frequency before filling anything.

Course illustration
Course illustration

All Rights Reserved.