Convert Pandas Column to DateTime

Introduction

Converting a pandas column to datetime unlocks sorting, filtering, resampling, and time-based grouping that are awkward or unreliable with plain strings. The standard tool is pd.to_datetime, but the right parameters matter when your input contains custom formats, invalid values, or timezone information. A careful conversion step saves a lot of downstream debugging.

Basic conversion with pd.to_datetime

The most common case is a column of date strings whose format pandas can infer automatically.

python
import pandas as pd

df = pd.DataFrame(
    {
        "created_at": ["2025-01-01", "2025-01-02", "2025-01-03"]
    }
)

df["created_at"] = pd.to_datetime(df["created_at"])

print(df.dtypes)
print(df)

After conversion, the column has a datetime64 dtype (typically datetime64[ns]; the exact resolution can vary with the pandas version), which enables date-aware operations.
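As a rough illustration of what "date-aware operations" means in practice, the sketch below sorts, filters, and groups on a converted column. The `amount` column and the cutoff date are made up for the example and are not part of the original data.

```python
import pandas as pd

# Hypothetical sample data: the "amount" column exists only for illustration.
df = pd.DataFrame(
    {
        "created_at": ["2025-01-03", "2025-01-01", "2025-02-15"],
        "amount": [30, 10, 20],
    }
)
df["created_at"] = pd.to_datetime(df["created_at"])

# Chronological sort on real datetimes rather than on string order.
df_sorted = df.sort_values("created_at")

# Keep rows on or after a cutoff date.
recent = df[df["created_at"] >= pd.Timestamp("2025-01-02")]

# Extract calendar components through the .dt accessor.
months = df["created_at"].dt.month

# Sum the amounts per calendar month.
monthly = df.groupby(df["created_at"].dt.to_period("M"))["amount"].sum()

print(df_sorted)
print(recent)
print(monthly)
```

None of these operations behave reliably on a plain string column, which is why the conversion step usually comes first in a pipeline.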

Specify a format when the strings are predictable

If the input has a known format, pass it explicitly. Doing so documents the expected layout, removes any guessing about day and month order, and can speed up parsing on large columns.

python
import pandas as pd

df = pd.DataFrame(
    {
        "created_at": ["01/09/2025 14:30", "02/09/2025 09:15"]
    }
)

df["created_at"] = pd.to_datetime(
    df["created_at"],
    format="%d/%m/%Y %H:%M"
)

print(df)

This is especially useful for day-first data, timestamps with times, or logs that follow a fixed template.

Handle bad values with errors

Real-world data often contains blanks, malformed strings, or impossible dates. The errors parameter controls what happens: the default, errors="raise", stops with an exception at the first value that cannot be parsed.

python
import pandas as pd

df = pd.DataFrame(
    {
        "created_at": ["2025-01-01", "not-a-date", "2025-02-30"]
    }
)

df["created_at"] = pd.to_datetime(
    df["created_at"],
    errors="coerce"
)

print(df)

With errors="coerce", invalid values become NaT, which is pandas' datetime missing value marker. This is often the most practical option because it lets the pipeline continue while still making bad rows visible.
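Coercion only helps if the failed rows are actually inspected afterwards. One possible way to surface them, sketched here under the assumption that the raw strings are still available, is to compare the original column with the parsed result. The names `parsed`, `failed`, and `bad_rows` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"created_at": ["2025-01-01", "not-a-date", None, "2025-02-30"]}
)

# Parse into a separate Series so the raw strings remain available.
parsed = pd.to_datetime(df["created_at"], errors="coerce")

# A row failed if it held a value originally but is NaT after parsing.
# Rows that were already missing (None) are not counted as failures.
failed = df["created_at"].notna() & parsed.isna()
bad_rows = df.loc[failed, "created_at"]

print(f"{failed.sum()} value(s) could not be parsed:")
print(bad_rows)
```

Keeping the parsed result separate until the failures have been reviewed makes it easier to tell genuinely missing values from malformed ones.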

Timezone-aware conversion

If the source data is in UTC or another known timezone, it is often better to be explicit during conversion.

python
import pandas as pd

df = pd.DataFrame(
    {
        "event_time": ["2025-01-01T12:00:00Z", "2025-01-01T13:30:00Z"]
    }
)

df["event_time"] = pd.to_datetime(df["event_time"], utc=True)

print(df["event_time"])

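This produces timezone-aware timestamps in UTC. With utc=True, strings that carry an offset are converted to UTC, and timezone-naive inputs are treated as if they were already in UTC. Being explicit here helps avoid subtle bugs later when comparing or converting times across systems.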
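For data tied to a timezone other than UTC, the usual tools are the `.dt.tz_convert` and `.dt.tz_localize` accessors. The following is a minimal sketch; the zone names `Europe/Berlin` and `America/New_York` are arbitrary choices for the example, and it assumes the IANA timezone database is available on the system.

```python
import pandas as pd

# UTC-aware timestamps, parsed as in the example above.
utc_times = pd.to_datetime(
    pd.Series(["2025-01-01T12:00:00Z", "2025-01-01T13:30:00Z"]),
    utc=True,
)

# tz_convert changes the displayed zone; the instant in time stays the same.
berlin = utc_times.dt.tz_convert("Europe/Berlin")

# Naive wall-clock times that are known (by assumption) to be New York time:
# attach the zone with tz_localize, then convert to UTC for storage.
naive = pd.to_datetime(pd.Series(["2025-01-01 07:00:00"]))
localized = naive.dt.tz_localize("America/New_York")
as_utc = localized.dt.tz_convert("UTC")

print(berlin)
print(as_utc)
```

The distinction matters: `tz_localize` declares which zone naive values belong to, while `tz_convert` translates values that are already timezone-aware.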

Convert numeric epochs when the source is not a string

Not every datetime column starts as text. Some datasets store Unix timestamps as integers or floats, in seconds or milliseconds.

python
import pandas as pd

df = pd.DataFrame(
    {
        "created_at": [1735689600, 1735776000]
    }
)

df["created_at"] = pd.to_datetime(df["created_at"], unit="s", utc=True)
print(df)

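The unit argument matters, and a mismatch in either direction gives the wrong answer. Seconds parsed as milliseconds silently land in January 1970, while milliseconds parsed as seconds fall tens of thousands of years in the future and typically trigger an out-of-bounds error.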
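To make the effect of a wrong unit concrete, the sketch below parses the same second-based epochs twice, once with the right unit and once with the wrong one. The variable names are illustrative.

```python
import pandas as pd

# Epoch values in seconds (2025-01-01 and 2025-01-02, midnight UTC).
seconds = pd.Series([1735689600, 1735776000])

# Correct: the unit matches the data.
right = pd.to_datetime(seconds, unit="s", utc=True)

# Wrong: the same integers read as milliseconds. No error is raised,
# but every value collapses to a point shortly after the Unix epoch.
wrong = pd.to_datetime(seconds, unit="ms", utc=True)

print(right.dt.year.tolist())
print(wrong.dt.year.tolist())
```

A cheap sanity check after any epoch conversion is to confirm that the minimum and maximum dates fall in a plausible range for the dataset.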

Why assignment matters

pd.to_datetime returns a new, converted object and never modifies the original column in place. The DataFrame only changes when you assign the result back.

python
df["created_at"] = pd.to_datetime(df["created_at"])

This detail is easy to miss when experimenting interactively. If you forget the assignment, the DataFrame keeps the old string column.
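A quick way to see the difference is to check the column's dtype before and after assignment. This sketch uses `is_datetime64_any_dtype` from `pandas.api.types`; the `before` and `after` names are just for illustration.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

df = pd.DataFrame({"created_at": ["2025-01-01", "2025-01-02"]})

# The result is computed and then discarded: df is unchanged.
pd.to_datetime(df["created_at"])
before = is_datetime64_any_dtype(df["created_at"])

# Assigning the result back replaces the string column.
df["created_at"] = pd.to_datetime(df["created_at"])
after = is_datetime64_any_dtype(df["created_at"])

print(before, after)
```

Checking `df.dtypes` after a conversion step is a similarly lightweight habit that catches a forgotten assignment early.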

Common Pitfalls

The most common mistake is relying on automatic inference for ambiguous date strings such as 01/02/2025. Depending on the data, that might mean January 2 or February 1. Use an explicit format when ambiguity exists.
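The ambiguity is easy to demonstrate by parsing the same strings under two explicit formats. This is a small sketch with made-up sample values.

```python
import pandas as pd

raw = pd.Series(["01/02/2025", "03/04/2025"])

# Month-first reading (common in US-style data).
as_month_first = pd.to_datetime(raw, format="%m/%d/%Y")

# Day-first reading (common in European-style data).
as_day_first = pd.to_datetime(raw, format="%d/%m/%Y")

print(as_month_first.dt.strftime("%Y-%m-%d").tolist())
print(as_day_first.dt.strftime("%Y-%m-%d").tolist())
```

Both calls succeed without any warning, so only knowledge of the data source can tell you which reading is correct.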

Another issue is forgetting errors="coerce" when the source is messy. A single bad row can otherwise raise an exception and stop the whole conversion.

Timezones are another trap. Mixing naive timestamps and timezone-aware timestamps can create confusing comparison or serialization errors later.
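As an illustration of the kind of failure this causes, the sketch below compares a naive Series with a timezone-aware one. The ordering comparison is expected to raise a TypeError; localizing the naive side first (here assuming, for the sake of the example, that the naive values really are UTC) makes the comparison valid.

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(["2025-01-01 12:00:00"]))
aware = pd.Series(pd.to_datetime(["2025-01-01 12:00:00"], utc=True))

# Ordering comparisons between naive and aware values are rejected.
try:
    naive < aware
    raised = False
except TypeError as exc:
    raised = True
    print(f"Comparison failed: {exc}")

# Assumption for this example: the naive values represent UTC times.
aligned = naive.dt.tz_localize("UTC")
result = aligned <= aware

print(result.tolist())
```

Normalizing every datetime column to one convention, commonly timezone-aware UTC, at the point of ingestion avoids this class of error.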

Numeric epoch columns are another source of silent mistakes. Always match the unit argument to the real data, especially when values could be seconds, milliseconds, or microseconds.

Finally, do not assume display equals storage. A column may print in a readable date format while still being a string if the conversion result was never assigned back. Check df.dtypes to confirm the actual type.

Summary

  • Use pd.to_datetime to convert string columns into real pandas datetime values.
  • Pass format when the input pattern is known and fixed.
  • Use errors="coerce" to turn invalid values into NaT.
  • Be explicit about UTC or other timezones when the data has timezone meaning.
  • Match the unit argument to the data when converting numeric epochs.
  • Always assign the conversion result back to the DataFrame column.
