Change column type in pandas

Pandas is an indispensable tool in the data scientist’s toolbox, designed specifically for data manipulation and analysis. For those working with data in Python, manipulating the data type of a column in a pandas DataFrame is a common task. Adjusting data types can be crucial for memory management, optimizing performance, and ensuring compatibility with various functions and mathematical operations.

Understanding DataFrame Column Types

In pandas, every column in a DataFrame has a data type, which the pandas documentation calls its dtype. The dtype determines what kind of data the column can hold and which operations are available on it. Common pandas data types include:

  • object (usually strings)
  • int64
  • float64
  • bool
  • datetime64
  • category

Each of these has different memory and performance characteristics, which is why it is sometimes worth changing the data type of a column.
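Before converting anything, it helps to see which dtypes a DataFrame currently has and how much memory each column uses. The following is a minimal sketch; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'count': [1, 2, 3],
    'price': [1.5, 2.5, 3.5],
    'flag': [True, False, True],
})

# One dtype per column
print(df.dtypes)

# Per-column memory footprint in bytes; deep=True also measures string contents
print(df.memory_usage(deep=True))
```

The exact byte counts for the string column depend on the pandas version and platform, so treat them as relative rather than absolute figures.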

How to Change Column Type in pandas

To change the data type of a column, you can use the astype() method. It accepts a dtype name such as 'float64', a NumPy dtype, or a Python type such as float, and it returns a new object rather than modifying the column in place, so the result has to be assigned back.

Basic Usage of astype()

Here is a simple example where we change the type of a column from int to float:

python
import pandas as pd

# Create a simple DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Convert column 'A' from int64 to float64
df['A'] = df['A'].astype('float64')

print(df)
print(df.dtypes)
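When several columns need converting at once, DataFrame.astype() also accepts a dictionary that maps column names to target dtypes. A short sketch using the same two-column data (the target dtypes here are chosen arbitrarily for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Map column names to target dtypes; columns not listed keep their dtype
converted = df.astype({'A': 'float64', 'B': 'int32'})

print(converted.dtypes)
```

Because astype() returns a new DataFrame, the original df is left untouched unless the result is assigned back to it.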

Changing to category Type

If a column contains only a small set of distinct values that repeat across many rows, converting it to category can save memory:

python
df['C'] = ['frog', 'frog', 'toad']
df['C'] = df['C'].astype('category')

print(df['C'].dtype)  # will show 'category'
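The memory saving is easier to see on a longer column. The sketch below assumes a hypothetical Series of two labels repeated many times and compares its footprint before and after conversion.

```python
import pandas as pd

# Hypothetical column: two distinct labels repeated many times
labels = pd.Series(['frog', 'toad'] * 5000)
as_category = labels.astype('category')

bytes_before = labels.memory_usage(deep=True)
bytes_after = as_category.memory_usage(deep=True)
print(bytes_before, bytes_after)

# The distinct values are stored once; each row holds a small integer code
print(as_category.cat.categories.tolist())
print(as_category.cat.codes.head().tolist())
```

The saving shrinks as the number of distinct values approaches the number of rows, so it is worth measuring on your own data before converting.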

Converting to datetime

For columns containing dates or timestamps, converting them to datetime64 can be particularly useful as it allows the use of pandas' powerful time-series functionality:

python
df['date'] = ['20200101', '20200201', '20200301']
df['date'] = pd.to_datetime(df['date'])

print(df['date'].dtype)  # e.g. 'datetime64[ns]'; the unit can vary by pandas version
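As one example of the time-series functionality mentioned above, a datetime column exposes date components through the .dt accessor. This sketch passes an explicit format string, which is an optional choice made here so the parsing does not depend on format inference; the derived column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'date': ['20200101', '20200201', '20200301']})

# An explicit format avoids relying on format inference
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')

# The .dt accessor exposes date components once the column is datetime
df['month'] = df['date'].dt.month
df['weekday'] = df['date'].dt.day_name()

print(df)
```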

Handling Errors in Type Conversion

When converting types, it's possible to encounter errors if the data cannot be converted to the desired type. For instance, trying to convert a column with non-numeric strings to int64 or float64 will raise a ValueError.

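To handle such cases, astype() has an errors argument. If you set errors='ignore', pandas returns the original object when the conversion fails and raises no error. Note that the failure is silent, so check the resulting dtype to confirm whether the conversion actually took place.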

python
df['D'] = ['1', 'two', '3']

# 'two' cannot be parsed as an int, so the conversion fails;
# with errors='ignore' the column is returned unchanged
df['D'] = df['D'].astype('int', errors='ignore')

print(df['D'].dtype)  # unchanged, still 'object'
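If you would rather keep the values that do parse and mark the rest as missing, one alternative to astype() is pd.to_numeric() with errors='coerce'. This is a sketch of that option, not something the example above relies on; the D_numeric column name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'D': ['1', 'two', '3']})

# errors='coerce' replaces values that cannot be parsed with NaN
df['D_numeric'] = pd.to_numeric(df['D'], errors='coerce')

print(df['D_numeric'].dtype)
print(df['D_numeric'].isna().sum())
```

The result is a floating-point column because the unparseable entry becomes NaN, which a plain int64 column cannot hold.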

Summary Table

Here is a summary of common data types and the typical scenarios where you might want to convert them:

Original Type  | Conversion Type | Reason/Use case
---------------+-----------------+-----------------------------------------------------------
int64          | float64         | Handle NaN values, which are not defined for integers
object         | category        | Reduce memory usage if the number of unique values is small
Any type       | datetime64      | Utilize pandas' datetime functionalities
Any numerical  | object          | Useful in cases with mixed numerical and non-numerical data
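The int64 to float64 case can be illustrated with a short sketch: once a column is float64 it can store NaN, and casting it back to int64 while the NaN is present raises an error. The variable names below are hypothetical.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 2, 3])          # int64 by default
as_float = values.astype('float64')
as_float.iloc[1] = np.nan              # float64 can store NaN

# Casting back to int64 fails while a NaN is present
try:
    as_float.astype('int64')
    cast_back_failed = False
except ValueError:
    cast_back_failed = True

print(as_float.dtype, cast_back_failed)
```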

Conclusion

Changing the data type of a column in pandas is straightforward using the astype() method but requires awareness of the data's characteristics and the target data type. Proper conversions can unlock more functionality, enhance performance, and ensure your data analysis is as robust as possible.

