Change a column type in a DataFrame in Python Pandas
The Problem
How can I change the data type of a column in a Pandas DataFrame?
The Solution
There are a few different ways to do this in Pandas. Which one to use will depend on the data types we’re converting from and to.
- If we want to convert a column from any data type to one specific data type (e.g. integer, float, string), we should use the astype method.
- If we want to convert a column to a sensible numeric data type (integer or float), we should use the to_numeric function.
- If we want Pandas to decide which data types to use for each column, we should use the convert_dtypes method.
Each of these methods is detailed in the subsections below.
1. Conversion with astype
The first and most versatile method to use is the astype method. When called on a Pandas DataFrame or Series, this method will attempt to cast the values within to the specified type. We can use this method to change the type of one or more columns at a time, as shown in the example below:
import pandas as pd
# Create and print DataFrame
df = pd.DataFrame({
'A': ['1', '2', '3'],
'B': ['4', '5', '6'],
'C': ['7', '8', '9']
})
print(df)
# Print data types of each column in DataFrame
print("\n")
print(df.dtypes)
# Change column A's values to floats
df['A'] = df['A'].astype(float)
# Change column B and C's values to integers
df = df.astype({'B': int, 'C': int})
print("\nConverted:\n")
# Print altered DataFrame
print(df)
# Print data types of each column in DataFrame
print("\n")
print(df.dtypes)
This script produces the following output:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
A object
B object
C object
dtype: object
Converted:
A B C
0 1.0 4 7
1 2.0 5 8
2 3.0 6 9
A float64
B int64
C int64
dtype: object
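The example above only uses strings that parse cleanly. As a minimal sketch of the other case, the snippet below assumes a hypothetical column containing one value that is not a number ('three'). In that case astype raises a ValueError rather than converting part of the column, so it can be worth wrapping the call when the input is not trusted:

```python
import pandas as pd

# Hypothetical data: the last value in column 'A' is not a valid integer
df = pd.DataFrame({'A': ['1', '2', 'three']})

try:
    df['A'] = df['A'].astype(int)
    converted = True
except ValueError as exc:
    # astype refuses the whole conversion if any single value cannot be cast
    converted = False
    print(f"astype failed: {exc}")

# The assignment never happened, so the column still holds the original strings
print(df['A'].tolist())
```

If some values may be unparseable, to_numeric (covered in the next section) is usually the more forgiving choice.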
2. Conversion with to_numeric
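If we want to convert a column to a numeric type without choosing the exact type ourselves, we can use the to_numeric function. Pandas picks the result type from the data: a column of integer strings such as '1' becomes int64, while a column containing decimal strings such as '5.1' becomes float64.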
import pandas as pd
# Create and print DataFrame
df = pd.DataFrame({
'A': ['1', '2', '3'],
'B': ['4.0', '5.1', '6.2'],
'C': ['7', '8', '9']
})
print(df)
# Print data types of each column in DataFrame
print("\n")
print(df.dtypes)
# Change column A's values to a numeric type
df['A'] = pd.to_numeric(df['A'])
# Change column B and C's values to a numeric type
df[['B', 'C']] = df[['B', 'C']].apply(pd.to_numeric)
print("\nConverted:\n")
# Print altered DataFrame
print(df)
# Print data types of each column in DataFrame
print("\n")
print(df.dtypes)
This script produces the following output:
A B C
0 1 4.0 7
1 2 5.1 8
2 3 6.2 9
A object
B object
C object
dtype: object
Converted:
A B C
0 1 4.0 7
1 2 5.1 8
2 3 6.2 9
A int64
B float64
C int64
dtype: object
Unlike with astype, we must use the apply method if we want to convert multiple columns at once.
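to_numeric also accepts two optional parameters that are useful with messy or large data: errors and downcast. The sketch below uses hypothetical sample values. It shows errors='coerce' turning an unparseable string into a missing value instead of raising, and downcast='integer' asking for the smallest integer type that fits the data:

```python
import pandas as pd

# Hypothetical data: 'oops' cannot be parsed as a number
messy = pd.Series(['1', '2', 'oops'])

# errors='coerce' replaces unparseable values with a missing value (NaN),
# which forces the result to a float type
coerced = pd.to_numeric(messy, errors='coerce')
print(coerced)

# downcast='integer' picks the smallest signed integer type that can hold
# every value (int8 for these small numbers) instead of the default int64
small = pd.to_numeric(pd.Series(['1', '2', '3']), downcast='integer')
print(small.dtype)
```

Coercing is convenient, but it silently discards the original text, so it is worth checking how many missing values were produced afterwards (for example with coerced.isna().sum()).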
3. Conversion with convert_dtypes
The convert_dtypes DataFrame method will convert all columns to the best possible types that support pd.NA, the Pandas missing value. Note that this method will not convert numeric strings to integers or floats.
import pandas as pd
# Create and print DataFrame
df = pd.DataFrame({
'A': [1.0, 2.0, 5.3],
'B': ['z', 'x', 'c'],
'C': [7, 8, 4],
'D': ['1', '2', '3']
})
print(df)
# Print data types of each column in DataFrame
print("\n")
print(df.dtypes)
# Change all columns to the appropriate types
df = df.convert_dtypes()
print("\nConverted:\n")
# Print altered DataFrame
print(df)
# Print data types of each column in DataFrame
print("\n")
print(df.dtypes)
This script produces the following output:
A B C D
0 1.0 z 7 1
1 2.0 x 8 2
2 5.3 c 4 3
A float64
B object
C int64
D object
dtype: object
Converted:
A B C D
0 1.0 z 7 1
1 2.0 x 8 2
2 5.3 c 4 3
A Float64
B string
C Int64
D string
dtype: object
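Because convert_dtypes leaves numeric strings as strings, one possible approach is to combine it with to_numeric. The sketch below uses hypothetical data and assumes the default convert_dtypes settings. It parses the string column first and then lets Pandas choose nullable types. It also shows why the pd.NA-aware types matter: an integer column with a missing value is stored as float64 by default, but becomes Int64 after conversion:

```python
import pandas as pd

df = pd.DataFrame({
    'C': [7, None, 4],     # stored as float64 because of the missing value
    'D': ['1', '2', '3'],  # numeric strings, which convert_dtypes would keep as strings
})
print(df.dtypes)

# Parse the numeric strings explicitly, then let Pandas pick nullable types
df['D'] = pd.to_numeric(df['D'])
df = df.convert_dtypes()

print(df)
print(df.dtypes)
```

After this, both columns should report the nullable Int64 type, with the gap in column C shown as <NA> rather than NaN.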