How can I change the data type of a column in a Pandas DataFrame?
There are a few different ways to do this in Pandas. Which one to use will depend on the data types we’re converting from and to.
- The astype method.
- The to_numeric function.
- The convert_dtypes method.
Each of these methods is detailed in the subsections below.
astype
The first and most versatile method to use is the astype method. When called on a Pandas DataFrame or Series, this method attempts to cast the values within to the specified type. We can use it to change the type of one or more columns at a time, as shown in the example below:
    import pandas as pd

    # Create and print DataFrame
    df = pd.DataFrame({
        'A': ['1', '2', '3'],
        'B': ['4', '5', '6'],
        'C': ['7', '8', '9']
    })
    print(df)

    # Print data types of each column in DataFrame
    print("\n")
    print(df.dtypes)

    # Change column A's values to floats
    df['A'] = df['A'].astype(float)

    # Change column B and C's values to integers
    df = df.astype({'B': int, 'C': int})

    print("\nConverted:\n")

    # Print altered DataFrame
    print(df)

    # Print data types of each column in DataFrame
    print("\n")
    print(df.dtypes)
This script produces the following output:
       A  B  C
    0  1  4  7
    1  2  5  8
    2  3  6  9


    A    object
    B    object
    C    object
    dtype: object

    Converted:

         A  B  C
    0  1.0  4  7
    1  2.0  5  8
    2  3.0  6  9


    A    float64
    B      int64
    C      int64
    dtype: object
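The example above only uses strings that parse cleanly. As a minimal sketch of what happens otherwise, the snippet below casts a column containing a non-numeric value and also casts to a non-numeric target type. The DataFrame and its column names (clean, dirty) are hypothetical and not part of the original example.

```python
import pandas as pd

# Hypothetical sample data: one clean column and one with an unparseable value
df = pd.DataFrame({
    'clean': ['1', '2', '3'],
    'dirty': ['4', 'five', '6'],
})

# Numeric strings cast to integers without trouble
clean_ints = df['clean'].astype(int)
print(clean_ints.dtype)

# A value that cannot be parsed makes astype raise a ValueError
try:
    df['dirty'].astype(int)
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print(f"Cast failed: {exc}")

# astype is not limited to numbers; 'category' is one non-numeric target
as_category = df['dirty'].astype('category')
print(as_category.dtype)
```

Because astype raises an error by default instead of skipping bad values, it tends to suit data we already expect to be clean.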
to_numeric
If we want to convert a column to a numeric type, we can use the to_numeric function. Depending on the data in each column, the values will be converted into either integers or floats.
    import pandas as pd

    # Create and print DataFrame
    df = pd.DataFrame({
        'A': ['1', '2', '3'],
        'B': ['4.0', '5.1', '6.2'],
        'C': ['7', '8', '9']
    })
    print(df)

    # Print data types of each column in DataFrame
    print("\n")
    print(df.dtypes)

    # Change column A's values to a numeric type
    df['A'] = pd.to_numeric(df['A'])

    # Change column B and C's values to a numeric type
    df[['B', 'C']] = df[['B', 'C']].apply(pd.to_numeric)

    print("\nConverted:\n")

    # Print altered DataFrame
    print(df)

    # Print data types of each column in DataFrame
    print("\n")
    print(df.dtypes)
This script produces the following output:
       A    B  C
    0  1  4.0  7
    1  2  5.1  8
    2  3  6.2  9


    A    object
    B    object
    C    object
    dtype: object

    Converted:

       A    B  C
    0  1  4.0  7
    1  2  5.1  8
    2  3  6.2  9


    A      int64
    B    float64
    C      int64
    dtype: object
Unlike astype, to_numeric works on one-dimensional data such as a single Series, so we must use the apply method if we want to convert multiple columns at once.
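to_numeric also accepts the errors and downcast parameters, which the example above does not use. The sketch below, built on hypothetical sample Series, shows errors='coerce' replacing unparseable values with NaN and downcast='integer' choosing the smallest integer type that fits the data.

```python
import pandas as pd

# Hypothetical Series containing one value that is not a number
raw = pd.Series(['1', '2', 'abc'])

# errors='coerce' replaces unparseable values with NaN instead of raising,
# which makes the result a float column
coerced = pd.to_numeric(raw, errors='coerce')
print(coerced)

# downcast='integer' picks the smallest signed integer type that fits
small = pd.to_numeric(pd.Series(['1', '2', '3']), downcast='integer')
print(small.dtype)
```

Coercing is convenient for messy input, but it hides bad values silently, so it may be worth counting the resulting NaN values afterwards.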
convert_dtypes
The convert_dtypes DataFrame method will convert all columns to the best possible types that support pd.NA, the Pandas missing value. Note that this method will not convert numeric strings to integers or floats.
    import pandas as pd

    # Create and print DataFrame
    df = pd.DataFrame({
        'A': [1.0, 2.0, 5.3],
        'B': ['z', 'x', 'c'],
        'C': [7, 8, 4],
        'D': ['1', '2', '3']
    })
    print(df)

    # Print data types of each column in DataFrame
    print("\n")
    print(df.dtypes)

    # Change all columns to the appropriate types
    df = df.convert_dtypes()

    print("\nConverted:\n")

    # Print altered DataFrame
    print(df)

    # Print data types of each column in DataFrame
    print("\n")
    print(df.dtypes)
This script produces the following output:
         A  B  C  D
    0  1.0  z  7  1
    1  2.0  x  8  2
    2  5.3  c  4  3


    A    float64
    B     object
    C      int64
    D     object
    dtype: object

    Converted:

         A  B  C  D
    0  1.0  z  7  1
    1  2.0  x  8  2
    2  5.3  c  4  3


    A    Float64
    B     string
    C      Int64
    D     string
    dtype: object
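Since convert_dtypes leaves numeric strings as strings, one possible workaround is to run to_numeric on those columns first. The sketch below does that and also shows how a missing value ends up as pd.NA in a nullable integer column. The DataFrame here is hypothetical; column E is an addition that does not appear in the example above.

```python
import pandas as pd

# Hypothetical data: numeric strings plus a float column with a missing value
df = pd.DataFrame({
    'D': ['1', '2', '3'],
    'E': [10.0, None, 20.0],
})

# Parse the numeric strings first, since convert_dtypes will not do it
df['D'] = pd.to_numeric(df['D'])

# Then let convert_dtypes choose nullable types for every column
converted = df.convert_dtypes()
print(converted.dtypes)
print(converted)
```

Column E holds only whole numbers apart from the missing entry, so it can become a nullable integer column instead of staying a float column with NaN.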
