Change a column type in a DataFrame in Python Pandas

David Y.

The Problem

How can I change the data type of a column in a Pandas DataFrame?

The Solution

There are a few different ways to do this in Pandas. Which one to use will depend on the data types we’re converting from and to.

  1. If we want to convert a column from any data type to one specific data type (e.g. integer, float, string), we should use the astype method.
  2. If we want to convert a column to a sensible numeric data type (integer or float), we should use the to_numeric function.
  3. If we want Pandas to decide which data types to use for each column, we should use the convert_dtypes method.

Each of these methods is detailed in the subsections below.

1. Conversion with astype

The first and most versatile method to use is the astype method. When called on a Pandas DataFrame or Series, this method will attempt to cast the values within to the specified type. We can use this method to change the type of one or more columns at a time, as shown in the example below:

import pandas as pd # Create and print DataFrame df = pd.DataFrame({ 'A': ['1', '2', '3'], 'B': ['4', '5', '6'], 'C': ['7', '8', '9'] }) print(df) # Print data types of each column in DataFrame print("\n") print(df.dtypes) # Change column A's values to floats df['A'] = df['A'].astype(float) # Change column B and C's values to integers df = df.astype({'B': int, 'C': int}) print("\nConverted:\n") # Print altered DataFrame print(df) # Print data types of each column in DataFrame print("\n") print(df.dtypes)

This script produces the following output:

A B C 0 1 4 7 1 2 5 8 2 3 6 9 A object B object C object dtype: object Converted: A B C 0 1.0 4 7 1 2.0 5 8 2 3.0 6 9 A float64 B int64 C int64 dtype: object

2. Conversion with to_numeric

If we want to convert a column to a numeric type, we can use the to_numeric function. Depending on the data in our columns, they will be converted into either integers or floats.

import pandas as pd # Create and print DataFrame df = pd.DataFrame({ 'A': ['1', '2', '3'], 'B': ['4.0', '5.1', '6.2'], 'C': ['7', '8', '9'] }) print(df) # Print data types of each column in DataFrame print("\n") print(df.dtypes) # Change column A's values to a numeric type df['A'] = pd.to_numeric(df['A']) # Change column B and C's values to a numeric type df[['B', 'C']] = df[['B', 'C']].apply(pd.to_numeric) print("\nConverted:\n") # Print altered DataFrame print(df) # Print data types of each column in DataFrame print("\n") print(df.dtypes)

This script produces the following output:

A B C 0 1 4.0 7 1 2 5.1 8 2 3 6.2 9 A object B object C object dtype: object Converted: A B C 0 1 4.0 7 1 2 5.1 8 2 3 6.2 9 A int64 B float64 C int64 dtype: object

Unlike with astype, we must use the apply method if we want to convert multiple columns at once.

3. Conversion with convert_dtypes

The convert_dtypes DataFrame method will convert all columns to the best possible types that support pd.NA, the Pandas missing value. Note that this method will not convert numeric strings to integers or floats.

import pandas as pd # Create and print DataFrame df = pd.DataFrame({ 'A': [1.0, 2.0, 5.3], 'B': ['z', 'x', 'c'], 'C': [7, 8, 4], 'D': ['1', '2', '3'] }) print(df) # Print data types of each column in DataFrame print("\n") print(df.dtypes) # Change all columns to the appropriate types df = df.convert_dtypes() print("\nConverted:\n") # Print altered DataFrame print(df) # Print data types of each column in DataFrame print("\n") print(df.dtypes)

This script produces the following output:

A B C D 0 1.0 z 7 1 1 2.0 x 8 2 2 5.3 c 4 3 A float64 B object C int64 D object dtype: object Converted: A B C D 0 1.0 z 7 1 1 2.0 x 8 2 2 5.3 c 4 3 A Float64 B string C Int64 D string dtype: object

Get Started With Sentry

Get actionable, code-level insights to resolve Python performance bottlenecks and errors.

  1. Create a free Sentry account

  2. Create a Python project and note your DSN

  3. Grab the Sentry Python SDK

pip install --upgrade sentry-sdk
  1. Configure your DSN
import sentry_sdk sentry_sdk.init( "https://<key>@sentry.io/<project>", # Set traces_sample_rate to 1.0 to capture 100% # of transactions for performance monitoring. # We recommend adjusting this value in production. traces_sample_rate=1.0, )

Check our documentation for the latest instructions.

Loved by over 4 million developers and more than 90,000 organizations worldwide, Sentry provides code-level observability to many of the world’s best-known companies like Disney, Peloton, Cloudflare, Eventbrite, Slack, Supercell, and Rockstar Games. Each month we process billions of exceptions from the most popular products on the internet.

Share on Twitter
Bookmark this page
Ask a questionJoin the discussion

Related Answers

A better experience for your users. An easier life for your developers.

    TwitterGitHubDribbbleLinkedinDiscord
© 2024 • Sentry is a registered Trademark
of Functional Software, Inc.