How can I iterate over the rows of a Python Pandas DataFrame, accessing its elements by their column names?
Iterating over rows is generally discouraged in Pandas because it becomes slow for large DataFrames. Before writing a loop, it is worth considering whether the problem could instead be solved with a vectorized column operation or with the following method:
DataFrame.apply: this method will execute a provided function over the rows or columns of a DataFrame.
If neither of these solutions is appropriate and iterating over the rows of our DataFrame cannot be avoided, we can use DataFrame.itertuples(). This function returns rows as named tuples, so each value can be accessed by its column name. For example:
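```python
import pandas

prices_df = pandas.DataFrame(
    {'cost_price': [1, 2], 'sale_price': [2, 4]},
    index=['apple', 'orange'],
)

# Each row is a named tuple: row.Index holds the index label,
# and the remaining fields are named after the columns.
for row in prices_df.itertuples():
    print(f"An {row.Index} costs ${row.cost_price} and sells for ${row.sale_price}.")
```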
This will print the following:
```
An apple costs $1 and sells for $2.
An orange costs $2 and sells for $4.
```
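For comparison, a per-row calculation like this can often be written without an explicit loop. The following is a minimal sketch, assuming we want a profit value for each row of the same DataFrame; the names profit_by_apply and profit_vectorized are hypothetical and only serve the illustration.

```python
import pandas

prices_df = pandas.DataFrame(
    {'cost_price': [1, 2], 'sale_price': [2, 4]},
    index=['apple', 'orange'],
)

# axis=1 passes each row to the function as a Series keyed by column name.
profit_by_apply = prices_df.apply(
    lambda row: row['sale_price'] - row['cost_price'], axis=1
)

# The same result using plain column arithmetic, which works on whole columns at once.
profit_vectorized = prices_df['sale_price'] - prices_df['cost_price']

print(profit_by_apply.tolist())    # [1, 2]
print(profit_vectorized.tolist())  # [1, 2]
```

Both results are Series indexed by 'apple' and 'orange'. Column arithmetic is usually the faster option when the calculation can be expressed that way.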
DataFrame.iterrows() can also be used to iterate over DataFrame rows, but it is slower than itertuples() and does not preserve data types across a row, because each row is returned as a Series with a single dtype.
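The data type difference can be seen in a small sketch. It assumes a hypothetical DataFrame, mixed_df, that mixes an integer column with a float column. The column names and values are made up for illustration.

```python
import pandas

# Hypothetical frame mixing an integer column with a float column.
mixed_df = pandas.DataFrame(
    {'quantity': [3, 5], 'unit_price': [0.5, 1.25]},
    index=['apple', 'orange'],
)

# iterrows() yields (index, Series) pairs. Each row Series has one dtype,
# so the integer quantity is upcast to a float to match unit_price.
_, first_series = next(mixed_df.iterrows())
is_float_in_iterrows = isinstance(first_series['quantity'], float)

# itertuples() yields named tuples, so each field keeps its column's type.
first_tuple = next(mixed_df.itertuples())
is_float_in_itertuples = isinstance(first_tuple.quantity, float)

print(is_float_in_iterrows, is_float_in_itertuples)  # True False
```

If integer values must stay integers while looping, this suggests itertuples() is the safer choice.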