pandas
dataframe
groupby
python
tutorial

How to loop over grouped Pandas dataframe?

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

When you call df.groupby('column'), Pandas returns a DataFrameGroupBy object that you can iterate over with a for loop. Each iteration yields a tuple of (group_key, group_dataframe), giving you the group name and the subset of rows belonging to that group. While looping is useful for inspection and custom logic, Pandas provides vectorized alternatives like apply, transform, and agg that are faster for most analytical tasks.

Basic Grouping and Iteration

python
1import pandas as pd
2
3df = pd.DataFrame({
4    'product': ['A', 'B', 'A', 'B', 'A', 'C'],
5    'region': ['North', 'South', 'North', 'South', 'South', 'North'],
6    'sales': [100, 150, 200, 300, 120, 250]
7})
8
9# Iterate over groups
10for name, group in df.groupby('product'):
11    print(f"\nProduct: {name}")
12    print(group)
13    print(f"Total sales: {group['sales'].sum()}")
14
15# Product: A
16#   product region  sales
17# 0       A  North    100
18# 2       A  North    200
19# 4       A  South    120
20# Total sales: 420
21#
22# Product: B
23# ...

Grouping by Multiple Columns

python
1for (product, region), group in df.groupby(['product', 'region']):
2    print(f"\nProduct: {product}, Region: {region}")
3    print(f"  Rows: {len(group)}, Total: {group['sales'].sum()}")
4
5# Product: A, Region: North
6#   Rows: 2, Total: 300
7# Product: A, Region: South
8#   Rows: 1, Total: 120
9# Product: B, Region: South
10#   Rows: 2, Total: 450

Accessing Specific Groups

python
1grouped = df.groupby('product')
2
3# Get a specific group by key
4group_a = grouped.get_group('A')
5print(group_a)
6#   product region  sales
7# 0       A  North    100
8# 2       A  North    200
9# 4       A  South    120
10
11# List all group keys
12print(list(grouped.groups.keys()))
13# ['A', 'B', 'C']
14
15# Group sizes
16print(grouped.size())
17# product
18# A    3
19# B    2
20# C    1

Aggregation Without Looping

For most tasks, agg is faster than manual loops:

python
1# Single aggregation
2result = df.groupby('product')['sales'].sum()
3# product
4# A    420
5# B    450
6# C    250
7
8# Multiple aggregations
9result = df.groupby('product')['sales'].agg(['sum', 'mean', 'count', 'max'])
10#          sum   mean  count  max
11# product
12# A        420  140.0      3  200
13# B        450  225.0      2  300
14# C        250  250.0      1  250
15
16# Named aggregations
17result = df.groupby('product').agg(
18    total_sales=('sales', 'sum'),
19    avg_sales=('sales', 'mean'),
20    num_orders=('sales', 'count')
21)

Apply Custom Functions Per Group

python
1# apply() runs a function on each group DataFrame
2def top_sale(group):
3    return group.nlargest(1, 'sales')
4
5result = df.groupby('product').apply(top_sale)
6print(result)
7
8# Custom function with multiple return values
9def summarize(group):
10    return pd.Series({
11        'total': group['sales'].sum(),
12        'avg': group['sales'].mean(),
13        'range': group['sales'].max() - group['sales'].min()
14    })
15
16summary = df.groupby('product').apply(summarize)

Transform (Return Same-Shaped Result)

transform returns a result with the same index as the input, useful for adding group-level calculations back to the original DataFrame:

python
1# Add group mean as a new column
2df['group_mean'] = df.groupby('product')['sales'].transform('mean')
3
4# Normalize sales within each group
5df['normalized'] = df.groupby('product')['sales'].transform(
6    lambda x: (x - x.mean()) / x.std()
7)
8
9# Flag above-average sales per group
10df['above_avg'] = df['sales'] > df.groupby('product')['sales'].transform('mean')

Filter Groups

python
1# Keep only groups where total sales > 300
2filtered = df.groupby('product').filter(lambda g: g['sales'].sum() > 300)
3print(filtered)
4# Only products A (420) and B (450) remain; C (250) is dropped

Iterating with GroupBy Attributes

python
1grouped = df.groupby('product')
2
3# Number of groups
4print(grouped.ngroups)  # 3
5
6# Group indices (row positions per group)
7print(grouped.groups)
8# {'A': [0, 2, 4], 'B': [1, 3], 'C': [5]}
9
10# First/last row per group
11print(grouped.first())
12print(grouped.last())

Common Pitfalls

  • Looping when vectorized operations exist: for name, group in df.groupby(col) is convenient but slow. For aggregation, use .agg(), .sum(), .mean() instead. Reserve loops for complex logic that cannot be vectorized.
  • Modifying group DataFrames in place: The group DataFrame in a loop is a view or copy depending on context. Modifying it does not reliably change the original DataFrame. Use transform or apply to produce new values.
  • Forgetting reset_index() after aggregation: groupby().sum() produces a DataFrame with the group column as the index. Call .reset_index() to get it back as a regular column.
  • apply returning inconsistent shapes: If the function passed to apply returns different-shaped results for different groups, Pandas may raise errors or produce unexpected MultiIndex results. Ensure consistent return types.
  • Using groupby on columns with NaN: By default, groupby drops rows where the group key is NaN. Pass dropna=False to include NaN as a group key (Pandas 1.1+).

Summary

  • Iterate over df.groupby('col') with for name, group in grouped to get each group as a DataFrame
  • Use get_group('key') to access a single group directly
  • Prefer .agg(), .transform(), and .apply() over manual loops for performance
  • transform returns same-shaped output, useful for adding group-level stats back to the DataFrame
  • Pass dropna=False to include NaN values as group keys

Course illustration
Course illustration

All Rights Reserved.