Introduction
When you call df.groupby('column'), Pandas returns a DataFrameGroupBy object that you can iterate over with a for loop. Each iteration yields a tuple of (group_key, group_dataframe), giving you the group name and the subset of rows belonging to that group. While looping is useful for inspection and custom logic, Pandas provides vectorized alternatives like apply, transform, and agg that are faster for most analytical tasks.
Basic Grouping and Iteration
1import pandas as pd
2
3df = pd.DataFrame({
4 'product': ['A', 'B', 'A', 'B', 'A', 'C'],
5 'region': ['North', 'South', 'North', 'South', 'South', 'North'],
6 'sales': [100, 150, 200, 300, 120, 250]
7})
8
9# Iterate over groups
10for name, group in df.groupby('product'):
11 print(f"\nProduct: {name}")
12 print(group)
13 print(f"Total sales: {group['sales'].sum()}")
14
15# Product: A
16# product region sales
17# 0 A North 100
18# 2 A North 200
19# 4 A South 120
20# Total sales: 420
21#
22# Product: B
23# ...
Grouping by Multiple Columns
1for (product, region), group in df.groupby(['product', 'region']):
2 print(f"\nProduct: {product}, Region: {region}")
3 print(f" Rows: {len(group)}, Total: {group['sales'].sum()}")
4
5# Product: A, Region: North
6# Rows: 2, Total: 300
7# Product: A, Region: South
8# Rows: 1, Total: 120
9# Product: B, Region: South
10# Rows: 2, Total: 450
Accessing Specific Groups
1grouped = df.groupby('product')
2
3# Get a specific group by key
4group_a = grouped.get_group('A')
5print(group_a)
6# product region sales
7# 0 A North 100
8# 2 A North 200
9# 4 A South 120
10
11# List all group keys
12print(list(grouped.groups.keys()))
13# ['A', 'B', 'C']
14
15# Group sizes
16print(grouped.size())
17# product
18# A 3
19# B 2
20# C 1
Aggregation Without Looping
For most tasks, agg is faster than manual loops:
1# Single aggregation
2result = df.groupby('product')['sales'].sum()
3# product
4# A 420
5# B 450
6# C 250
7
8# Multiple aggregations
9result = df.groupby('product')['sales'].agg(['sum', 'mean', 'count', 'max'])
10# sum mean count max
11# product
12# A 420 140.0 3 200
13# B 450 225.0 2 300
14# C 250 250.0 1 250
15
16# Named aggregations
17result = df.groupby('product').agg(
18 total_sales=('sales', 'sum'),
19 avg_sales=('sales', 'mean'),
20 num_orders=('sales', 'count')
21)
Apply Custom Functions Per Group
1# apply() runs a function on each group DataFrame
2def top_sale(group):
3 return group.nlargest(1, 'sales')
4
5result = df.groupby('product').apply(top_sale)
6print(result)
7
8# Custom function with multiple return values
9def summarize(group):
10 return pd.Series({
11 'total': group['sales'].sum(),
12 'avg': group['sales'].mean(),
13 'range': group['sales'].max() - group['sales'].min()
14 })
15
16summary = df.groupby('product').apply(summarize)
transform returns a result with the same index as the input, useful for adding group-level calculations back to the original DataFrame:
1# Add group mean as a new column
2df['group_mean'] = df.groupby('product')['sales'].transform('mean')
3
4# Normalize sales within each group
5df['normalized'] = df.groupby('product')['sales'].transform(
6 lambda x: (x - x.mean()) / x.std()
7)
8
9# Flag above-average sales per group
10df['above_avg'] = df['sales'] > df.groupby('product')['sales'].transform('mean')
Filter Groups
1# Keep only groups where total sales > 300
2filtered = df.groupby('product').filter(lambda g: g['sales'].sum() > 300)
3print(filtered)
4# Only products A (420) and B (450) remain; C (250) is dropped
Iterating with GroupBy Attributes
1grouped = df.groupby('product')
2
3# Number of groups
4print(grouped.ngroups) # 3
5
6# Group indices (row positions per group)
7print(grouped.groups)
8# {'A': [0, 2, 4], 'B': [1, 3], 'C': [5]}
9
10# First/last row per group
11print(grouped.first())
12print(grouped.last())
Common Pitfalls
Looping when vectorized operations exist: for name, group in df.groupby(col) is convenient but slow. For aggregation, use .agg(), .sum(), .mean() instead. Reserve loops for complex logic that cannot be vectorized.
Modifying group DataFrames in place: The group DataFrame in a loop is a view or copy depending on context. Modifying it does not reliably change the original DataFrame. Use transform or apply to produce new values.
Forgetting reset_index() after aggregation: groupby().sum() produces a DataFrame with the group column as the index. Call .reset_index() to get it back as a regular column.
apply returning inconsistent shapes: If the function passed to apply returns different-shaped results for different groups, Pandas may raise errors or produce unexpected MultiIndex results. Ensure consistent return types.
Using groupby on columns with NaN: By default, groupby drops rows where the group key is NaN. Pass dropna=False to include NaN as a group key (Pandas 1.1+).
Summary
Iterate over df.groupby('col') with for name, group in grouped to get each group as a DataFrame
Use get_group('key') to access a single group directly
Prefer .agg(), .transform(), and .apply() over manual loops for performance
transform returns same-shaped output, useful for adding group-level stats back to the DataFrame
Pass dropna=False to include NaN values as group keys