groupby
pandas
data manipulation
Python
data analysis

Apply multiple functions to multiple groupby columns

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Pandas groupby().agg() lets you apply different aggregation functions to different columns in a single call. This is essential for summarizing data — computing the mean of one column, the sum of another, and the count of a third, all grouped by a key column. The .agg() method accepts a dictionary mapping column names to functions (or lists of functions), replacing the older and now deprecated agg({'col': {'name': func}}) syntax. For complex aggregations, NamedAgg tuples provide the cleanest approach.

Basic Aggregation with a Dictionary

python
1import pandas as pd
2
3df = pd.DataFrame({
4    "department": ["Sales", "Sales", "Engineering", "Engineering", "Sales"],
5    "salary": [50000, 60000, 80000, 90000, 55000],
6    "bonus": [5000, 7000, 10000, 12000, 6000],
7    "years": [3, 5, 7, 10, 2],
8})
9
10# Apply different functions to different columns
11result = df.groupby("department").agg({
12    "salary": "mean",
13    "bonus": "sum",
14    "years": "max",
15})
16print(result)
17#                salary  bonus  years
18# department
19# Engineering   85000   22000     10
20# Sales         55000   18000      5

The dictionary maps column names to aggregation functions. Functions can be strings ("mean", "sum", "max", "min", "count", "std") or callable functions.

Multiple Functions per Column

python
1# Apply multiple functions to the same column
2result = df.groupby("department").agg({
3    "salary": ["mean", "min", "max"],
4    "bonus": ["sum", "mean"],
5})
6print(result)
7#              salary                 bonus
8#                mean    min    max     sum   mean
9# department
10# Engineering  85000  80000  90000   22000  11000
11# Sales        55000  50000  60000   18000   6000
12
13# The result has a MultiIndex for columns
14print(result.columns)
15# MultiIndex([('salary', 'mean'), ('salary', 'min'), ...])
16
17# Flatten the MultiIndex
18result.columns = ["_".join(col) for col in result.columns]
19print(result.columns)
20# ['salary_mean', 'salary_min', 'salary_max', 'bonus_sum', 'bonus_mean']

When passing a list of functions for a column, the result has a hierarchical (MultiIndex) column structure. Flatten it by joining the level names.

python
1# NamedAgg provides clean, explicit column naming
2result = df.groupby("department").agg(
3    avg_salary=pd.NamedAgg(column="salary", aggfunc="mean"),
4    total_bonus=pd.NamedAgg(column="bonus", aggfunc="sum"),
5    max_years=pd.NamedAgg(column="years", aggfunc="max"),
6    headcount=pd.NamedAgg(column="salary", aggfunc="count"),
7)
8print(result)
9#              avg_salary  total_bonus  max_years  headcount
10# department
11# Engineering      85000        22000         10          2
12# Sales            55000        18000          5          3

NamedAgg (introduced in pandas 0.25) produces flat column names directly. This avoids MultiIndex columns and makes the output immediately usable.

Custom Aggregation Functions

python
1import numpy as np
2
3def salary_range(x):
4    return x.max() - x.min()
5
6def coefficient_of_variation(x):
7    return x.std() / x.mean() if x.mean() != 0 else 0
8
9result = df.groupby("department").agg({
10    "salary": [salary_range, coefficient_of_variation, "median"],
11    "bonus": lambda x: x.quantile(0.75),
12})
13print(result)
14
15# With NamedAgg and custom functions
16result = df.groupby("department").agg(
17    salary_range=pd.NamedAgg(column="salary", aggfunc=salary_range),
18    salary_cv=pd.NamedAgg(column="salary", aggfunc=coefficient_of_variation),
19    bonus_p75=pd.NamedAgg(column="bonus", aggfunc=lambda x: x.quantile(0.75)),
20)

Custom functions receive a Series (all values in that column for the group) and must return a single scalar value.

Using apply for Complex Multi-Column Logic

python
1# When you need to compute values from multiple columns together
2def compute_stats(group):
3    return pd.Series({
4        "avg_salary": group["salary"].mean(),
5        "total_compensation": (group["salary"] + group["bonus"]).sum(),
6        "senior_count": (group["years"] >= 5).sum(),
7        "avg_bonus_pct": (group["bonus"] / group["salary"]).mean() * 100,
8    })
9
10result = df.groupby("department").apply(compute_stats)
11print(result)

Use .apply() when the aggregation depends on relationships between columns (e.g., bonus as a percentage of salary). .agg() only works on individual columns independently.

Aggregating with transform (Keep Original Shape)

python
1# transform returns results with the same index as the original DataFrame
2df["dept_avg_salary"] = df.groupby("department")["salary"].transform("mean")
3df["salary_vs_dept_avg"] = df["salary"] - df["dept_avg_salary"]
4print(df[["department", "salary", "dept_avg_salary", "salary_vs_dept_avg"]])
5#     department  salary  dept_avg_salary  salary_vs_dept_avg
6# 0       Sales   50000            55000               -5000
7# 1       Sales   60000            55000                5000
8# 2  Engineering   80000            85000               -5000
9# 3  Engineering   90000            85000                5000
10# 4       Sales   55000            55000                   0

.transform() broadcasts the group result back to the original DataFrame's shape. Use it when you need the aggregated value alongside each row.

Resetting the Index

python
1# groupby makes the group column the index by default
2result = df.groupby("department").agg({"salary": "mean"})
3print(result.index)  # Index(['Engineering', 'Sales'])
4
5# Option 1: reset_index() after aggregation
6result = df.groupby("department").agg({"salary": "mean"}).reset_index()
7
8# Option 2: as_index=False during groupby
9result = df.groupby("department", as_index=False).agg({"salary": "mean"})
10print(result)
11#     department  salary
12# 0  Engineering   85000
13# 1        Sales   55000

Common Pitfalls

  • Using the deprecated nested dict syntax: df.groupby("col").agg({"salary": {"average": "mean"}}) was removed in pandas 1.0. Use NamedAgg tuples instead: agg(average=pd.NamedAgg(column="salary", aggfunc="mean")).
  • Forgetting to flatten MultiIndex columns: When passing a list of functions per column, the result has MultiIndex columns that break downstream code expecting flat column names. Flatten with result.columns = ["_".join(col) for col in result.columns].
  • Using agg() when logic depends on multiple columns: .agg() applies functions to each column independently. If your calculation needs values from two or more columns (e.g., bonus / salary), use .apply() with a function that receives the entire group DataFrame.
  • Lambda functions getting the column name <lambda>: When using lambdas in a list of functions, the result column is named <lambda>, and multiple lambdas conflict. Use named functions or NamedAgg instead.
  • Expecting transform() to work with functions that change shape: .transform() must return a result with the same length as the input group. Functions like "sum" work (broadcast scalar), but functions that return multiple rows per group cause errors. Use .agg() or .apply() for shape-changing operations.

Summary

  • Use .agg({"column": "function"}) for simple per-column aggregations
  • Use NamedAgg tuples for clean, flat output column names (recommended)
  • Pass a list of functions per column to compute multiple aggregates, then flatten MultiIndex columns
  • Use .apply() when aggregation logic depends on multiple columns together
  • Use .transform() to broadcast group-level results back to the original row-level shape
  • Use as_index=False or .reset_index() to keep the group column as a regular column

Course illustration
Course illustration

All Rights Reserved.