Apply multiple functions to multiple groupby columns
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Pandas groupby().agg() lets you apply different aggregation functions to different columns in a single call. This is essential for summarizing data — computing the mean of one column, the sum of another, and the count of a third, all grouped by a key column. The .agg() method accepts a dictionary mapping column names to functions (or lists of functions), replacing the older and now deprecated agg({'col': {'name': func}}) syntax. For complex aggregations, NamedAgg tuples provide the cleanest approach.
Basic Aggregation with a Dictionary
The dictionary maps column names to aggregation functions. Functions can be strings ("mean", "sum", "max", "min", "count", "std") or callable functions.
Multiple Functions per Column
When passing a list of functions for a column, the result has a hierarchical (MultiIndex) column structure. Flatten it by joining the level names.
Named Aggregations (Recommended)
NamedAgg (introduced in pandas 0.25) produces flat column names directly. This avoids MultiIndex columns and makes the output immediately usable.
Custom Aggregation Functions
Custom functions receive a Series (all values in that column for the group) and must return a single scalar value.
Using apply for Complex Multi-Column Logic
Use .apply() when the aggregation depends on relationships between columns (e.g., bonus as a percentage of salary). .agg() only works on individual columns independently.
Aggregating with transform (Keep Original Shape)
.transform() broadcasts the group result back to the original DataFrame's shape. Use it when you need the aggregated value alongside each row.
Resetting the Index
Common Pitfalls
- Using the deprecated nested dict syntax:
df.groupby("col").agg({"salary": {"average": "mean"}})was removed in pandas 1.0. UseNamedAggtuples instead:agg(average=pd.NamedAgg(column="salary", aggfunc="mean")). - Forgetting to flatten MultiIndex columns: When passing a list of functions per column, the result has MultiIndex columns that break downstream code expecting flat column names. Flatten with
result.columns = ["_".join(col) for col in result.columns]. - Using
agg()when logic depends on multiple columns:.agg()applies functions to each column independently. If your calculation needs values from two or more columns (e.g.,bonus / salary), use.apply()with a function that receives the entire group DataFrame. - Lambda functions getting the column name
<lambda>: When using lambdas in a list of functions, the result column is named<lambda>, and multiple lambdas conflict. Use named functions orNamedAgginstead. - Expecting
transform()to work with functions that change shape:.transform()must return a result with the same length as the input group. Functions like"sum"work (broadcast scalar), but functions that return multiple rows per group cause errors. Use.agg()or.apply()for shape-changing operations.
Summary
- Use
.agg({"column": "function"})for simple per-column aggregations - Use
NamedAggtuples for clean, flat output column names (recommended) - Pass a list of functions per column to compute multiple aggregates, then flatten MultiIndex columns
- Use
.apply()when aggregation logic depends on multiple columns together - Use
.transform()to broadcast group-level results back to the original row-level shape - Use
as_index=Falseor.reset_index()to keep the group column as a regular column

