pandas
GroupBy
data aggregation
Python
data analysis

Multiple aggregations of the same column using pandas GroupBy.agg

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Pandas GroupBy.agg can apply several aggregations to the same column in one pass. The cleanest modern approach is named aggregation, because it gives each result column an explicit name and avoids awkward MultiIndex cleanup later.

The Basic Pattern

Suppose you have sales data by store:

python
1import pandas as pd
2
3df = pd.DataFrame(
4    {
5        "store": ["A", "A", "B", "B", "B"],
6        "sales": [100, 150, 80, 120, 160],
7    }
8)
9
10result = df.groupby("store").agg(
11    sales_sum=("sales", "sum"),
12    sales_mean=("sales", "mean"),
13    sales_max=("sales", "max"),
14)
15
16print(result)

This is the most readable answer to “run multiple aggregations on the same column.” Each output column is named directly, and each one points back to the same source column.

Why Named Aggregation Is Better

Older Pandas examples often use a dictionary that produces a MultiIndex column result:

python
result = df.groupby("store").agg({"sales": ["sum", "mean", "max"]})
print(result)

This still works, but the output columns become hierarchical, which is sometimes inconvenient:

text
          sales
            sum mean max
store

Named aggregation is easier to use downstream because the result columns are already flat and descriptive.

Add Custom Aggregations Too

You are not limited to built-in strings. A custom function works as well:

python
1def value_range(series):
2    return series.max() - series.min()
3
4
5result = df.groupby("store").agg(
6    sales_sum=("sales", "sum"),
7    sales_mean=("sales", "mean"),
8    sales_range=("sales", value_range),
9)
10
11print(result)

This is useful when you want several standard summaries plus one domain-specific metric from the same grouped column.

Aggregate Multiple Columns at Once

The same agg call can combine repeated aggregation of one column with different work on other columns:

python
1df = pd.DataFrame(
2    {
3        "store": ["A", "A", "B", "B", "B"],
4        "sales": [100, 150, 80, 120, 160],
5        "transactions": [10, 12, 8, 9, 15],
6    }
7)
8
9result = df.groupby("store").agg(
10    sales_sum=("sales", "sum"),
11    sales_mean=("sales", "mean"),
12    transaction_total=("transactions", "sum"),
13)
14
15print(result)

This is often cleaner than running separate groupby operations and merging them back together.

Dealing with Missing Values

Most built-in aggregations already ignore missing values by default, but you should still know how the chosen functions behave. For example, mean ignores NaN, while a custom aggregation might not unless you write it carefully.

python
1df = pd.DataFrame(
2    {
3        "store": ["A", "A", "B"],
4        "sales": [100, None, 80],
5    }
6)
7
8result = df.groupby("store").agg(
9    sales_sum=("sales", "sum"),
10    sales_mean=("sales", "mean"),
11)
12
13print(result)

If missing data changes the meaning of your summary, address it explicitly before aggregation instead of assuming the defaults match your business rule.

Flatten Older MultiIndex Results If Needed

If you inherit code that uses the older MultiIndex style, flattening the columns is straightforward:

python
1result = df.groupby("store").agg({"sales": ["sum", "mean", "max"]})
2result.columns = ["_".join(column) for column in result.columns]
3result = result.reset_index()
4
5print(result)

That is still a valid cleanup pattern, but in new code named aggregation is usually simpler.

Common Pitfalls

The most common mistake is using the older dict-of-lists syntax and then being surprised by MultiIndex columns. That output is valid, but it is often more cumbersome than necessary.

Another pitfall is forgetting to name the output columns clearly. Aggregation code becomes hard to read quickly when result columns are called only sum, mean, and max without saying which source column they summarize.

It is also easy to assume custom functions handle missing values the same way built-ins do. If NaN behavior matters, test it explicitly.

Finally, avoid doing several separate groupby calls on the same grouping key when one agg call can express the full summary table more directly.

Summary

  • Use named aggregation to apply multiple functions to the same column cleanly.
  • Named aggregation produces flat, readable output columns.
  • You can mix built-in aggregations and custom functions in one agg call.
  • Watch how missing values affect the functions you choose.
  • Prefer one well-structured groupby(...).agg(...) over several separate aggregation passes.

Course illustration
Course illustration

All Rights Reserved.