How to access subdataframes of pandas groupby by key

Introduction

After calling df.groupby(), you can access individual group DataFrames using get_group(key), iterate with a for loop, or convert all groups to a dictionary with dict(list(grouped)). Bracket indexing such as grouped["key"] selects columns, not groups, so it cannot retrieve a subdataframe by key; use one of these explicit methods instead.

Setup

python
import pandas as pd

df = pd.DataFrame({
    "department": ["Sales", "Sales", "Engineering", "Engineering", "HR"],
    "employee": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "salary": [70000, 65000, 90000, 95000, 60000]
})

grouped = df.groupby("department")

get_group() for a Single Group

get_group(key) returns the subdataframe for a specific key:

python
eng_df = grouped.get_group("Engineering")
print(eng_df)
#     department employee  salary
# 2  Engineering  Charlie   90000
# 3  Engineering    Diana   95000

sales_df = grouped.get_group("Sales")
print(sales_df)
#   department employee  salary
# 0      Sales    Alice   70000
# 1      Sales      Bob   65000

The returned DataFrame keeps the original index. If the key does not exist, get_group() raises KeyError.
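Because the original index is kept, a group's row labels do not necessarily start at 0. The following is a minimal sketch, reusing the sample data from the Setup section, of resetting the index when a fresh 0-based index is wanted; the variable name eng_reset is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["Sales", "Sales", "Engineering", "Engineering", "HR"],
    "employee": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "salary": [70000, 65000, 90000, 95000, 60000],
})
grouped = df.groupby("department")

# The group keeps the labels it had in df
eng_df = grouped.get_group("Engineering")
print(list(eng_df.index))  # [2, 3]

# reset_index(drop=True) discards the old labels and renumbers from 0
eng_reset = eng_df.reset_index(drop=True)
print(list(eng_reset.index))  # [0, 1]
```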

Iterating Over All Groups

A for loop over the GroupBy object yields (key, subdataframe) pairs:

python
for name, group_df in grouped:
    print(f"--- {name} ---")
    print(group_df)
    print()

# --- Engineering ---
#     department employee  salary
# 2  Engineering  Charlie   90000
# 3  Engineering    Diana   95000
#
# --- HR ---
#   department employee  salary
# 4         HR      Eve   60000
#
# --- Sales ---
#   department employee  salary
# 0      Sales    Alice   70000
# 1      Sales      Bob   65000

Groups are yielded in sorted key order by default.
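To see the effect of the default, this short sketch compares the key order with and without sorting. It rebuilds the sample data so it runs on its own; the names keys_sorted and keys_unsorted are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["Sales", "Sales", "Engineering", "Engineering", "HR"],
    "employee": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "salary": [70000, 65000, 90000, 95000, 60000],
})

# Default (sort=True): keys come back in sorted order
keys_sorted = [name for name, _ in df.groupby("department")]
print(keys_sorted)    # ['Engineering', 'HR', 'Sales']

# sort=False: keys come back in order of first appearance
keys_unsorted = [name for name, _ in df.groupby("department", sort=False)]
print(keys_unsorted)  # ['Sales', 'Engineering', 'HR']
```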

Converting to a Dictionary

dict(list(grouped)) creates a dictionary mapping keys to DataFrames:

python
group_dict = dict(list(grouped))

# Access by key
print(group_dict["HR"])
#   department employee  salary
# 4         HR      Eve   60000

# List all keys
print(list(group_dict.keys()))
# ['Engineering', 'HR', 'Sales']

This loads all groups into memory at once, which is fine for moderate-sized datasets.
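When only a few groups are needed, one option is to build the dictionary with a comprehension that keeps just those keys. This is a sketch under that assumption; the set wanted and the name selected are illustrative, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["Sales", "Sales", "Engineering", "Engineering", "HR"],
    "employee": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "salary": [70000, 65000, 90000, 95000, 60000],
})
grouped = df.groupby("department")

# Keep only the groups whose key is in `wanted`
wanted = {"Sales", "HR"}
selected = {name: group_df for name, group_df in grouped if name in wanted}

print(sorted(selected))        # ['HR', 'Sales']
print(len(selected["Sales"]))  # 2
```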

Using .groups and .indices

The .groups attribute returns a dictionary of group keys to index labels:

python
print(grouped.groups)
# {'Engineering': [2, 3], 'HR': [4], 'Sales': [0, 1]}

# Use the index labels to select rows
eng_indices = grouped.groups["Engineering"]
eng_df = df.loc[eng_indices]

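The .indices attribute is similar, but it maps each key to a NumPy array of integer row positions (for use with iloc) rather than index labels. With the default integer index used here, positions and labels coincide: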

python
print(grouped.indices)
# {'Engineering': array([2, 3]), 'HR': array([4]), 'Sales': array([0, 1])}
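The difference between the two attributes only shows up when the index is not the default 0..n-1 range. The sketch below assumes a variant of the sample data with string labels (the labels "a" to "e" are made up for illustration) and selects the same rows once by label and once by position.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "department": ["Sales", "Sales", "Engineering", "Engineering", "HR"],
        "salary": [70000, 65000, 90000, 95000, 60000],
    },
    index=["a", "b", "c", "d", "e"],  # illustrative non-default labels
)
grouped = df.groupby("department")

labels = grouped.groups["Engineering"]      # index labels: 'c', 'd'
positions = grouped.indices["Engineering"]  # integer positions: 2, 3

by_label = df.loc[labels]          # label-based selection
by_position = df.iloc[positions]   # position-based selection

print(by_label.equals(by_position))  # True
```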

Multi-Key GroupBy

When grouping by multiple columns, keys become tuples:

python
df2 = pd.DataFrame({
    "department": ["Sales", "Sales", "Sales", "Engineering"],
    "level": ["Senior", "Junior", "Senior", "Senior"],
    "salary": [80000, 60000, 85000, 95000]
})

grouped2 = df2.groupby(["department", "level"])

# get_group with a tuple key
senior_sales = grouped2.get_group(("Sales", "Senior"))
print(senior_sales)
#   department   level  salary
# 0      Sales  Senior   80000
# 2      Sales  Senior   85000

# Iteration yields tuple keys
for (dept, level), group_df in grouped2:
    print(f"{dept} - {level}: {len(group_df)} employees")
# Engineering - Senior: 1 employees
# Sales - Junior: 1 employees
# Sales - Senior: 2 employees
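Since every key is a tuple, groups that share only part of a key can be collected by inspecting the tuple during iteration. This is a minimal sketch of that idea; the name sales_groups is illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    "department": ["Sales", "Sales", "Sales", "Engineering"],
    "level": ["Senior", "Junior", "Senior", "Senior"],
    "salary": [80000, 60000, 85000, 95000],
})
grouped2 = df2.groupby(["department", "level"])

# Collect every group whose first key component is "Sales"
sales_groups = {key: g for key, g in grouped2 if key[0] == "Sales"}

print(list(sales_groups))
# [('Sales', 'Junior'), ('Sales', 'Senior')]
```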

Applying Functions to Specific Groups

Chain get_group() with ordinary DataFrame or Series methods to analyze a single group, or test the key while iterating to process only selected groups:

python
# Get statistics for one group
eng_stats = grouped.get_group("Engineering")["salary"].describe()
print(eng_stats)
# count        2.0
# mean     92500.0
# std       3535.5
# ...

# Apply a function only to specific groups
for name, group_df in grouped:
    if name in ["Sales", "Engineering"]:
        avg = group_df["salary"].mean()
        print(f"{name} average salary: ${avg:,.0f}")
# Engineering average salary: $92,500
# Sales average salary: $67,500
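An alternative to looping, sketched here as one possible approach, is to aggregate every group first and then pick the keys of interest from the result with .loc. The names mean_salary and subset are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["Sales", "Sales", "Engineering", "Engineering", "HR"],
    "employee": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "salary": [70000, 65000, 90000, 95000, 60000],
})
grouped = df.groupby("department")

# One mean per department, indexed by the group key
mean_salary = grouped["salary"].mean()

# Select only the departments of interest
subset = mean_salary.loc[["Engineering", "Sales"]]
print(subset)
```

This computes the mean for all groups, so it is most convenient when the aggregation is cheap or most groups are needed anyway.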

filter() for Conditional Group Selection

filter() returns rows from groups that satisfy a condition:

python
# Keep only departments with average salary > 70000
high_salary = grouped.filter(lambda x: x["salary"].mean() > 70000)
print(high_salary)
#     department employee  salary
# 2  Engineering  Charlie   90000
# 3  Engineering    Diana   95000

This returns a flat DataFrame, not a GroupBy object.
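For a simple condition like this one, the same rows can also be selected with a boolean mask built from transform(), which broadcasts each group's mean back to its rows. The sketch below compares the two approaches on the sample data; high_salary_mask is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["Sales", "Sales", "Engineering", "Engineering", "HR"],
    "employee": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "salary": [70000, 65000, 90000, 95000, 60000],
})
grouped = df.groupby("department")

# filter(): keep rows of groups whose mean salary exceeds 70000
high_salary = grouped.filter(lambda x: x["salary"].mean() > 70000)

# transform("mean") returns one value per row, aligned with df
mask = grouped["salary"].transform("mean") > 70000
high_salary_mask = df[mask]

print(high_salary.equals(high_salary_mask))  # True
```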

Checking if a Group Exists

python
# Check before calling get_group to avoid KeyError
if "Marketing" in grouped.groups:
    marketing_df = grouped.get_group("Marketing")
else:
    print("No Marketing group found")

# Or list all available keys
print(grouped.groups.keys())
# dict_keys(['Engineering', 'HR', 'Sales'])

# Number of groups
print(grouped.ngroups)
# 3
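Another option is to attempt the lookup and handle the KeyError. The helper below, get_group_or_empty, is a hypothetical convenience function (not part of pandas) that returns an empty DataFrame with the same columns when the key is missing.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["Sales", "Sales", "Engineering", "Engineering", "HR"],
    "employee": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "salary": [70000, 65000, 90000, 95000, 60000],
})
grouped = df.groupby("department")


def get_group_or_empty(frame, groupby_obj, key):
    """Hypothetical helper: return the group for `key`, or an empty
    slice of `frame` (same columns, zero rows) if the key is absent."""
    try:
        return groupby_obj.get_group(key)
    except KeyError:
        return frame.iloc[0:0]


marketing_df = get_group_or_empty(df, grouped, "Marketing")
print(marketing_df.empty)  # True

sales_df = get_group_or_empty(df, grouped, "Sales")
print(len(sales_df))       # 2
```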

Common Pitfalls

  • Using bracket indexing for subdataframes: grouped["salary"] returns a SeriesGroupBy object (for column selection), not a subdataframe for a key named "salary". Use get_group("salary") to access a group by key.
  • Forgetting tuple keys for multi-column groupby: on a GroupBy built from two columns, such as grouped2 above, get_group("Sales") raises an error (a ValueError in recent pandas versions, not a KeyError). Pass a tuple that matches all grouping columns: get_group(("Sales", "Senior")).
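  • Mutating the returned subdataframe: treat the result of get_group() as independent of the original DataFrame. Do not rely on edits to it reaching df, and note that older pandas versions may emit SettingWithCopyWarning when you assign to it. To modify a group safely, take an explicit copy: group_df = grouped.get_group("Sales").copy().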
  • Memory with dict(list(grouped)): Converting all groups to a dictionary duplicates the data in memory. For large DataFrames with many groups, iterate instead of converting to a dict.
  • Expecting original order: Groups are yielded in sorted key order, not in the order they first appear in the DataFrame. Use sort=False in groupby() to preserve first-appearance order: df.groupby("department", sort=False).
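The tuple-key pitfall can be reproduced in a few lines. This sketch deliberately catches both KeyError and ValueError, because the exact exception type is treated here as version-dependent; the variable error_name is illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    "department": ["Sales", "Sales", "Sales", "Engineering"],
    "level": ["Senior", "Junior", "Senior", "Senior"],
    "salary": [80000, 60000, 85000, 95000],
})
grouped2 = df2.groupby(["department", "level"])

error_name = None
try:
    # Scalar key on a two-column groupby: not a valid group key
    grouped2.get_group("Sales")
except (KeyError, ValueError) as exc:
    error_name = type(exc).__name__
    print(f"get_group('Sales') failed with {error_name}")

# A tuple covering both grouping columns works
senior_sales = grouped2.get_group(("Sales", "Senior"))
print(len(senior_sales))  # 2
```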

Summary

  • Use grouped.get_group(key) to retrieve a single subdataframe by key
  • Iterate with for key, group_df in grouped: to process all groups
  • Use dict(list(grouped)) to build a key-to-DataFrame dictionary
  • Access index labels with grouped.groups or integer row positions with grouped.indices
  • For multi-column groupby, keys are tuples
  • Use grouped.filter() to select groups meeting a condition
