How to access subdataframes of pandas groupby by key
Introduction
After calling df.groupby(), you can access individual group DataFrames using get_group(key), iterate with a for loop, or convert all groups to a dictionary with dict(list(grouped)). Bracket indexing such as grouped["key"] does not retrieve a subdataframe: on a GroupBy object, brackets select columns, so you must use one of these explicit methods.
Setup
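The original setup code is not shown here, so the following is a minimal sketch built on a hypothetical employee DataFrame. The columns department, level, name and salary, and the values in them, are assumptions chosen to match the keys used later in this article.

```python
import pandas as pd

# Hypothetical example data: five employees across three departments
df = pd.DataFrame({
    "department": ["Sales", "Engineering", "Sales", "HR", "Engineering"],
    "level": ["Senior", "Junior", "Junior", "Senior", "Senior"],
    "name": ["Ana", "Ben", "Cara", "Dev", "Eli"],
    "salary": [70000, 80000, 50000, 60000, 95000],
})

# Build a GroupBy object keyed on the department column
grouped = df.groupby("department")
print(grouped.ngroups)  # 3
```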
get_group() for a Single Group
get_group(key) returns the subdataframe for a specific key:
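A minimal sketch, assuming the same hypothetical employee DataFrame as in the setup, that retrieves one group, shows that the original row labels survive, and catches the KeyError for a key with no rows ("Marketing" is an assumed missing key):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "department": ["Sales", "Engineering", "Sales", "HR", "Engineering"],
    "level": ["Senior", "Junior", "Junior", "Senior", "Senior"],
    "name": ["Ana", "Ben", "Cara", "Dev", "Eli"],
    "salary": [70000, 80000, 50000, 60000, 95000],
})
grouped = df.groupby("department")

# Retrieve every row whose department is "Sales"
sales = grouped.get_group("Sales")
print(sales)

# The original row labels (0 and 2) are preserved, not reset
print(sales.index.tolist())  # [0, 2]

# A key that matches no rows raises KeyError
missing = False
try:
    grouped.get_group("Marketing")
except KeyError:
    missing = True
print(missing)  # True
```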
The returned DataFrame keeps the original index. If the key does not exist, get_group() raises KeyError.
Iterating Over All Groups
A for loop over the GroupBy object yields (key, subdataframe) pairs:
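A short sketch of the loop, again using the hypothetical employee DataFrame; it unpacks each (key, subdataframe) pair and records the keys to show the default sorted order:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "department": ["Sales", "Engineering", "Sales", "HR", "Engineering"],
    "name": ["Ana", "Ben", "Cara", "Dev", "Eli"],
    "salary": [70000, 80000, 50000, 60000, 95000],
})
grouped = df.groupby("department")

keys = []
for dept, group_df in grouped:
    # dept is the group key; group_df is an ordinary DataFrame
    keys.append(dept)
    print(f"{dept}: {len(group_df)} rows, total salary {group_df['salary'].sum()}")

print(keys)  # ['Engineering', 'HR', 'Sales']
```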
Groups are yielded in sorted key order by default.
Converting to a Dictionary
dict(list(grouped)) creates a dictionary mapping keys to DataFrames:
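A minimal sketch with the hypothetical employee DataFrame; after the conversion, each group is an ordinary dictionary lookup:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "department": ["Sales", "Engineering", "Sales", "HR", "Engineering"],
    "name": ["Ana", "Ben", "Cara", "Dev", "Eli"],
    "salary": [70000, 80000, 50000, 60000, 95000],
})
grouped = df.groupby("department")

# Materialize every (key, subdataframe) pair into a plain dict
groups = dict(list(grouped))

print(sorted(groups))               # ['Engineering', 'HR', 'Sales']
print(groups["HR"]["name"].tolist())  # ['Dev']
```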
This loads all groups into memory at once, which is fine for moderate-sized datasets.
Using .groups and .indices
The .groups attribute returns a dictionary of group keys to index labels:
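To make the labels visible, this sketch gives the hypothetical DataFrame a string index of employee names (an assumption for illustration) and then uses the labels from .groups with .loc:

```python
import pandas as pd

# Hypothetical example data with string row labels instead of 0..4
df = pd.DataFrame(
    {
        "department": ["Sales", "Engineering", "Sales", "HR", "Engineering"],
        "salary": [70000, 80000, 50000, 60000, 95000],
    },
    index=["Ana", "Ben", "Cara", "Dev", "Eli"],
)
grouped = df.groupby("department")

# .groups maps each key to the index labels of its rows
labels = grouped.groups["Sales"]
print(list(labels))  # ['Ana', 'Cara']

# Labels are meant for label-based selection with .loc
sales = df.loc[labels]
print(sales["salary"].tolist())  # [70000, 50000]
```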
The .indices attribute is similar, but it maps each key to a NumPy array of integer positions rather than index labels:
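Using the same hypothetical string-indexed DataFrame, this sketch shows that .indices returns positions, which pair with .iloc rather than .loc:

```python
import pandas as pd

# Hypothetical example data with string row labels
df = pd.DataFrame(
    {
        "department": ["Sales", "Engineering", "Sales", "HR", "Engineering"],
        "salary": [70000, 80000, 50000, 60000, 95000],
    },
    index=["Ana", "Ben", "Cara", "Dev", "Eli"],
)
grouped = df.groupby("department")

# .indices maps each key to a NumPy array of row positions
positions = grouped.indices["Sales"]
print(positions.tolist())  # [0, 2]

# Positions are meant for positional selection with .iloc
sales = df.iloc[positions]
print(sales.index.tolist())  # ['Ana', 'Cara']
```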
Multi-Key GroupBy
When grouping by multiple columns, keys become tuples:
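A minimal sketch, assuming the hypothetical employee DataFrame with a level column: get_group() takes a tuple with one value per grouping column, and the loop unpacks tuple keys:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "department": ["Sales", "Engineering", "Sales", "HR", "Engineering"],
    "level": ["Senior", "Junior", "Junior", "Senior", "Senior"],
    "name": ["Ana", "Ben", "Cara", "Dev", "Eli"],
})
multi = df.groupby(["department", "level"])

# One value per grouping column, in the same order as the groupby list
senior_sales = multi.get_group(("Sales", "Senior"))
print(senior_sales["name"].tolist())  # ['Ana']

# Tuple keys can be unpacked directly in the loop header
keys = []
for (dept, level), group_df in multi:
    keys.append((dept, level))
print(keys)
```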
Applying Functions to Specific Groups
You can combine get_group() with ordinary DataFrame operations to analyze one group in isolation:
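A sketch under the same hypothetical data: once a group is extracted, any DataFrame method applies to it. The 5000 raise is an arbitrary illustrative value:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "department": ["Sales", "Engineering", "Sales", "HR", "Engineering"],
    "name": ["Ana", "Ben", "Cara", "Dev", "Eli"],
    "salary": [70000, 80000, 50000, 60000, 95000],
})
grouped = df.groupby("department")
sales = grouped.get_group("Sales")

# Summary statistic for this group only
mean_salary = sales["salary"].mean()
print(mean_salary)  # 60000.0

# idxmax returns the row label, which .loc uses to find the name
top_earner = sales.loc[sales["salary"].idxmax(), "name"]
print(top_earner)  # Ana

# assign returns a new DataFrame, leaving both sales and df untouched
raised = sales.assign(salary=sales["salary"] + 5000)
print(raised["salary"].tolist())  # [75000, 55000]
```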
filter() for Conditional Group Selection
filter() returns rows from groups that satisfy a condition:
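A minimal sketch with the hypothetical employee DataFrame. The function receives each group's DataFrame and returns a boolean; the result keeps the surviving rows in their original order:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "department": ["Sales", "Engineering", "Sales", "HR", "Engineering"],
    "name": ["Ana", "Ben", "Cara", "Dev", "Eli"],
    "salary": [70000, 80000, 50000, 60000, 95000],
})
grouped = df.groupby("department")

# Keep departments with at least two employees (drops HR)
multi_member = grouped.filter(lambda g: len(g) >= 2)
print(multi_member.index.tolist())  # [0, 1, 2, 4]

# Keep departments whose mean salary exceeds 65000 (only Engineering)
high_paid = grouped.filter(lambda g: g["salary"].mean() > 65000)
print(high_paid["name"].tolist())  # ['Ben', 'Eli']
```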
This returns a flat DataFrame, not a GroupBy object.
Checking if a Group Exists
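Because .groups is a dictionary keyed by group, a membership test with in checks whether a key exists before calling get_group(). The sketch below uses the hypothetical employee DataFrame; get_group_or_none is a hypothetical helper, not part of pandas, that catches the KeyError instead:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "department": ["Sales", "Engineering", "Sales", "HR", "Engineering"],
    "name": ["Ana", "Ben", "Cara", "Dev", "Eli"],
})
grouped = df.groupby("department")

# Membership test against the group keys
print("Sales" in grouped.groups)      # True
print("Marketing" in grouped.groups)  # False


def get_group_or_none(grouped, key):
    """Hypothetical helper: return the group's rows, or None if the key is absent."""
    try:
        return grouped.get_group(key)
    except KeyError:
        return None


print(get_group_or_none(grouped, "Marketing"))  # None
```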
Common Pitfalls
- Using bracket indexing for subdataframes: grouped["salary"] returns a SeriesGroupBy object (column selection), not the subdataframe for a group key named "salary". Use get_group("salary") to access a group by key.
- Forgetting tuple keys for multi-column groupby: when grouping by two columns, grouped.get_group("Sales") does not return the Sales rows; it raises an error (ValueError in recent pandas versions). Use get_group(("Sales", "Senior")) with a tuple containing one value per grouping column.
- Mutating the returned subdataframe: modifying the result of get_group() does not update the original DataFrame, and some pandas versions emit SettingWithCopyWarning when you do it. To modify a group safely, take an explicit copy: group_df = grouped.get_group("Sales").copy().
- Memory with dict(list(grouped)): converting all groups to a dictionary duplicates the data in memory. For large DataFrames with many groups, iterate instead of converting to a dict.
- Expecting original order: groups are yielded in sorted key order, not in the order they first appear in the DataFrame. Pass sort=False to groupby() to preserve first-appearance order: df.groupby("department", sort=False).
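The pitfalls above can be checked with a short sketch on the hypothetical employee DataFrame. The exception type for a non-tuple key on a multi-column groupby has been ValueError in the versions I am aware of; the sketch catches both ValueError and KeyError to stay version-agnostic:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "department": ["Sales", "Engineering", "Sales", "HR", "Engineering"],
    "level": ["Senior", "Junior", "Junior", "Senior", "Senior"],
    "salary": [70000, 80000, 50000, 60000, 95000],
})

# sort=False keeps the order in which each department first appears
first_seen = [key for key, _ in df.groupby("department", sort=False)]
print(first_seen)  # ['Sales', 'Engineering', 'HR']

# A single string is not a valid key for a two-column groupby
tuple_error = False
try:
    df.groupby(["department", "level"]).get_group("Sales")
except (ValueError, KeyError):
    tuple_error = True
print(tuple_error)  # True

# Take an explicit copy before modifying a group
sales = df.groupby("department").get_group("Sales").copy()
sales["salary"] = sales["salary"] + 1000
print(sales["salary"].tolist())  # [71000, 51000]
print(df.loc[0, "salary"])       # 70000, the original is unchanged
```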
Summary
- Use grouped.get_group(key) to retrieve a single subdataframe by key
- Iterate with for key, df in grouped: to process all groups
- Use dict(list(grouped)) to build a key-to-DataFrame dictionary
- Look up each group's index labels with grouped.groups or its integer positions with grouped.indices
- For multi-column groupby, keys are tuples
- Use grouped.filter() to select groups meeting a condition

