Introduction
A MultiIndex (hierarchical index) DataFrame in pandas has two or more index levels, allowing you to represent higher-dimensional data in a 2D table. Selecting rows from a MultiIndex DataFrame uses different syntax than a single-index DataFrame — you can select by level, cross-section, or boolean conditions using .loc, .xs(), IndexSlice, and query().
Creating a MultiIndex DataFrame
1import pandas as pd
2import numpy as np
3
4# Method 1: From tuples
5index = pd.MultiIndex.from_tuples([
6 ('US', 'NYC'), ('US', 'LA'), ('US', 'Chicago'),
7 ('UK', 'London'), ('UK', 'Manchester'),
8 ('DE', 'Berlin'), ('DE', 'Munich')
9], names=['country', 'city'])
10
11df = pd.DataFrame({
12 'population': [8_336_817, 3_979_576, 2_693_976, 8_982_000, 553_230, 3_748_148, 1_472_000],
13 'area_km2': [783, 1_302, 606, 1_572, 115, 892, 310]
14}, index=index)
15
16print(df)
17# population area_km2
18# country city
19# US NYC 8336817 783
20# LA 3979576 1302
21# Chicago 2693976 606
22# UK London 8982000 1572
23# Manchester 553230 115
24# DE Berlin 3748148 892
25# Munich 1472000 310
Select by First Level (Outer Index)
1# All rows for 'US'
2df.loc['US']
3# population area_km2
4# city
5# NYC 8336817 783
6# LA 3979576 1302
7# Chicago 2693976 606
8
9# Multiple countries
10df.loc[['US', 'UK']]
Select by Both Levels
1# Specific row: country='US', city='NYC'
2df.loc[('US', 'NYC')]
3# population 8336817
4# area_km2 783
5
6# Multiple specific rows
7df.loc[[('US', 'NYC'), ('UK', 'London')]]
Select by Second Level with xs()
xs() (cross-section) selects rows by a value at a specific level:
1# All rows where city='London' (regardless of country)
2df.xs('London', level='city')
3# population area_km2
4# country
5# UK 8982000 1572
6
7# All rows where country='DE'
8df.xs('DE', level='country')
9# population area_km2
10# city
11# Berlin 3748148 892
12# Munich 1472000 310
Slicing with IndexSlice
pd.IndexSlice enables intuitive slicing on MultiIndex DataFrames:
1idx = pd.IndexSlice
2
3# All cities in 'US'
4df.loc[idx['US', :], :]
5# Same as df.loc['US']
6
7# All countries, but only cities starting with 'M' (requires sorted index)
8df = df.sort_index() # Must sort for slicing
9df.loc[idx[:, 'London':'Munich'], :]
10
11# Specific level combinations
12df.loc[idx[['US', 'UK'], :], :]
Boolean Filtering
1# Filter by column values
2df[df['population'] > 3_000_000]
3
4# Filter by index level using get_level_values
5df[df.index.get_level_values('country') == 'US']
6
7# Combine conditions
8mask = (df.index.get_level_values('country').isin(['US', 'UK'])) & \
9 (df['population'] > 1_000_000)
10df[mask]
Using query() with MultiIndex
1# query() can reference index levels by name
2df.query('country == "US"')
3
4df.query('country == "US" and population > 3000000')
5
6# Use backticks for level names with special characters
7df.query('`country` in ["US", "UK"] and area_km2 > 500')
Resetting and Setting Index
1# Move index levels to columns
2df_flat = df.reset_index()
3# country city population area_km2
4# 0 US NYC 8336817 783
5# 1 US LA 3979576 1302
6
7# Filter on flat DataFrame, then set index back
8result = df_flat[df_flat['country'] == 'US'].set_index(['country', 'city'])
Selecting Specific Columns with MultiIndex
1# Select one column for one country
2df.loc['US', 'population']
3# city
4# NYC 8336817
5# LA 3979576
6# Chicago 2693976
7
8# Select specific rows and columns
9df.loc[('US', 'NYC'), 'population']
10# 8336817
GroupBy Operations on MultiIndex
1# Aggregate by first level
2df.groupby(level='country').sum()
3# population area_km2
4# country
5# DE 5220148 1202
6# UK 9535230 1687
7# US 15010369 2691
8
9# Aggregate by second level
10df.groupby(level='city').mean()
Swapping and Reordering Levels
1# Swap levels
2df_swapped = df.swaplevel()
3# population area_km2
4# city country
5# NYC US 8336817 783
6# LA US 3979576 1302
7
8# Now select by city first
9df_swapped.loc['London']
10
11# Reorder levels
12df_reordered = df.reorder_levels(['city', 'country'])
Common Pitfalls
Unsorted index for slicing: Slicing with loc on a MultiIndex requires the index to be sorted. Call df.sort_index() before slicing, or you get a UnsortedIndexError.
Tuple vs list in loc: df.loc[('US', 'NYC')] selects one row (tuple = single key). df.loc[['US', 'UK']] selects multiple first-level keys (list = multiple keys). Mixing these up gives unexpected results.
Dropping levels: After selecting by one level (e.g., df.loc['US']), that level is dropped from the result. Use drop_level=False in xs() to keep it: df.xs('US', level='country', drop_level=False).
Performance with large DataFrames: Repeated get_level_values() calls in loops are slow. For bulk filtering, reset the index, filter on columns, and set the index back.
String vs numeric index: If index levels are strings, df.loc[1] will fail. Ensure you use the correct type for indexing. Check with df.index.get_level_values(0).dtype.
Summary
Use df.loc['key'] to select by the outer (first) level of a MultiIndex
Use df.loc[('level1', 'level2')] to select by both levels
Use df.xs('value', level='name') to select by any level, including inner levels
Use pd.IndexSlice for intuitive slicing across multiple levels
Use df.query('level_name == "value"') for readable filtering on index levels
Always sort the index with df.sort_index() before slicing