How to take column slices of a DataFrame in pandas
Introduction
Column slicing in pandas means selecting one or more columns from a DataFrame either by label, by position, or by a name pattern. The right method depends on whether you know the exact column names, want a contiguous range, or need integer-based slicing similar to NumPy.
The most important distinction is between label-based selection with .loc and position-based selection with .iloc.
Select Specific Columns By Name
If you already know the columns you want, pass a list of names inside the indexing brackets, for example df[["name", "city"]]:
This returns a new DataFrame containing just those columns.
One subtle detail matters here: df["name"] returns a Series, while df[["name"]] returns a one-column DataFrame. That shape difference often matters in later pipeline code.
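A minimal sketch of both forms follows. The sample data and the column names (name, age, city, score) are assumptions made up for illustration; only the selection syntax is the point.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy"],
        "age": [31, 25, 40],
        "city": ["Lima", "Oslo", "Pune"],
        "score": [88.5, 92.0, 79.5],
    }
)

# A list of labels returns a new DataFrame with just those columns.
subset = df[["name", "city"]]

# A single label returns a Series; a one-element list returns a DataFrame.
as_series = df["name"]
as_frame = df[["name"]]

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The Series is one-dimensional with shape (3,), while the one-column DataFrame is two-dimensional with shape (3, 1), which is why the two behave differently in code that expects a table.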
Slice Columns By Label With .loc
Use .loc when you want label-based slicing, for example df.loc[:, "name":"city"]:
The row selector is : for all rows, and the column selector is a label slice.
A key pandas detail is that .loc label slices are inclusive. If you slice from "name" to "city", both endpoints are included. That differs from normal Python slicing and is worth remembering.
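The inclusive behavior can be checked with a small sketch. The DataFrame below is hypothetical sample data, and the column order name, age, city, score is an assumption of the example.

```python
import pandas as pd

# Hypothetical sample data with a known column order.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben"],
        "age": [31, 25],
        "city": ["Lima", "Oslo"],
        "score": [88.5, 92.0],
    }
)

# All rows (:), columns from "name" through "city", both endpoints included.
by_label = df.loc[:, "name":"city"]
print(list(by_label.columns))

# An open-ended label slice runs to the last column.
from_age = df.loc[:, "age":]
print(list(from_age.columns))
```

Here the first slice keeps name, age, and city; a plain Python slice with the same endpoints would have dropped the last one.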
Slice Columns By Position With .iloc
If you want columns by integer position instead of by name, use .iloc, for example df.iloc[:, 0:2]:
This selects the first two columns by position. Unlike .loc, .iloc follows normal Python slicing rules, so the stop index is excluded.
Use .iloc when column order matters more than column names, or when the code is working with numeric positions produced elsewhere.
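As a rough illustration, the sketch below slices by position in a few ways. The sample data is hypothetical, and looking up a stop position with df.columns.get_loc is just one assumed way such positions might be produced elsewhere.

```python
import pandas as pd

# Hypothetical sample data with a known column order.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben"],
        "age": [31, 25],
        "city": ["Lima", "Oslo"],
        "score": [88.5, 92.0],
    }
)

# Positions 0 and 1; the stop index 2 is excluded.
first_two = df.iloc[:, 0:2]

# Negative positions count from the end, as in plain Python.
last_two = df.iloc[:, -2:]

# A list of positions picks non-contiguous columns.
picked = df.iloc[:, [0, 3]]

# A position computed from a label: every column before "city".
stop = df.columns.get_loc("city")
before_city = df.iloc[:, :stop]

print(list(first_two.columns), list(last_two.columns))
print(list(picked.columns), list(before_city.columns))
```

Because the stop index is excluded, before_city stops at age and does not include city itself.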
Select Columns By Pattern
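Sometimes you want columns that follow a naming convention rather than a fixed list. In that case, DataFrame.filter is often the cleanest tool. It matches column names by substring (like=), by regular expression (regex=), or against an explicit list (items=).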
This is useful when schemas contain repeated prefixes, suffixes, or generated column groups.
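The sketch below shows the three matching modes of DataFrame.filter. The column names (id, score_math, score_art, rank_math) are a made-up naming convention chosen only to have something to match against.

```python
import pandas as pd

# Hypothetical schema with repeated prefixes and suffixes.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "score_math": [90, 75],
        "score_art": [82, 88],
        "rank_math": [1, 2],
    }
)

# like= keeps columns whose name contains the substring.
scores = df.filter(like="score")

# regex= keeps columns whose name matches the regular expression.
math_cols = df.filter(regex="_math$")

# items= keeps the listed labels that exist and skips the rest without error.
some = df.filter(items=["id", "missing"])

print(list(scores.columns))
print(list(math_cols.columns))
print(list(some.columns))
```

Note that items= does not raise on an unknown label, whereas df[["id", "missing"]] would raise a KeyError, so the two are not drop-in replacements for each other.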
Column Ranges Depend On Column Order
A detail that surprises many people is that label slices still depend on the current column order. If the columns are reordered, df.loc[:, "name":"city"] may select a different range than before.
That means label-range slicing is best when the schema order is stable and meaningful. If column order may drift, selecting explicit column names is usually safer than slicing a label range.
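To make the order dependence concrete, here is a small sketch. The data and the reordering are hypothetical; the point is only that the same label slice returns different columns once the order changes.

```python
import pandas as pd

# Hypothetical sample data with an initial column order.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben"],
        "age": [31, 25],
        "city": ["Lima", "Oslo"],
        "score": [88.5, 92.0],
    }
)

# With the original order, the range covers name, age, and city.
before = df.loc[:, "name":"city"]

# The same columns in a different order, as an upstream change might produce.
reordered = df[["name", "city", "age", "score"]]

# The identical slice now covers only name and city.
after = reordered.loc[:, "name":"city"]

# An explicit list of names is unaffected by the reordering.
explicit = reordered[["name", "age", "city"]]

print(list(before.columns))
print(list(after.columns))
print(list(explicit.columns))
```

The age column silently drops out of the slice after reordering, while the explicit list still returns the same three columns.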
Pick The Method By Intent
A good rule is:
- use df[[...]] when you know the exact column names
- use .loc when a label range matters
- use .iloc when column positions matter
- use filter when you are matching by name pattern
The method should match the real reason you are selecting the columns. That keeps the code easier to read later. It also makes schema assumptions more obvious to the next person who has to maintain the analysis pipeline.
Common Pitfalls
A common mistake is forgetting the difference between a Series and a one-column DataFrame. df["name"] and df[["name"]] are not interchangeable in many pipelines.
Another pitfall is mixing up .loc and .iloc. .loc is label-based and inclusive on both ends, while .iloc is position-based and excludes the stop index.
A third issue is relying on column positions when upstream schemas may change. If the order is unstable, name-based selection is usually safer than positional slicing.
Summary
- Use df[[...]] to select a known set of named columns.
- Use .loc for label-based column slices.
- Use .iloc for position-based column slices.
- Use filter when column names follow a useful pattern.
- Remember that .loc is inclusive while .iloc follows normal Python stop-exclusion.

