Get list from pandas dataframe column or row?

Introduction

Converting pandas DataFrame values to Python lists is common when passing data into APIs, plotting libraries, or legacy code. The right method depends on whether you need a single column, a single row, or a two-dimensional list. Getting it right means preserving order, keeping the intended data types, deciding how missing values are represented, and converting no more data than the consumer actually needs.

Extract a Single Column

For one column, Series.tolist is the most direct option.

python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cara"],
        "score": [88, 92, 79],
    }
)

names = df["name"].tolist()
scores = df["score"].tolist()
print(names)
print(scores)

This keeps row order and is typically enough for API payload assembly.
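
As a minimal sketch of the payload case, the snippet below serializes two columns with the standard json module. The payload key names are illustrative assumptions, not part of any particular API. It relies on tolist() returning built-in Python scalars for an integer column, which is what lets json.dumps accept the result directly.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben", "Cara"], "score": [88, 92, 79]})

scores = df["score"].tolist()

# tolist() yields built-in Python ints here, not NumPy scalars.
assert all(type(s) is int for s in scores)

# Hypothetical payload shape; the key names are illustrative only.
payload = json.dumps({"names": df["name"].tolist(), "scores": scores})
print(payload)  # {"names": ["Ana", "Ben", "Cara"], "scores": [88, 92, 79]}
```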

Extract a Row as a List

Use positional indexing with iloc when you need row values.

python
first_row = df.iloc[0].tolist()
print(first_row)

If you need labeled values instead of an ordered list, call to_dict on the selected row Series.

python
first_row_dict = df.iloc[0].to_dict()
print(first_row_dict)

Choose list or dictionary based on downstream consumer expectations.

Extract Multiple Rows or Columns

For matrix-style output, both values.tolist() and to_numpy().tolist() work. The pandas documentation recommends to_numpy() over the values attribute in new code.

python
subset = df.loc[[0, 2], ["name", "score"]]
rows = subset.values.tolist()
print(rows)

If type consistency matters, pass an explicit dtype to to_numpy when possible.

python
score_array = df[["score"]].to_numpy(dtype="int64")
print(score_array.tolist())

This avoids the mixed-type surprises that loosely typed object arrays can produce.

If you are extracting from a row first and then converting, remember that pandas may upcast mixed row values to a broader common dtype. That is another reason to select only the columns you truly need before list conversion.
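
The upcasting behavior can be seen in a small sketch. The frame below is a made-up example with one integer column and one float column; the column names are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical frame with one integer and one float column.
df = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})

# A row spans both columns, so pandas stores it as a float64 Series
# and the integer 1 comes back as 1.0.
full_row = df.iloc[0].tolist()

# Selecting only the integer column first keeps the integer type.
count_only = df[["count"]].iloc[0].tolist()

print(full_row)    # [1.0, 0.5]
print(count_only)  # [1]
```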

Handle Missing Values Before Conversion

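Missing values usually appear as float nan in the resulting list. By default, Python's json module serializes nan as the non-standard token NaN, which strict JSON parsers and strictly typed consumers may reject.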

python
df2 = pd.DataFrame({"value": [1, None, 3]})
clean = df2["value"].fillna(0).astype(int).tolist()
print(clean)

Define missing value policy early:

  • Fill with defaults.
  • Drop missing rows.
  • Preserve as None for nullable APIs.

Different pipelines need different policies, so make the choice explicit rather than hardcoding it silently.
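
The other two policies might look like the sketch below, which reuses the same small frame. The list comprehension is one plain way to map missing entries to None and is not the only option. Because the column holds a missing value, pandas stores it as float64, so the surviving values are floats.

```python
import json

import pandas as pd

df2 = pd.DataFrame({"value": [1, None, 3]})

# Policy: drop missing rows. The column is float64, so floats remain.
dropped = df2["value"].dropna().tolist()

# Policy: preserve missing entries as None for nullable consumers.
nullable = [None if pd.isna(v) else v for v in df2["value"].tolist()]

print(dropped)               # [1.0, 3.0]
print(nullable)              # [1.0, None, 3.0]
print(json.dumps(nullable))  # [1.0, null, 3.0]
```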

Performance Considerations

For very large DataFrames, converting everything to native Python lists can be slow and memory-heavy. Prefer vectorized operations and keep data in pandas or NumPy until list conversion is truly necessary.

Good practice:

  1. Filter rows first.
  2. Select only required columns.
  3. Convert to a list at the boundary where the consumer requires it.

This minimizes Python object creation overhead.
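
The three steps can be sketched in a single expression. The extra "team" column and the score threshold of 85 are assumptions added only to give the filter something to do.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cara"],
        "score": [88, 92, 79],
        "team": ["red", "blue", "red"],  # illustrative extra column
    }
)

# 1. Filter rows with a vectorized boolean mask (threshold is illustrative).
# 2. Select only the column the consumer needs.
# 3. Convert to a list last.
high_scorers = df.loc[df["score"] >= 85, "name"].tolist()
print(high_scorers)  # ['Ana', 'Ben']
```

Only the values that survive the filter are turned into Python objects; the unused "team" column is never converted.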

Preserve Ordering and Index Intent

List conversion preserves the DataFrame's current row order. If order matters to business logic, sort explicitly before conversion.

python
ordered = df.sort_values("score", ascending=False)["name"].tolist()
print(ordered)

Do not rely on incidental order left over from previous operations unless that order is guaranteed.

If the index itself matters, convert it separately rather than assuming it is part of the row list:

python
row_ids = df.index.tolist()
print(row_ids)
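
When a consumer needs both the labels and the values, one option is to pair them explicitly. The sketch below assumes a hypothetical frame with string labels as its index.

```python
import pandas as pd

# Hypothetical frame with string labels as the index.
df = pd.DataFrame({"score": [88, 92, 79]}, index=["ana", "ben", "cara"])

first_row = df.iloc[0].tolist()  # the index label is not included
row_ids = df.index.tolist()

# Pair labels with values explicitly when the consumer needs both.
score_by_id = dict(zip(row_ids, df["score"].tolist()))

print(first_row)    # [88]
print(score_by_id)  # {'ana': 88, 'ben': 92, 'cara': 79}
```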

Practical Patterns for APIs

If an API expects a list of records, use to_dict(orient="records") instead of nested list conversion.

python
records = df[["name", "score"]].to_dict(orient="records")
print(records)

This is often clearer and less error-prone when keys matter in payload schemas.

Prepare Lists for Model or Numerical Inputs

Machine learning and numerical routines often require strict shape and dtype guarantees. When converting from a DataFrame, select columns in an explicit order and convert to NumPy before converting to a list.

python
features = df[["score"]].to_numpy(dtype="float64")
print(features.tolist())

For row-oriented model APIs, convert the selected rows in one step so schema alignment is obvious:

python
batch = df.loc[[0, 1], ["score"]].to_numpy(dtype="float64").tolist()
print(batch)
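
Shape depends on how the column is selected, which a short sketch makes visible. It reuses the article's example frame; whether a consumer wants the nested or the flat form is an assumption to check against that consumer's documentation.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben", "Cara"], "score": [88, 92, 79]})

# Double brackets select a DataFrame, so the array is 2-D: one row per sample.
matrix = df[["score"]].to_numpy(dtype="float64")

# Single brackets select a Series, so the array is 1-D.
vector = df["score"].to_numpy(dtype="float64")

print(matrix.shape, matrix.tolist())  # (3, 1) [[88.0], [92.0], [79.0]]
print(vector.shape, vector.tolist())  # (3,) [88.0, 92.0, 79.0]
```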

Common Pitfalls

  • Converting entire DataFrames to lists when only one column is needed.
  • Ignoring missing-value behavior and leaking nan into outputs.
  • Losing semantic meaning by using row lists where keyed dictionaries are required.
  • Relying on implicit column order instead of selecting columns explicitly.
  • Converting too early and giving up pandas performance advantages.

Summary

  • Use Series.tolist for single-column extraction.
  • Use iloc for row extraction and choose list or dict intentionally.
  • Use subset selection before matrix conversions for control and efficiency.
  • Handle missing values before conversion to match consumer expectations.
  • Convert to Python lists at the latest possible stage in the pipeline.
