How to Create Dataframe from AWS Athena using Boto3 get_query_results method
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Athena lets you query files in S3 with SQL, and boto3 gives you the low-level API to run those queries from Python. If you want a pandas DataFrame, the missing step is converting Athena's tabular response format into rows and column names.
The important detail is that get_query_results returns metadata and rows separately, and the first row often contains column headers. You also need to wait for query completion and page through results when the output is larger than a single response.
Run the Athena Query and Wait for Completion
The normal flow is:
- call
start_query_execution - poll
get_query_execution - once the query succeeds, call
get_query_results - transform the result into a
DataFrame
Here is a working example:
You must wait for SUCCEEDED. Calling get_query_results too early will either fail or return incomplete state.
Convert get_query_results Output to Rows
Athena returns rows as a list of Data cells. The simplest approach is to read the column names from ResultSetMetadata, then extract each row into a Python list and build a DataFrame.
That produces a normal pandas table with string values. Athena returns text for most fields through this API, so you may want to convert types afterward.
Convert Data Types Explicitly
If a column should be numeric or datetime, cast it in pandas after loading:
You can also inspect ColumnInfo if you want to map Athena data types to pandas types more systematically. That is helpful for production ingestion jobs where type correctness matters.
For example:
This is useful when your query returns decimals, timestamps, or nullable fields that need careful handling.
When to Use This Approach
Using get_query_results is appropriate when you want direct API control, pagination, explicit polling, or custom error handling. If all you need is a DataFrame, higher-level tools such as awswrangler are more convenient, but boto3 keeps dependencies minimal and exposes the raw Athena model.
It is also helpful when you need to integrate query execution into an application that already uses boto3 for IAM, S3, or Glue operations.
Common Pitfalls
- Forgetting to wait for query completion before calling
get_query_results. - Not configuring an S3 output location that Athena can write to.
- Treating the first returned row as data when it is actually the header row.
- Assuming numeric columns come back as numeric pandas types automatically. Most values arrive as strings.
- Ignoring pagination and accidentally loading only the first page of a large result set.
Summary
- Start the Athena query, poll until it reaches
SUCCEEDED, then fetch the results. - Read column names from
ResultSetMetadataand build row lists fromRows. - Skip the header row when Athena includes it in the result data.
- Convert important columns to numeric or datetime types explicitly in pandas.
- Use
get_query_resultswhen you want fine-grained control over the Athena workflow in Python.

