AWS Athena Querying by an attributes of a struct with an array
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Athena can query nested data directly, but arrays of structs require a different style than flat SQL tables. If the value you want to filter on lives inside a struct that is itself inside an array, the two main tools are UNNEST and Athena's array functions.
Understanding the Shape of the Data
Suppose each row stores an order with an items array. Each item is a struct, and each item contains nested metadata.
A matching Athena table might look like this:
Once the schema is correct, querying nested attributes becomes much easier.
Use UNNEST for Element-Level Filtering
If you want to query each item inside the array as if it were a row, flatten the array with UNNEST.
This is the most direct approach when you need one result row per matching nested item. It is conceptually similar to exploding the array first and then filtering the exploded rows.
Use EXISTS When You Only Need the Parent Row
Sometimes you do not want every matching item. You only want the parent order if any nested item satisfies a condition. In that case, EXISTS is cleaner than returning exploded rows and deduplicating later.
This pattern is useful when the query is logically about orders, not about individual items.
Array Functions for Nested Output
Athena also supports lambda-based array functions. These are helpful when you want to keep the nested shape instead of flattening everything.
For example, keep only the red items:
Or project only the SKUs from the array:
These functions are often easier to read when the output should remain nested.
Null Handling Matters
Nested semi-structured data often contains missing fields. If meta or meta.color is null for some elements, a simple filter can silently skip those rows.
That extra null check is often worth adding when the raw data is not perfectly clean.
Performance Considerations
Nested queries in Athena can become expensive if the source data is large and still stored as raw JSON. Once the schema is stable, converting the data to Parquet or ORC usually improves both cost and performance.
It also helps to select only the nested fields you actually need. Avoid broad SELECT * queries on wide nested documents if the filter depends on only one attribute.
When to Choose Which Pattern
Use UNNEST when:
- you need one row per matching array element,
- you want to aggregate by inner-item attributes,
- the analysis is item-centric.
Use EXISTS when:
- you only need to know whether any nested item matches,
- you want to return one row per parent record,
- you want to avoid duplicate parent rows.
Use filter or transform when:
- you want to preserve nested output,
- you want compact transformations without full row explosion.
Common Pitfalls
A common mistake is trying to reference an array of structs like a scalar value. Arrays need either UNNEST or an array function before you can query inside them naturally.
Another issue is using UNNEST when the real question is about the parent row. That can multiply rows and force unnecessary deduplication.
Teams also forget about null nested fields. In semi-structured datasets, missing attributes are normal, not exceptional.
Summary
- Arrays of structs in Athena are usually queried with
UNNESTor array functions. - Use
UNNESTwhen you want one row per nested element. - Use
EXISTSwhen you only need the parent row if any nested element matches. - Use
filterandtransformwhen you want to keep nested output. - Stable schemas and columnar storage make nested Athena queries much easier to maintain and cheaper to run.

