
What is the difference between Scan and Query in DynamoDB, and when should you use each?


Scan vs Query in DynamoDB: Understanding the Differences and When to Use Each

Amazon DynamoDB is a fully managed NoSQL database service designed for fast, predictable performance and seamless scalability. Working with DynamoDB means retrieving data efficiently, and the two core operations for retrieving multiple items are Scan and Query. This article explains how these operations differ and offers guidance on when to use each.

Scan Operation

The Scan operation reads every item in the specified DynamoDB table (or secondary index) and, by default, returns all attributes of every item. If you supply a filter expression, only the items that match it are returned. Here are some technical aspects and examples to illustrate Scan's functionality:

  • Full Table Read: Unlike a Query, a Scan reads every item in the table, so it becomes slower and more resource-intensive as the table grows. It is best suited to workloads where obtaining every matching item matters more than latency.
  • Filters: You can supply a filter expression to narrow the results of a Scan. The filter is applied after the items are read, so it does not reduce the number of items read or the read capacity consumed; it only reduces the number of items returned.
  • Use Cases: Best used for analytics, reporting, or exporting entire datasets where a complete read-through is necessary. For example, obtaining a full export of data for an external backup or detailed analysis.

Example:

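```python
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("MyTable")  # placeholder table name
response = table.scan(FilterExpression=Attr("attribute_name").contains("value"))
```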
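To make the point that a filter does not reduce the items read concrete, here is a toy in-memory model in plain Python. It is not boto3 and not how DynamoDB is implemented internally; it only mimics the accounting of a Scan, where every item is evaluated and the filter merely trims what comes back. The function `simulate_scan` and the sample items are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Item = Dict[str, str]


def simulate_scan(
    items: List[Item], filter_fn: Callable[[Item], bool]
) -> Tuple[List[Item], int]:
    """Toy model of Scan: evaluate every item, then apply the filter.

    Returns (matching_items, items_read). items_read is what read capacity
    is charged against; the filter does not lower it.
    """
    items_read = 0
    matches: List[Item] = []
    for item in items:  # a Scan visits every item in the table or index
        items_read += 1
        if filter_fn(item):  # the filter runs only after the item was read
            matches.append(item)
    return matches, items_read


table_items = [
    {"id": "1", "status": "shipped"},
    {"id": "2", "status": "pending"},
    {"id": "3", "status": "shipped"},
    {"id": "4", "status": "cancelled"},
]
shipped, read = simulate_scan(table_items, lambda it: it["status"] == "shipped")
print(len(shipped), read)  # 2 4
```

Only two items are returned, but all four were read, which is why a selective filter on a large Scan is still expensive.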

Query Operation

The Query operation uses key values to select items efficiently: you supply a partition key value and, optionally, a condition on the sort key. Here are key technical points and usage scenarios for Query:

  • Key-Based Retrieval: Query fetches items by partition key, optionally narrowed by a sort-key condition. DynamoDB hashes the partition key to locate the partition that stores those items, so it reads only the matching item collection instead of the whole table. This makes it faster and more efficient than Scan.
  • Indexed Searches: Query can target a global or local secondary index (via the IndexName parameter), which lets you look up items by attributes other than the table's primary key. An equality condition on the partition key is always required; a sort-key condition is optional.
  • Filters: Like Scan, Query supports filter expressions. The filter is applied after the key condition has selected the items, so filtered-out items still count toward the read capacity consumed. Because the key condition already restricts the read to one partition key value, far fewer items are read than in a Scan.
  • Use Cases: Ideal for real-time applications where quick, efficient data retrieval is crucial. For example, fetching all orders made by a specific customer using their unique customer ID as the partition key.

Example:

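```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("MyTable")  # placeholder table name
low, high = "2024-01-01", "2024-12-31"  # placeholder sort-key bounds
response = table.query(
    KeyConditionExpression=Key("partition_key").eq("value") & Key("sort_key").between(low, high)
)
```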
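The key-based access described above can be illustrated with another toy in-memory model, again plain Python rather than boto3 or DynamoDB internals. Items are grouped by partition key value and kept sorted by sort key, so a query jumps to one group and uses binary search for the inclusive BETWEEN range; an optional filter runs afterwards. The class `ToyTable`, its methods, and the attribute names `customer_id` and `order_date` are hypothetical, chosen to match the "orders for one customer" use case.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Item = Dict[str, str]


class ToyTable:
    """Hypothetical model: items grouped by partition key, sorted by sort key."""

    def __init__(self, pk: str, sk: str) -> None:
        self.pk, self.sk = pk, sk
        self.partitions: Dict[str, List[Item]] = defaultdict(list)

    def put(self, item: Item) -> None:
        coll = self.partitions[item[self.pk]]
        coll.append(item)
        coll.sort(key=lambda it: it[self.sk])  # keep each item collection ordered

    def query(
        self,
        pk_value: str,
        low: str,
        high: str,
        filter_fn: Optional[Callable[[Item], bool]] = None,
    ) -> Tuple[List[Item], int]:
        """Equality on the partition key, BETWEEN on the sort key, then an optional filter."""
        coll = self.partitions.get(pk_value, [])  # other partition keys are never touched
        keys = [it[self.sk] for it in coll]
        lo, hi = bisect_left(keys, low), bisect_right(keys, high)  # inclusive range
        in_range = coll[lo:hi]  # only these items count as read
        results = [it for it in in_range if filter_fn is None or filter_fn(it)]
        return results, len(in_range)


orders = ToyTable(pk="customer_id", sk="order_date")
for cid, date in [("C1", "2024-01-05"), ("C1", "2024-03-10"),
                  ("C1", "2024-07-22"), ("C2", "2024-02-01")]:
    orders.put({"customer_id": cid, "order_date": date})

found, read = orders.query("C1", "2024-02-01", "2024-12-31")
print([o["order_date"] for o in found], read)  # ['2024-03-10', '2024-07-22'] 2
```

Unlike the Scan model, the item for customer C2 is never read, and a filter passed as `filter_fn` would only trim the two items already selected by the key condition.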

Key Differences

Let's summarize the primary differences between Scan and Query operations in the table below:

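| Feature | Scan | Query |
|---|---|---|
| Key requirement | No key required | Requires an equality condition on the partition key (sort-key condition optional) |
| Performance | Slower; reads every item in the table or index | Faster; reads only items with the specified partition key value |
| Use of indexes | Can scan a secondary index, but still reads all of it | Can query a secondary index to look up items by alternate keys |
| Result size | Potentially the whole table (returned in pages of up to 1 MB) | Scoped to one partition key value (also paged at up to 1 MB) |
| Filter usage | Filters applied after items are read | Filters applied after key-based selection |
| Primary use cases | Analytics, exports, backups | Real-time application lookups |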
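The performance gap in the table can be put into rough numbers with DynamoDB's read-capacity rule: one read capacity unit covers a strongly consistent read of up to 4 KB, eventually consistent reads cost half, and for Query and Scan DynamoDB sums the size of all items the request reads (not just those that survive a filter) and rounds up to the next 4 KB. The sketch below is a back-of-the-envelope estimate for a single request only, assuming hypothetical 1 KB items, 1,000 items in the table, and 20 orders for one customer; the helper `estimate_rcu` is illustrative, and each 1 MB page of a real paginated read is rounded separately.

```python
import math

BLOCK = 4 * 1024  # bytes covered by one RCU for a strongly consistent read


def estimate_rcu(total_bytes_read: int, strongly_consistent: bool = False) -> float:
    """Rough RCU estimate for ONE Query or Scan request.

    Sums the bytes of every item read, rounds up to the next 4 KB block,
    and halves the result for eventually consistent reads.
    """
    blocks = math.ceil(total_bytes_read / BLOCK)
    return float(blocks) if strongly_consistent else blocks / 2


item_size = 1024  # assumed 1 KB per item
scan_cost = estimate_rcu(1000 * item_size)  # Scan evaluates all 1,000 items (fits in one 1 MB page)
query_cost = estimate_rcu(20 * item_size)   # Query reads one customer's 20 items
print(scan_cost, query_cost)  # 125.0 2.5
```

Under these assumptions the Scan costs fifty times as much as the Query, and adding a filter to the Scan would not change that figure.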

Additional Considerations

  • Provisioned Throughput: Both Scan and Query consume read capacity units, charged for the items they read rather than the items they return. On large tables a Scan can consume dramatically more capacity than a Query, so adjust your capacity settings to accommodate heavier loads when you rely on Scan.
  • Pagination: Both operations return paginated results. A single call returns at most 1 MB of data, or less if no more data remains. When more results are available, the response includes a LastEvaluatedKey, which you pass as ExclusiveStartKey in the next call; keep iterating until no LastEvaluatedKey is returned. The 1 MB limit applies before any filter expression, so a filtered page can contain few or even zero items.
  • Efficient Use of Query: Structuring your table schema so that your access patterns can be served by queries instead of scans minimizes cost and maximizes performance. Choose partition and sort keys that match the ways your application reads data.
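The pagination loop described above can be sketched as a small generator. It assumes `operation` is a bound method such as `table.scan` or `table.query` on a boto3 Table resource, which accept `ExclusiveStartKey` and return a dict with `Items` and, when more data remains, `LastEvaluatedKey`; the helper name `iter_all_items` is hypothetical.

```python
from typing import Any, Callable, Dict, Iterator


def iter_all_items(
    operation: Callable[..., Dict[str, Any]], **kwargs: Any
) -> Iterator[Dict[str, Any]]:
    """Yield every item from a paginated Scan or Query.

    kwargs (e.g. FilterExpression or KeyConditionExpression) are passed
    through unchanged on every call.
    """
    while True:
        page = operation(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:  # no LastEvaluatedKey means this was the final page
            return
        kwargs["ExclusiveStartKey"] = last_key  # resume where the previous page stopped
```

With a boto3 Table this would be called as `iter_all_items(table.scan)` or `iter_all_items(table.query, KeyConditionExpression=Key("customer_id").eq("C1"))`. The low-level boto3 client also offers built-in paginators (`client.get_paginator("scan")` and `client.get_paginator("query")`) as an alternative to a hand-written loop.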

By understanding the distinct characteristics and appropriate use cases for both Scan and Query operations, you can design your DynamoDB interactions to optimize performance and cost-effectiveness.

