Differences between GSI and table

Introduction

Databases are an integral part of modern applications, providing organized ways to store, manage, and retrieve data. In NoSQL databases such as Amazon DynamoDB, two concepts come up constantly: tables and Global Secondary Indexes (GSIs). Understanding how they differ, what each one does, and when to use each helps you design efficient data storage and retrieval.

Key Concepts: Tables vs. GSI

Before delving into the differences, let's clarify the primary functions and definitions of a Table and a GSI.

Tables

A table in a database is a collection of related data entries. In relational databases, a table is defined by columns representing attributes of the data and rows representing individual records. In NoSQL databases like DynamoDB, tables are schema-less apart from the primary key: every item must include the table's primary key attributes (a partition key and, if the table defines one, a sort key), but beyond that each item can carry a different set of attributes.

Global Secondary Index (GSI)

A Global Secondary Index in DynamoDB is an index whose partition key, and optional sort key, can differ from those of the table. It lets you query data efficiently by attributes that are not part of the table's primary key. In effect, a GSI overlays an additional, differently keyed view on the same dataset, which is what makes alternate query patterns possible without scanning the whole table.
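To make the "differently keyed view" idea concrete, here is a minimal in-memory sketch. It is a toy model, not DynamoDB's implementation: the class `TinyTable` and its method names are hypothetical, and it ignores sort keys, projections, and capacity.

```python
from collections import defaultdict


class TinyTable:
    """Toy in-memory model of a table with one GSI (not DynamoDB itself)."""

    def __init__(self, partition_key, gsi_partition_key):
        self.partition_key = partition_key
        self.gsi_partition_key = gsi_partition_key
        self.items = {}               # primary key value -> item
        self.gsi = defaultdict(dict)  # index key value -> {primary key value: item}

    def put_item(self, item):
        # Every item must carry the table's primary key; other attributes are free-form.
        pk = item[self.partition_key]
        old = self.items.get(pk)
        if old is not None and self.gsi_partition_key in old:
            # Drop the stale index entry before re-indexing the item.
            self.gsi[old[self.gsi_partition_key]].pop(pk, None)
        self.items[pk] = item
        # Items without the index key attribute are simply left out of the index.
        if self.gsi_partition_key in item:
            self.gsi[item[self.gsi_partition_key]][pk] = item

    def get_item(self, pk):
        return self.items.get(pk)

    def query_gsi(self, value):
        # Index keys need not be unique, so a query may return several items.
        return list(self.gsi.get(value, {}).values())


users = TinyTable("UserId", "Email")
users.put_item({"UserId": "u1", "Name": "Ada", "Email": "ada@example.com"})
users.put_item({"UserId": "u2", "Name": "Bob"})  # no Email: absent from the index
print(users.get_item("u1")["Name"])              # Ada
print(len(users.query_gsi("ada@example.com")))   # 1
```

The table and the index hold the same items; only the lookup key differs. Writes go through the table, and the index is maintained as a side effect.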

Technical Differences

Below are the technical distinctions between tables and GSIs when using a system like DynamoDB:

| Aspect | Table | Global Secondary Index (GSI) |
|---|---|---|
| Definition | Main storage structure holding the items | Secondary structure for querying the same items by a different key |
| Primary Key | Partition key, with an optional sort key; must be unique per item | Defined separately; can use different attributes from the table's primary key and need not be unique |
| Schema Flexibility | Schema-less apart from the key attributes; items can differ in attributes | Inherits that flexibility, but an item appears in the index only if it has the index key attributes |
| Usage | CRUD operations on the main data | Read-only queries and scans using the alternate key |
| Performance | Depends on how evenly the partition key distributes traffic | Depends on the index key's distribution; in provisioned mode the index has its own throughput, and an under-provisioned index can throttle writes to the table |
| Capacity & Billing | Charged for the table's read/write capacity and storage | Charged additionally for the index's own reads, writes, and storage |
| Data Consistency | Supports strongly consistent and eventually consistent reads | Eventually consistent reads only, because table changes are propagated to the index asynchronously |
| Operational Complexity | Simple; manages the primary data | Adds complexity; query patterns need careful planning |
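The consistency difference is the one that most often surprises people, so here is a small simulation of it. This is a toy model under stated assumptions: the class `LaggingIndex` and the explicit `propagate()` step are hypothetical stand-ins for the asynchronous propagation the service performs on its own, which under normal conditions is typically fast.

```python
from collections import deque


class LaggingIndex:
    """Toy model: table writes apply immediately, index updates apply later."""

    def __init__(self):
        self.table = {}         # UserId -> item
        self.index = {}         # Email -> UserId
        self.pending = deque()  # index updates not yet applied

    def put_item(self, item):
        self.table[item["UserId"]] = item
        self.pending.append((item["Email"], item["UserId"]))

    def propagate(self):
        # Stand-in for asynchronous replication from the table to the index.
        while self.pending:
            email, user_id = self.pending.popleft()
            self.index[email] = user_id

    def query_index(self, email):
        user_id = self.index.get(email)
        return None if user_id is None else self.table[user_id]


store = LaggingIndex()
store.put_item({"UserId": "u1", "Email": "ada@example.com"})
print(store.table["u1"]["Email"])            # visible in the table right away
print(store.query_index("ada@example.com"))  # None: index not yet updated
store.propagate()
print(store.query_index("ada@example.com")["UserId"])  # u1
```

An application that writes an item and immediately looks it up through a GSI should be prepared for the first lookup to miss.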

Exploring Use Cases

When to Use a Table

Tables are used to manage complete datasets and are best when:

  • The access pattern primarily revolves around the data's primary key.
  • The application requires a primary storage structure for executing CRUD operations.
  • Flexibility is needed in defining item attributes without pre-configuring column constraints.

When to Use GSIs

GSIs are beneficial when:

  • There is a need to query datasets on non-primary attributes.
  • The application design anticipates multiple retrieval requirements not supported by the primary index.
  • You want efficient alternate access paths without restructuring the data or duplicating it into additional tables.

Example Scenario

Assume we have a "Users" table:

```plaintext
Partition Key: UserId
Sort Key: None
Attributes: Name, Email, Age, JoinDate
```
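As a sketch, the table above could be expressed as keyword arguments for boto3's DynamoDB client `create_table` call. The function name `users_table_definition` is hypothetical, and two details are assumptions not stated in the scenario: `UserId` is stored as a string, and the table uses on-demand billing.

```python
def users_table_definition():
    """Keyword arguments shaped for a DynamoDB create_table request."""
    return {
        "TableName": "Users",
        # Only key attributes are declared up front. Name, Email, Age and
        # JoinDate are ordinary attributes and need no definition here.
        "AttributeDefinitions": [
            {"AttributeName": "UserId", "AttributeType": "S"},  # assumption: string
        ],
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},  # partition key, no sort key
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


# With boto3 installed and AWS credentials configured, this could be sent as:
#   import boto3
#   boto3.client("dynamodb").create_table(**users_table_definition())
print(users_table_definition()["KeySchema"])
```

Note that the non-key attributes never appear in the definition, which is the schema flexibility described earlier.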

We might want to query users by their "Email" attribute instead of the "UserId." A GSI can be created:

```plaintext
GSI: EmailIndex
Partition Key: Email
Sort Key: JoinDate (optional)
```

With EmailIndex in place, you can efficiently query users by email.
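A minimal sketch of what creating and querying this index might look like, again as request parameters for boto3's DynamoDB client (`update_table` and `query`). The function names are hypothetical. The sketch assumes `Email` and `JoinDate` are stored as strings, the index projects all attributes, and the table uses on-demand billing, so the index needs no provisioned throughput settings.

```python
def add_email_index_request():
    """Keyword arguments for update_table that add EmailIndex to Users."""
    return {
        "TableName": "Users",
        # Attributes used as index keys must be declared with their types.
        "AttributeDefinitions": [
            {"AttributeName": "Email", "AttributeType": "S"},     # assumption: string
            {"AttributeName": "JoinDate", "AttributeType": "S"},  # assumption: ISO date string
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": "EmailIndex",
                    "KeySchema": [
                        {"AttributeName": "Email", "KeyType": "HASH"},
                        {"AttributeName": "JoinDate", "KeyType": "RANGE"},
                    ],
                    "Projection": {"ProjectionType": "ALL"},  # assumption: copy all attributes
                }
            }
        ],
    }


def query_by_email_request(email):
    """Keyword arguments for a query against EmailIndex."""
    return {
        "TableName": "Users",
        "IndexName": "EmailIndex",  # without this, the query targets the table's own key
        "KeyConditionExpression": "Email = :email",
        "ExpressionAttributeValues": {":email": {"S": email}},
    }


# With boto3 installed and AWS credentials configured:
#   import boto3
#   client = boto3.client("dynamodb")
#   client.update_table(**add_email_index_request())
#   response = client.query(**query_by_email_request("ada@example.com"))
print(query_by_email_request("ada@example.com")["IndexName"])
```

The query still names the `Users` table; the `IndexName` parameter is what routes it to the index. Reads through a GSI are eventually consistent, so the request does not ask for a consistent read.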

Maintaining and Managing GSIs

GSIs add query capabilities, but managing them takes planning:

  • Provisioned Throughput: In provisioned mode, each GSI has its own read and write capacity, separate from the table's. Every table write that touches an indexed item also consumes write capacity on the index, and if the index is under-provisioned, writes to the table can be throttled.
  • Data Consistency: GSIs support only eventually consistent reads, so a change written to the table may take a moment to appear in the index.
  • Cost Implications: GSIs incur additional charges for their own storage and read/write capacity, so each index should justify its cost.
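The capacity point can be made concrete with some rough arithmetic. The sketch below assumes provisioned-mode accounting as commonly documented: one write capacity unit covers a write of up to 1 KB, and one read capacity unit covers a strongly consistent read of up to 4 KB, with an eventually consistent read costing half. The function names are hypothetical, and the estimate covers only a brand-new item that appears in every index.

```python
import math


def write_units(item_bytes):
    # One write capacity unit per 1 KB, rounded up.
    return max(1, math.ceil(item_bytes / 1024))


def eventually_consistent_read_units(item_bytes):
    # One read unit per 4 KB for a strongly consistent read; half for eventual.
    return max(1, math.ceil(item_bytes / 4096)) / 2


def new_item_write_units(item_bytes, indexed_bytes_per_gsi):
    """Rough write cost of inserting one new item that appears in every GSI."""
    table_cost = write_units(item_bytes)
    index_cost = sum(write_units(size) for size in indexed_bytes_per_gsi)
    return table_cost + index_cost


# A 2,500-byte item on a table with one GSI that projects all attributes:
print(write_units(2500))                       # 3
print(new_item_write_units(2500, [2500]))      # 6
print(eventually_consistent_read_units(2500))  # 0.5
```

In this estimate a single fully projected GSI doubles the write cost of the item, which is why projecting only the attributes a query needs can matter.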

Conclusion

Knowing when to rely on the table and when to add a GSI shapes both application design and performance. Tables are the foundation for storing data; GSIs add flexible, efficient retrieval paths on top of them. Because each GSI brings extra cost and operational complexity, it is worth adding one only for an access pattern the table's primary key cannot serve.

