Aerospike: How Primary and Secondary Indexes Work Internally

Aerospike is a distributed NoSQL database built for low latency and horizontal scalability, widely used in real-time big data applications. It relies on two kinds of indexes to locate data: a primary index, which covers every record, and optional secondary indexes. Understanding how these indexes work internally explains much of Aerospike's efficiency in data handling and retrieval.

Primary Index

A primary index in Aerospike is foundational for accessing records. It directly maps keys to record locations in the storage engine, whether data is stored in-memory or on SSDs. Each record in Aerospike has a unique primary key. This index is crucial for point queries where a specific record is retrieved directly using its primary key.

Technical Implementation:

When a record is written, Aerospike hashes its set name and user key with RIPEMD-160 to produce a 20-byte digest. This digest is the record's unique identifier within the namespace, and it also determines which of the namespace's 4096 partitions owns the record. The primary index stores one 64-byte entry per digest. The index itself is held in memory by default (Enterprise Edition can also place it in persistent memory or on flash), while the location an entry points to depends on how the namespace stores its data:

  • In-memory: The entry references record data held in RAM. Ideal for fully in-memory namespaces, giving the fastest data access.
  • On SSD: Used for persistent storage. The entry points to the location (device and block address) where the record's data is stored, so a read goes straight to the right block.

Each primary index entry holds the digest, a pointer to the record's data, and record metadata such as the generation counter and expiration time, which allows direct access without searching the storage device.
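The lookup path described above can be sketched in a few lines of Python. This is a toy model, not Aerospike's implementation: SHA-1 stands in for RIPEMD-160 only because it also yields 20 bytes and is always available in `hashlib`, the partition rule (12 bits taken from the first two digest bytes) is an assumption, and `ToyPrimaryIndex`, `IndexEntry`, and the device path are hypothetical names chosen for illustration.

```python
import hashlib
from dataclasses import dataclass

N_PARTITIONS = 4096  # partitions per namespace


def digest_for(set_name: str, user_key: str) -> bytes:
    """Return a 20-byte digest for (set, key).

    Stand-in hash: SHA-1 is used here instead of RIPEMD-160 so the
    sketch runs on any Python build.
    """
    return hashlib.sha1(set_name.encode() + user_key.encode()).digest()


def partition_id(digest: bytes) -> int:
    # Assumed rule: low 12 bits of the first two digest bytes.
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


@dataclass
class IndexEntry:
    digest: bytes
    device: str        # hypothetical device name
    block_offset: int  # where the record's data starts
    size: int          # record size in bytes


class ToyPrimaryIndex:
    def __init__(self):
        # One digest -> entry map per partition.
        self.partitions = [dict() for _ in range(N_PARTITIONS)]

    def put(self, set_name, user_key, device, block_offset, size):
        d = digest_for(set_name, user_key)
        self.partitions[partition_id(d)][d] = IndexEntry(d, device, block_offset, size)
        return d

    def locate(self, set_name, user_key):
        """Return the index entry for a key, or None if it is absent."""
        d = digest_for(set_name, user_key)
        return self.partitions[partition_id(d)].get(d)
```

A point read then amounts to `index.locate("users", "alice")` followed by a single read at the returned device and offset; the key itself never has to be compared against stored data.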

Secondary Index

While the primary index works with unique keys, a secondary index in Aerospike lets you query by non-key attributes, which Aerospike calls bins. This is useful for queries that select records by their contents, such as finding all users from a particular city.

Technical Implementation:

Secondary indexes in Aerospike are separate data structures that refer back to records by digest. Each node maintains the index entries for the records it holds. Creating a secondary index triggers a scan of the existing records: for each record, the indexed field is extracted and the index is updated with the field's value and a reference back to the record's digest.
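As a rough illustration of that build step, the sketch below scans a small in-memory record set and maps each value of one field to the digests that carry it. The `build_secondary_index` function, the sample records, and the short byte strings standing in for digests are all hypothetical; a real index is built incrementally by the server, not by a Python dict.

```python
from collections import defaultdict


def build_secondary_index(records, bin_name):
    """Scan every record and map each value of bin_name to its digests."""
    index = defaultdict(set)
    for digest, bins in records.items():
        if bin_name in bins:  # records without the field are not indexed
            index[bins[bin_name]].add(digest)
    return index


# Toy data: digest -> bins (short byte strings stand in for 20-byte digests).
records = {
    b"d1": {"name": "Ana", "city": "Lisbon"},
    b"d2": {"name": "Ben", "city": "Oslo"},
    b"d3": {"name": "Caz", "city": "Lisbon"},
    b"d4": {"name": "Dee"},
}

city_index = build_secondary_index(records, "city")

# A query resolves values to digests, then digests to records.
lisbon_users = sorted(records[d]["name"] for d in city_index["Lisbon"])
```

Note the two-step read: the secondary index only yields digests, and each matching record is then fetched through the primary index.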

The type of the secondary index depends on the data type of the field being indexed:

  • String and Integer Indexes: Stored in B-tree structures. Integer indexes support both point and range queries; string indexes support equality matches.
  • Geospatial Indexes: Built on S2 cell IDs (Aerospike uses the S2 geometry library rather than GeoHash), which supports queries such as finding points within a region.
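To see why an ordered structure serves both point and range queries, here is a minimal sketch that keeps (value, digest) pairs sorted and answers queries with binary search. A sorted list with `bisect` is swapped in for a B-tree purely for brevity; it preserves the key ordering that matters here but not a B-tree's insertion cost. The class name `SortedIndex` is hypothetical.

```python
import bisect


class SortedIndex:
    """Ordered (value, digest) pairs; a stand-in for a B-tree's key order."""

    def __init__(self):
        self._entries = []  # always sorted by (value, digest)

    def add(self, value, digest):
        bisect.insort(self._entries, (value, digest))

    def between(self, low, high):
        """Return digests whose value v satisfies low <= v <= high."""
        # (low,) sorts before every (low, digest), so this finds the
        # first entry with value >= low.
        start = bisect.bisect_left(self._entries, (low,))
        matches = []
        for value, digest in self._entries[start:]:
            if value > high:
                break  # entries are ordered, so nothing later can match
            matches.append(digest)
        return matches

    def equal(self, value):
        # A point query is a range query with identical bounds.
        return self.between(value, value)


ages = SortedIndex()
for digest, age in [(b"d1", 31), (b"d2", 25), (b"d3", 42), (b"d4", 25)]:
    ages.add(age, digest)

in_range = ages.between(25, 31)
```

Because matching entries are contiguous in the ordering, the query cost is one search plus a walk over the results, independent of how many non-matching entries the index holds.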

Key Operations with Indexes:

  • Inserts: On an insert of a record, both the primary and any applicable secondary indexes are updated. For the primary, this involves creating a new entry, and for the secondary, this involves inserting the attribute value into the index.
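  • Updates: A record's primary key cannot change, so an update keeps the same primary index entry and only refreshes its metadata (for example the generation counter and the storage location of the new copy). Secondary indexes need work only when an indexed attribute changes: the entry for the old value is removed and an entry for the new value is inserted.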
  • Deletes: Deleting a record necessitates removals from both primary and secondary indexes. This ensures that no dangling references remain in the secondary index that might point to non-existent primary keys.
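The three operations above can be tied together in a small sketch. `ToyNamespace` is a hypothetical class, plain dicts stand in for both index types, and `put` replaces the whole record for simplicity; it only aims to show the bookkeeping order (drop stale secondary entries, write the record, add new secondary entries).

```python
from collections import defaultdict


class ToyNamespace:
    def __init__(self, indexed_bins):
        self.primary = {}  # digest -> bins
        # One value -> {digests} map per indexed field.
        self.secondary = {b: defaultdict(set) for b in indexed_bins}

    def _unindex(self, digest, bins):
        for bin_name, index in self.secondary.items():
            if bin_name in bins:
                value = bins[bin_name]
                index[value].discard(digest)
                if not index[value]:
                    del index[value]  # leave no empty buckets behind

    def _index(self, digest, bins):
        for bin_name, index in self.secondary.items():
            if bin_name in bins:
                index[bins[bin_name]].add(digest)

    def put(self, digest, bins):
        """Insert or replace a record and keep secondary indexes in sync."""
        old = self.primary.get(digest)
        if old is not None:
            self._unindex(digest, old)  # update: drop stale entries first
        self.primary[digest] = dict(bins)
        self._index(digest, bins)

    def delete(self, digest):
        old = self.primary.pop(digest, None)
        if old is not None:
            self._unindex(digest, old)  # no dangling references remain

    def query(self, bin_name, value):
        return set(self.secondary[bin_name].get(value, set()))
```

For example, writing `{"city": "Oslo"}` and then `{"city": "Lisbon"}` under the same digest leaves that digest reachable only through "Lisbon", and deleting the record removes it from the secondary index as well.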

Challenges and Considerations:

Secondary indexes are not free. Each one consumes memory and adds work to every write that touches an indexed field, because the change must be applied to a potentially large index structure. Queries through a secondary index also cost more than primary-key reads, since they can match many records. Effective use of indexes in Aerospike therefore involves careful selection of indexed fields and awareness of the trade-off between query speed and update cost.

Here’s a summarized comparison of Primary vs. Secondary Index:

Feature | Primary Index | Secondary Index | Notes
--- | --- | --- | ---
Purpose | Direct record access by primary key | Querying based on non-key attributes | Additional query capabilities
Data Structure | Digest-keyed trees per partition, direct pointer to data | B-tree; S2 cells for geospatial data | Efficient complex queries
In-Memory/On-Disk | In memory by default; can be placed on flash | Primarily in-memory | Configurable based on requirements
Updates | Entry created or removed on record creation/deletion; metadata refreshed on writes | On any modification to indexed attributes | Potentially high overhead

Understanding the intricacies of Aerospike's indexing mechanisms is essential for optimizing database performance and scalability, especially in applications requiring fast access and high throughput for massive datasets.