Commit Log and SSTables in the Cassandra Database

Introduction

Apache Cassandra is a highly scalable, distributed NoSQL database designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure. At the core of Cassandra's storage architecture are two critical concepts: the commit log and the SSTable (Sorted String Table). Together, these components help ensure data durability and efficient read and write operations.

Commit Log: Ensuring Durability

In Cassandra, the commit log is a write-ahead log used to capture all write operations. It plays a crucial role in ensuring durability, allowing for data recovery in the event of a server crash or unexpected shutdown.

Technical Explanation

  • Write Path: When a write arrives, Cassandra appends it to the commit log and applies it to the in-memory Memtable. The commit log append is sequential I/O, which is generally faster than random writes.
  • Concurrency: Multiple threads can safely write to the commit log at the same time. Cassandra coordinates them with lock-free techniques, and every writer appends to the end of the file rather than modifying existing data.
  • Flush Policy: The commit log is synced to disk according to the commitlog_sync setting in cassandra.yaml. In periodic mode (the default) it is synced at a fixed interval, while in batch mode a write is acknowledged only after the sync completes. When the Memtable reaches a size threshold, it is flushed to an SSTable, after which the commit log segments that covered those writes can be discarded.
  • Recovery: In the event of a failure, Cassandra can replay the commit log to recover any in-memory writes that weren't flushed to their corresponding SSTables.
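The relationship between the commit log, the Memtable, and a flush can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the class name WritePath, the entry-count threshold, and the in-memory list standing in for the log file are all assumptions made for illustration (Cassandra flushes based on memory usage and discards the log segment by segment).

```python
class WritePath:
    """Toy model of the write path; all names here are hypothetical."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # entries, not bytes (simplification)
        self.commit_log = []   # stand-in for the on-disk append-only log
        self.memtable = {}     # in-memory view of recent writes
        self.sstables = []     # each flush appends one sorted, immutable table

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, for durability
        self.memtable[key] = value            # then update memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the memtable as a key-sorted, immutable tuple.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Everything logged so far now lives in an SSTable, so the
        # logged entries are no longer needed for recovery.
        self.commit_log.clear()
```

With flush_threshold=2, the second write triggers a flush: the Memtable and log are emptied and one sorted table appears in sstables.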

Example

Consider a scenario where a server experiences an unexpected shutdown:

  1. Data Write: The write is appended to the commit log.
  2. Memtable Update: The same write is applied to the Memtable.
  3. Failure: The server crashes before the Memtable is flushed, so its in-memory contents are lost.
  4. Recovery Initiation: Upon restart, Cassandra replays the commit log.
  5. Recovered Data: The replayed writes are applied to a fresh Memtable, restoring the data that had not yet reached an SSTable.
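The five steps above can be imitated with a small Python simulation. It is a sketch under stated assumptions: ToyNode, the JSON-lines log format, and the fsync on every write are illustrative choices, not how Cassandra lays out or syncs its commit log.

```python
import json
import os
import tempfile


class ToyNode:
    """Hypothetical single-node store: an append-only log file plus a dict."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}  # lost whenever the process dies

    def write(self, key, value):
        # Step 1: append the mutation to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # Step 2: apply the same mutation to the memtable.
        self.memtable[key] = value

    def replay(self):
        # Steps 4 and 5: re-apply every logged mutation in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.memtable[record["key"]] = record["value"]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commitlog.jsonl")
        node = ToyNode(path)
        node.write("user:1", "alice")
        node.write("user:2", "bob")
        node.write("user:1", "alice-v2")

        # Step 3: simulate a crash. A new object starts with an empty
        # memtable, but the log file on disk survives.
        restarted = ToyNode(path)
        restarted.replay()
        return restarted.memtable
```

Because the log is replayed in write order, the later value for user:1 overwrites the earlier one, and the restarted node ends up with the same state the crashed node had in memory.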

SSTables: Efficient Read Operations

SSTables are immutable data files written to disk when a Memtable is flushed. They form the backbone of data storage in Cassandra. Because their contents are sorted by key, they can be written sequentially and searched efficiently at read time.

Technical Explanation

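  • Write Once: An SSTable is never modified after it is written, which gives readers a consistent view of its contents. Updates and deletes go into newer SSTables instead, so the current state of a key may be spread across several files.
  • Compaction: Cassandra periodically merges multiple SSTables into fewer, larger ones, discarding overwritten values to reclaim space and reduce the number of files a read must consult.
  • Bloom Filters: Each SSTable has a Bloom filter, a probabilistic structure that reports whether a key might be present. It can return false positives but never false negatives, so a negative answer lets Cassandra skip that SSTable without a disk seek.
  • Indexing: Each SSTable has an associated index that maps keys to their positions in the data file, allowing efficient key lookups.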
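To make the Bloom filter idea concrete, here is a minimal Python sketch. The class name, the bit-array size, the number of hash functions, and the use of salted SHA-256 are all assumptions chosen for readability; they are not Cassandra's parameters or hash function.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))
```

A read would call might_contain(key) on each SSTable's filter and open only the files that answer True. Every key that was added is guaranteed to return True; a key that was never added usually returns False, with a false-positive rate that depends on the filter size and the number of keys.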

Example

Consider a scenario in which data must be retrieved from storage:

  1. Data Flush: A Memtable is flushed, creating a new SSTable.
  2. Merge and Compaction: Periodic compaction merges multiple SSTables.
  3. Read Optimization: When a query is executed, Bloom filters rule out SSTables that cannot contain the key, and indexes locate the key within the SSTables that remain.
  4. Data Retrieval: The data is read from only those SSTables, keeping I/O to a minimum.
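The flush, compaction, and lookup steps can be sketched as follows. This is a simplified model under stated assumptions: the SSTable class, read, and compact are hypothetical names, tables are held in memory, and the newest table wins on conflict, whereas Cassandra resolves conflicts using per-cell write timestamps.

```python
import bisect


class SSTable:
    """Immutable, key-sorted table built from a memtable dict (toy model)."""

    def __init__(self, memtable):
        items = sorted(memtable.items())
        self.keys = tuple(k for k, _ in items)
        self.values = tuple(v for _, v in items)

    def get(self, key):
        # Binary search over the sorted keys; returns (found, value).
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return True, self.values[i]
        return False, None


def read(sstables, key):
    """Check SSTables from newest to oldest; the first hit wins."""
    for table in reversed(sstables):
        found, value = table.get(key)
        if found:
            return value
    return None


def compact(sstables):
    """Merge SSTables (oldest first) into one, keeping the newest value per key."""
    merged = {}
    for table in sstables:  # later tables overwrite earlier ones
        merged.update(zip(table.keys, table.values))
    return SSTable(merged)
```

For example, given an older table built from {"a": 1, "b": 2} and a newer one built from {"b": 20, "c": 30}, reading "b" returns 20, and compacting the two yields a single table holding a=1, b=20, c=30. After compaction a read consults one table instead of two.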

Comparison and Summary

The following table summarizes the roles of the commit log and SSTables:

Feature     | Commit Log                                          | SSTables
Durability  | Ensures data durability through write-ahead logging | Immutable storage for persistent data
Write Model | Append-only, sequential                             | Created upon Memtable flush, write-once structure
Read Model  | Not directly used for reads                         | Bloom filters and indexes optimize read operations
Concurrency | Supports concurrent writes                          | Immutable nature supports straightforward read concurrency
Recovery    | Used to recover writes in the event of failure      | Not involved in recovery processes
Compaction  | N/A                                                 | Periodically merged and compacted to optimize storage

Advanced Topics

Tombstones and Garbage Collection

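  • Tombstones: Because data is replicated across nodes and SSTables are immutable, a delete cannot simply erase a value in place. Instead, Cassandra writes a tombstone, a marker recording that the data was deleted.
  • Garbage Collection: Compaction removes a tombstone once it is older than a configurable grace period, the table's gc_grace_seconds setting (864000 seconds, or ten days, by default). The grace period gives replicas that missed the delete time to receive it before the marker disappears.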
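A compaction pass that honors the grace period might look like the sketch below. It is deliberately simplified: Cell, compact_with_tombstones, and the dict-based tables are hypothetical, timestamps are plain seconds, and Cassandra applies further safety checks before purging a tombstone that are omitted here.

```python
from dataclasses import dataclass
from typing import Optional

GC_GRACE_SECONDS = 864_000  # ten days, matching Cassandra's default


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone (assumption of this sketch)
    timestamp: int        # write time, in seconds


def compact_with_tombstones(tables, now):
    """Merge {key: Cell} tables; newest timestamp wins; drop expired tombstones."""
    merged = {}
    for table in tables:
        for key, cell in table.items():
            if key not in merged or cell.timestamp > merged[key].timestamp:
                merged[key] = cell
    return {
        key: cell
        for key, cell in merged.items()
        # Keep live cells, and keep tombstones still inside the grace period.
        if not (cell.value is None and now - cell.timestamp > GC_GRACE_SECONDS)
    }
```

If a value written at time 100 is deleted at time 200, a compaction at time 300 keeps the tombstone (so the delete can still reach other replicas), while a compaction more than ten days after the delete drops both the old value and the tombstone.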

Write Paths and Hints

  • Hinted Handoffs: When a replica node is down, Cassandra can temporarily store a "hint" describing each write the node missed and replay it once the node recovers, which helps the cluster converge toward a consistent state.
  • Write Process Flow: A write operation involves the commit log, the Memtable, and in-memory caches, and its data eventually reaches an SSTable when the Memtable is flushed.
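Hinted handoff can be illustrated with a toy coordinator that remembers writes for replicas that are down. Replica, Coordinator, and replay_hints are hypothetical names for this sketch; real hints are persisted on disk and expire after a configurable window, neither of which is modeled here.

```python
class Replica:
    """Toy replica node with an up/down flag."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Coordinator:
    """Sends each write to every replica, storing hints for those that are down."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica, key, value) tuples awaiting delivery

    def write(self, key, value):
        for replica in self.replicas:
            if replica.up:
                replica.apply(key, value)
            else:
                self.hints.append((replica, key, value))

    def replay_hints(self):
        # Deliver hints to replicas that are back; keep the rest for later.
        remaining = []
        for replica, key, value in self.hints:
            if replica.up:
                replica.apply(key, value)
            else:
                remaining.append((replica, key, value))
        self.hints = remaining
```

If replica b is down during a write, the coordinator records one hint for it. Once b is marked up again and replay_hints() runs, b receives the missed write and the hint list is emptied.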

Conclusion

Much of Cassandra's efficiency and reliability stems from its approach to data storage and retrieval using commit logs and SSTables. The commit log makes writes durable, while sorted, immutable SSTables, supported by Bloom filters, indexes, and compaction, keep reads fast. Understanding these components is vital for using Cassandra to its full potential, especially in distributed systems with high write and read demands.

