Delayed Queue implementation in Storm – Kafka, Cassandra, Redis or Beanstalk?

A system that processes delayed messages (messages that must not be handled until a given delay has elapsed) needs a store that either supports delays natively or can be configured to emulate them. This comparison looks at how Storm can be paired with Kafka, Cassandra, Redis, and Beanstalk to implement a delayed queue. It covers the technical details, typical use cases, and code snippets where applicable.

1. Storm with Kafka

Apache Kafka is a distributed streaming platform capable of handling high volumes of data. Kafka doesn't support delayed messages out of the box, but since version 0.11.0 records can carry headers, which give developers a place to store the metadata needed for a custom delayed queue.

Implementation Method:

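One common approach is to use a secondary topic as a "delay queue." Messages that require a delay are first published to this secondary topic. Each message carries a timestamp in a header indicating when it becomes due. A separate consumer reads this topic, checks the header, and re-publishes each message to the primary topic once it is due. Because Kafka delivers each partition in offset order, this works best when all messages in a delay topic share the same delay; a message that is not yet due has to be retried later rather than skipped.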

Example:

java

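// Requires java.nio.charset.StandardCharsets and java.time.Duration
// Producer: publish to the delay topic with the due time in a header ("due-at" is an illustrative name)
ProducerRecord<String, String> delayed = new ProducerRecord<>(delayTopic, message);
long dueAt = System.currentTimeMillis() + delayMs;
delayed.headers().add("due-at", Long.toString(dueAt).getBytes(StandardCharsets.UTF_8));
producer.send(delayed);

// Consumer: re-publish records whose delay has elapsed
for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
    long due = Long.parseLong(new String(rec.headers().lastHeader("due-at").value(), StandardCharsets.UTF_8));
    if (System.currentTimeMillis() >= due) {
        producer.send(new ProducerRecord<>(mainTopic, rec.key(), rec.value()));
    }
    // A record that is not yet due must be retried later (e.g. pause the partition and seek back to its offset), not dropped.
}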
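
The header handling can be isolated from the Kafka client and tested on its own. The following is a minimal sketch under stated assumptions: the header name `due-at` and the class `DueTimeHeader` are hypothetical, and the due time is assumed to be stored as epoch milliseconds in UTF-8 text.

```java
import java.nio.charset.StandardCharsets;

public class DueTimeHeader {
    // Hypothetical header name; the approach does not prescribe one.
    public static final String HEADER_NAME = "due-at";

    // Encode the absolute due time (epoch millis) as the header value.
    public static byte[] encode(long nowMs, long delayMs) {
        return Long.toString(nowMs + delayMs).getBytes(StandardCharsets.UTF_8);
    }

    // Decode the header value back to epoch millis.
    public static long decode(byte[] headerValue) {
        return Long.parseLong(new String(headerValue, StandardCharsets.UTF_8));
    }

    // Milliseconds the relay consumer still has to wait; 0 means the record is due.
    public static long remainingMs(byte[] headerValue, long nowMs) {
        return Math.max(0L, decode(headerValue) - nowMs);
    }

    public static void main(String[] args) {
        byte[] header = encode(1_000L, 5_000L);          // due at t = 6000 ms
        System.out.println(remainingMs(header, 2_000L)); // 4000
        System.out.println(remainingMs(header, 7_000L)); // 0
    }
}
```

A relay consumer could use `remainingMs` to decide how long to pause a partition before re-reading the record, instead of polling in a tight loop.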

2. Storm with Cassandra

Cassandra is a distributed NoSQL database known for its exceptional scalability and robustness. While Cassandra doesn’t provide a built-in delayed queue capability, its time-series data modeling can be effectively used to implement such a feature.

Implementation Method:

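Use timestamps to mark each message with the time it should be processed. Store messages in Cassandra with a partition key that is a time bucket derived from the target time (e.g., the target time truncated to the hour or minute), and keep the exact target time as a clustering column. Periodically query the buckets that have become due for messages whose delay has passed, then process them. Cassandra cannot run a range query across partition keys without a full scan, so the poller reads one bucket at a time.

Example:

sql

CREATE TABLE delayed_messages (
    bucket timestamp,        -- target time truncated to the minute
    target_time timestamp,
    id uuid,
    message text,
    PRIMARY KEY ((bucket), target_time, id)
);

-- Writing a message due 10 minutes from now; the application computes both timestamps
INSERT INTO delayed_messages (bucket, target_time, id, message)
VALUES ('2020-02-19 14:51:00+0000', '2020-02-19 14:51:09+0000', uuid(), 'Delayed message');

-- Periodically reading the due messages of one bucket
SELECT * FROM delayed_messages
WHERE bucket = '2020-02-19 14:51:00+0000' AND target_time <= toTimestamp(now());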
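
Bucketing by time means the application has to compute the partition key for each write and the list of partitions to read on each poll. A minimal sketch of that bookkeeping, assuming one-minute buckets; the class `MinuteBuckets` and its method names are hypothetical and not part of any Cassandra driver:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public class MinuteBuckets {
    // Partition key for a message: its target time truncated to the minute.
    public static Instant bucketFor(Instant targetTime) {
        return targetTime.truncatedTo(ChronoUnit.MINUTES);
    }

    // Buckets to read on this poll: every bucket after the last fully processed one,
    // up to and including the bucket that contains "now".
    public static List<Instant> bucketsToPoll(Instant lastCompletedBucket, Instant now) {
        List<Instant> buckets = new ArrayList<>();
        Instant current = bucketFor(now);
        Instant b = lastCompletedBucket.plus(1, ChronoUnit.MINUTES);
        while (!b.isAfter(current)) {
            buckets.add(b);
            b = b.plus(1, ChronoUnit.MINUTES);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Instant target = Instant.parse("2020-02-19T14:51:09Z");
        System.out.println(bucketFor(target)); // 2020-02-19T14:51:00Z
        System.out.println(bucketsToPoll(Instant.parse("2020-02-19T14:49:00Z"), target));
        // [2020-02-19T14:50:00Z, 2020-02-19T14:51:00Z]
    }
}
```

The bucket containing the current time is still filling, so a poller built on this sketch should only mark buckets strictly before it as completed and read the current one again on the next poll.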

3. Storm with Redis

Redis, known primarily as an in-memory data structure store, offers features like sorted sets which can be used to implement delayed queues.

Implementation Method:

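Use the ZADD command to add messages to a sorted set where the score is the Unix timestamp at which the message should be processed. A periodic job can then use ZRANGEBYSCORE to fetch the members whose score is less than or equal to the current time, remove them with ZREM, and process them. Running the fetch and the removal in one Lua script keeps two pollers from claiming the same message.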

Example:

lua

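-- Adding a delayed message (score = Unix time at which it becomes due)
redis.call('ZADD', 'delayed-queue', 1582123269, 'message-id-123')

-- Claiming due messages atomically. The caller passes the queue name as KEYS[1] and the
-- current Unix time as ARGV[1], because os.time() is not available inside Redis scripts.
local due = redis.call('ZRANGEBYSCORE', KEYS[1], 0, ARGV[1])
for _, messageId in ipairs(due) do
    redis.call('ZREM', KEYS[1], messageId)
end
return due -- the client processes the returned message IDs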
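
To make the polling semantics concrete without a Redis server, the sorted set can be modeled in memory. This is only a sketch of the behavior, not a Redis client: the class `InMemoryDelayQueue` is hypothetical, and `synchronized` stands in for the atomicity a server-side script would provide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class InMemoryDelayQueue {
    // Due time (epoch seconds) -> members due at that time, ordered like a sorted set.
    private final TreeMap<Long, TreeSet<String>> byScore = new TreeMap<>();
    // Member -> current score, so re-adding a member moves it, as ZADD does.
    private final Map<String, Long> scoreOf = new HashMap<>();

    // Counterpart of ZADD: insert the member or update its due time.
    public synchronized void add(String member, long dueAtEpochSeconds) {
        Long old = scoreOf.put(member, dueAtEpochSeconds);
        if (old != null) {
            TreeSet<String> previous = byScore.get(old);
            previous.remove(member);
            if (previous.isEmpty()) {
                byScore.remove(old);
            }
        }
        byScore.computeIfAbsent(dueAtEpochSeconds, k -> new TreeSet<>()).add(member);
    }

    // Counterpart of ZRANGEBYSCORE 0..now followed by ZREM: remove and return all due members.
    public synchronized List<String> claimDue(long nowEpochSeconds) {
        Map<Long, TreeSet<String>> due = byScore.headMap(nowEpochSeconds, true);
        List<String> claimed = new ArrayList<>();
        for (TreeSet<String> members : due.values()) {
            claimed.addAll(members);
        }
        due.clear(); // clearing the view also removes the entries from byScore
        for (String member : claimed) {
            scoreOf.remove(member);
        }
        return claimed;
    }

    public static void main(String[] args) {
        InMemoryDelayQueue queue = new InMemoryDelayQueue();
        queue.add("a", 100L);
        queue.add("b", 200L);
        queue.add("c", 300L);
        System.out.println(queue.claimDue(200L)); // [a, b]
        System.out.println(queue.claimDue(200L)); // []
        System.out.println(queue.claimDue(300L)); // [c]
    }
}
```

A second call with the same timestamp returns nothing, which is the property that lets several pollers share one queue without processing a message twice.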

4. Storm with Beanstalk

Beanstalk (the beanstalkd daemon) is a simple and fast work queue service. It supports delayed jobs natively, which makes it the most direct of the four options for implementing a delayed queue.

Implementation Method:

When adding a job to Beanstalk, specify the delay in seconds as the second argument of the put command (put <priority> <delay> <ttr> <bytes>). Beanstalk will not make the job ready for consumption until the delay has elapsed.

Example:

bash

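# Put a job with priority 0, a 120-second delay and a 60-second TTR; the body is 19 bytes.
# The command line and the body must travel over the same connection, each terminated by CRLF.
exec 3<>/dev/tcp/localhost/11300
printf 'put 0 120 60 19\r\nhello delayed world\r\n' >&3
head -n 1 <&3   # expect "INSERTED <id>"
exec 3>&-

# The consumer then reserves jobs as they become ready.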
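
The byte count in the put command has to match the body exactly, which is easy to get wrong by hand. A minimal sketch of building the frame programmatically, assuming a UTF-8 body; the class `BeanstalkPut` and its `frame` method are hypothetical helpers, not part of any Beanstalk client library:

```java
import java.nio.charset.StandardCharsets;

public class BeanstalkPut {
    // Build the bytes of a "put" command: the command line and the body, each terminated by CRLF.
    public static byte[] frame(long priority, long delaySeconds, long ttrSeconds, String body) {
        byte[] data = body.getBytes(StandardCharsets.UTF_8);
        String commandLine = "put " + priority + " " + delaySeconds + " " + ttrSeconds
                + " " + data.length + "\r\n";
        byte[] head = commandLine.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[head.length + data.length + 2];
        System.arraycopy(head, 0, out, 0, head.length);
        System.arraycopy(data, 0, out, head.length, data.length);
        out[out.length - 2] = '\r';
        out[out.length - 1] = '\n';
        return out;
    }

    public static void main(String[] args) {
        byte[] frame = frame(0, 120, 60, "hello delayed world");
        // Show the control characters explicitly.
        System.out.println(new String(frame, StandardCharsets.UTF_8).replace("\r\n", "\\r\\n"));
        // put 0 120 60 19\r\nhello delayed world\r\n
    }
}
```

The resulting bytes can be written to a single socket connected to the server, for example through `java.net.Socket` and its output stream.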

Comparison Table

Technology	Native Support	Ease of Use	Scalability	Persistence
Kafka	No	Medium	High	Yes
Cassandra	No	Medium	High	Yes
Redis	No	Easy	Medium	Configurable
Beanstalk	Yes	Easy	Low	Optional (binlog)

In conclusion, the right technology for a delayed queue in a Storm-based application depends on the requirements of the deployment: the existing technology stack, the durability requirements, and the volume of data. Kafka and Cassandra are generally best for large-scale, durable setups, Redis for fast setups where durability is less critical, and Beanstalk for simple, smaller-scale applications.