Apache Kafka with 100 Million Topics
Apache Kafka is a distributed event streaming platform that lets users publish, subscribe to, store, and process streams of records in real time. It is widely recognized for its high throughput, scalability, and fault tolerance. One of Kafka's core concepts is the topic, a named category to which records are published. Each topic is divided into partitions, and each partition is an ordered, immutable sequence of records.
Scalability Challenges with 100 Million Topics
Managing 100 million topics in Apache Kafka is a formidable challenge for scalability, performance, and operations. Every topic, regardless of how many messages it holds, carries a fixed cost, most of it incurred per partition: metadata that the cluster must store and propagate, in-memory state on each broker that hosts a replica, and open file descriptors for each partition's log segment and index files. Multiplied across 100 million topics, these costs quickly come to dominate the cluster's resources and degrade performance.
Technical Implications
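- Metadata Storage: Topic and partition metadata is stored in ZooKeeper in older deployments; newer Kafka versions replace ZooKeeper with the built-in KRaft controller quorum. In either case, millions of topics produce a very large metadata set that the controller must manage and replicate to every broker, which can slow controller operations such as failover and topic creation.
- Broker Memory Usage: Each partition replica consumes broker memory, for example for its log state and memory-mapped index files. With 100 million topics, even a small per-partition cost adds up to more memory than a practical cluster provides.
- Client Overhead: Clients connect to brokers rather than to individual topics, but producers and consumers that work with many topics must fetch and refresh metadata for all of them and track per-partition state, which adds network and CPU load on both the clients and the brokers.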
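A back-of-the-envelope estimate makes these costs concrete. The Python sketch below assumes one partition per topic, a replication factor of 3, a single active log segment per replica made up of a .log, .index, and .timeindex file, and a hypothetical 100-broker cluster with replicas spread evenly. All of these inputs are illustrative assumptions, not measurements, and the result is a lower bound.

```python
def estimate_footprint(topics, partitions_per_topic, replication_factor,
                       files_per_segment, brokers):
    """Rough lower bound on partition replicas and open segment files.

    Assumes one active segment per replica and an even spread of
    replicas across brokers (both simplifications).
    """
    partitions = topics * partitions_per_topic
    replicas = partitions * replication_factor
    open_files = replicas * files_per_segment
    return {
        "partitions": partitions,
        "replicas": replicas,
        "open_files": open_files,
        # Ceiling division keeps everything in integers.
        "replicas_per_broker": -(-replicas // brokers),
        "open_files_per_broker": -(-open_files // brokers),
    }


# Illustrative inputs: 100M topics, 1 partition each, replication factor 3,
# 3 files per segment (.log, .index, .timeindex), 100 brokers.
estimate = estimate_footprint(100_000_000, 1, 3, 3, 100)
for name, value in estimate.items():
    print(f"{name}: {value:,}")
```

Even under these minimal assumptions, each broker would host 3,000,000 partition replicas and hold 9,000,000 segment files, before counting any additional segments, producer buffers, or consumer state.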
Design Considerations
To effectively manage an extremely high number of topics in Apache Kafka, careful planning and optimization of the setup are required:
Topic Consolidation
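Instead of creating many small topics, group related streams into fewer topics, distinguish the streams by message key, and give each consolidated topic as many partitions as its throughput requires. Because much of Kafka's overhead is per partition, the goal is to reduce the total partition count, which lightens the load on Kafka's metadata layer and improves performance.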
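Consolidated topics still need enough partitions to carry their combined traffic. One commonly cited rule of thumb sizes a topic at max(t/p, t/c) partitions, where t is the target throughput and p and c are the per-partition throughput you measure for producers and consumers. The Python sketch below applies that rule; the function name and the throughput figures are hypothetical and should be replaced with benchmarks from your own hardware.

```python
import math


def partitions_needed(target_mb_s, producer_mb_s_per_partition,
                      consumer_mb_s_per_partition):
    """Rule of thumb: max(t/p, t/c), rounded up.

    t is the target throughput of the topic; p and c are the throughput
    a single partition sustains on the produce and consume side.
    """
    by_producer = math.ceil(target_mb_s / producer_mb_s_per_partition)
    by_consumer = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    return max(by_producer, by_consumer)


# Hypothetical figures for a consolidated topic: 500 MB/s in total,
# 25 MB/s per partition when producing, 10 MB/s per partition when consuming.
print(partitions_needed(500, 25, 10))  # 50
```

Sizing this way ties the partition count to traffic rather than to the number of logical streams, which is the point of consolidation: a topic carrying thousands of keyed streams may need only dozens of partitions.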
Kafka Configuration
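Adjusting broker settings can relieve some of the load, although these settings tune throughput and do not remove the per-partition overhead:
- Increase `num.network.threads` and `num.io.threads` to give the broker more threads for handling network requests and disk I/O.
- Adjust `socket.request.max.bytes` and `message.max.bytes` to control the maximum request size and the maximum record batch size the broker accepts.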
Effective Hardware Utilization
Deploying Kafka on high-spec servers with ample memory and fast SSDs can absorb some of the memory and storage overhead, although better hardware raises the ceiling rather than removing the per-topic cost.
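Use of Compacted Topics
Compacted topics (created with `cleanup.policy=compact`) reduce storage needs by retaining at least the latest record for each key in a partition and discarding older ones. Compaction does not reduce per-topic or per-partition overhead, but it pairs well with topic consolidation: a single compacted topic keyed by entity can hold the current state of many logical streams that might otherwise each get a sparse topic of their own.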
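The retention rule of log compaction can be modeled in a few lines. This Python sketch is a simplified in-memory model, not Kafka's log cleaner: it treats a partition as a list of (offset, key, value) tuples, keeps the newest record per key at its original offset, and treats a None value as a tombstone that deletes the key. Real Kafka compacts asynchronously, never cleans the active segment, and keeps tombstones for `delete.retention.ms` before removing them, so consumers can briefly see older values.

```python
from typing import Dict, List, Optional, Tuple

# (offset, key, value); a value of None is a tombstone.
Record = Tuple[int, str, Optional[str]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, preserving original offsets.

    Simplification: tombstones are dropped immediately here, whereas
    Kafka retains them for delete.retention.ms first.
    """
    latest: Dict[str, Record] = {}
    for record in log:
        latest[record[1]] = record  # later offsets overwrite earlier ones
    survivors = [r for r in latest.values() if r[2] is not None]
    return sorted(survivors, key=lambda r: r[0])


log = [
    (0, "car", "v1"),
    (1, "truck", "v1"),
    (2, "car", "v2"),    # supersedes offset 0
    (3, "bus", "v1"),
    (4, "truck", None),  # tombstone: deletes "truck"
]
print(compact(log))  # [(2, 'car', 'v2'), (3, 'bus', 'v1')]
```

Note that the surviving records keep their original offsets, so a compacted partition has gaps in its offset sequence, just as in Kafka.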
Summary Table
| Factor | Description | Impact with 100 Million Topics |
| --- | --- | --- |
| Metadata Overhead | Storage and CPU overhead of managing topic and partition metadata | Very High |
| Broker Memory Usage | Each partition replica uses memory on the broker | Extremely High |
| Client Overhead | Metadata and per-partition state that clients must fetch and track | High |
| Topic Consolidation | Fewer total topics by combining related streams | Reduces overhead |
| System Resources (Hardware) | Need for high-end server specifications | Critical |
| Kafka Configuration Optimizations | Tuning Kafka to handle high loads | Essential |
Technical Example: Topic Optimization
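Here's how you might approach topic consolidation in a practical scenario: rather than creating separate topics for cars, trucks, and other vehicle types, send all vehicle data to a single topic and use the record key to distinguish the vehicle type. Records with the same key go to the same partition, so the events for each vehicle type stay in order relative to each other.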
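A minimal Python sketch of that keyed approach, with the Kafka client left out so it runs standalone. It ports the 32-bit murmur2 hash that the Java client's default partitioner applies to a record's serialized key, then maps the hash onto a partition. The single "vehicles" topic, its partition count, and the event payloads are hypothetical. With the real Java client you would only set the key on each `ProducerRecord` and let the client choose the partition; other clients (such as librdkafka-based ones) may default to a different hash.

```python
MASK32 = 0xFFFFFFFF


def murmur2(data: bytes) -> int:
    """32-bit murmur2 as in Kafka's Java client, returned as unsigned."""
    m, r = 0x5BD1E995, 24
    length = len(data)
    h = (0x9747B28C ^ length) & MASK32
    # Mix four bytes at a time (little-endian).
    for i in range(0, length - length % 4, 4):
        k = data[i] | data[i + 1] << 8 | data[i + 2] << 16 | data[i + 3] << 24
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = ((h * m) & MASK32) ^ k
    # Mix the remaining 1-3 bytes (mirrors the Java switch fall-through).
    tail, rem = length - length % 4, length % 4
    if rem == 3:
        h ^= data[tail + 2] << 16
    if rem >= 2:
        h ^= data[tail + 1] << 8
    if rem >= 1:
        h ^= data[tail]
        h = (h * m) & MASK32
    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def partition_for_key(key: str, num_partitions: int) -> int:
    """Keyed-record mapping: toPositive(murmur2(key bytes)) % partitions."""
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions


NUM_PARTITIONS = 6  # hypothetical partition count of a single "vehicles" topic
events = [("car", "speed=88"), ("truck", "load=12t"), ("car", "speed=90")]
for key, value in events:
    print(f"{key}:{value} -> partition {partition_for_key(key, NUM_PARTITIONS)}")
```

Because the mapping depends on the partition count, adding partitions to an existing topic changes where keys land, so per-key ordering holds only while the partition count stays fixed.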
Conclusion
Handling 100 million topics in Kafka is highly impractical without significant optimizations and careful infrastructure planning. Consolidating topics, tuning configurations, and provisioning robust hardware are vital steps in managing a Kafka environment at this scale. With a disciplined approach to design and deployment, Kafka can be scaled to handle very high loads, but managing complexity and overhead becomes critical at this scale.

