Azure Event Hubs limits and how it compares to a pure Kafka cluster
Microsoft Azure Event Hubs is a highly scalable data streaming platform and event ingestion service, capable of receiving and processing millions of events per second. It can process and store events, data, or telemetry produced by distributed software and devices. Event Hubs is often compared to Apache Kafka, an open-source stream-processing platform developed by the Apache Software Foundation and written in Scala and Java. The comparison is relevant because Event Hubs exposes a Kafka-compatible endpoint, so existing Kafka applications can connect to Event Hubs directly over the Kafka protocol, typically with only configuration changes. To understand how Azure Event Hubs compares with a pure Kafka cluster, it helps to look at their capabilities, configuration limits, scalability, and typical use cases.
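As a rough illustration of what "using the Kafka endpoint" involves, the sketch below builds the client settings a Kafka application would point at an Event Hubs namespace. The key names follow the librdkafka style used by clients such as confluent-kafka. The function name, the namespace `my-namespace`, and the connection string are placeholders chosen for this example, not values from the article.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build Kafka client settings for an Event Hubs namespace's Kafka endpoint."""
    return {
        # The Kafka endpoint listens on port 9093 of the namespace host.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        # Event Hubs requires TLS plus SASL PLAIN authentication.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The username is the literal string "$ConnectionString";
        # the password is the namespace connection string itself.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


# Placeholder values for illustration only.
kafka_config = event_hubs_kafka_config("my-namespace", "<your-connection-string>")
print(kafka_config["bootstrap.servers"])
```

The application's producer and consumer code stays the same; only the connection settings differ from those used against a self-hosted cluster.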
Technical Limits and Scaling
Azure Event Hubs
- Throughput units: Event Hubs scales in throughput units (TUs). Each TU provides up to 1 MB/s of ingress and 2 MB/s of egress. Users can scale out by adding TUs manually or by enabling the auto-inflate feature, which raises the TU count automatically as load grows.
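- Partition count: Event Hubs supports up to 32 partitions per Event Hub in the Standard tier and up to 2000 partitions in the Dedicated tier. In the Standard tier the partition count is fixed once the Event Hub is created, so it should be sized for the expected peak parallelism up front. The Premium and Dedicated tiers allow the count to be increased later, but never decreased.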
- Event retention: The retention period is configurable, up to a maximum of 7 days in the Standard tier and 90 days in the Dedicated tier.
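The per-TU figures above turn capacity planning into simple arithmetic. The following is a minimal sketch that uses only the 1 MB/s ingress and 2 MB/s egress numbers quoted in this section. The function and constant names are invented for this example, and other per-TU limits (such as events per second) are deliberately ignored.

```python
import math

INGRESS_MB_PER_TU = 1.0  # MB/s of ingress per throughput unit (from the text)
EGRESS_MB_PER_TU = 2.0   # MB/s of egress per throughput unit (from the text)


def required_throughput_units(ingress_mb_s: float, egress_mb_s: float) -> int:
    """Return the smallest TU count that covers both ingress and egress."""
    by_ingress = math.ceil(ingress_mb_s / INGRESS_MB_PER_TU)
    by_egress = math.ceil(egress_mb_s / EGRESS_MB_PER_TU)
    # A namespace needs at least one TU, and the tighter constraint wins.
    return max(1, by_ingress, by_egress)


print(required_throughput_units(5.0, 8.0))  # 5 (ingress-bound)
print(required_throughput_units(1.5, 6.0))  # 3 (egress-bound)
```

Whichever direction needs more units determines the TU count, which is why a workload with many consumers can be egress-bound even at modest ingress rates.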
Apache Kafka
- Brokers and clusters: Kafka scales by distributing topic partitions across a cluster of brokers. Capacity grows by adding brokers or by increasing a topic's partition count, which can be raised after creation but not lowered.
- Partition Limits: Kafka can support thousands of partitions per topic, and the limit mainly depends on the capability of the Kafka cluster's hardware and network.
- Data Retention: Kafka provides configurable retention policies based on time, size, or both, and there's effectively no upper limit provided there’s sufficient storage.
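Time-based and size-based retention map to the topic-level settings `retention.ms` and `retention.bytes`, where `-1` disables the corresponding limit. The helper below is a sketch of how those values might be derived. Its name and signature are assumptions made for this example.

```python
from typing import Optional

MS_PER_DAY = 24 * 60 * 60 * 1000


def retention_settings(days: Optional[int] = None,
                       max_bytes: Optional[int] = None) -> dict:
    """Return topic-level retention settings; None means no limit on that dimension."""
    return {
        # Time-based retention in milliseconds; -1 means keep data indefinitely.
        "retention.ms": str(days * MS_PER_DAY) if days is not None else "-1",
        # Size-based retention per partition in bytes; -1 means no size cap.
        "retention.bytes": str(max_bytes) if max_bytes is not None else "-1",
    }


print(retention_settings(days=7))
# {'retention.ms': '604800000', 'retention.bytes': '-1'}
```

When both limits are set, Kafka deletes old log segments as soon as either one is exceeded.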
Use Case Suitability
- Azure Event Hubs: Best suited for cloud-based streaming applications, especially where integration with other Azure services is beneficial (e.g., Azure Functions, Azure Stream Analytics).
- Apache Kafka: Ideal for heavy-duty, large-scale streaming applications where fine-grained control over the environment is required.
Performance Considerations
Both platforms can handle high-throughput data streams. The managed nature of Event Hubs simplifies operations, but at the cost of some of the customization and control that a self-managed Kafka cluster offers, such as broker-level tuning and the choice of hardware.
Feature Set, Integration, and Ecosystem
- Azure Event Hubs integrates natively with many Azure services and offers features like Capture, which automatically saves the streamed data to Azure Blob Storage or Azure Data Lake Storage.
- Apache Kafka has a large ecosystem including connectors enabling integration with numerous systems, extensive client library support, and strong community backing.
Pricing Model
- Azure Event Hubs features a tier-based pricing model with costs depending on the number of throughput units, data retention, and additional features.
- Apache Kafka is free, open-source software, but running it in a cloud environment (e.g., on Azure using HDInsight) incurs costs for the underlying resources, such as VMs, storage, and networking.
Example Configurations and Commands
Creating an Event Hub:
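The original command is not reproduced here, so the following is a hedged sketch rather than the article's own example. It assembles an Azure CLI `az eventhubs eventhub create` invocation as an argument list and prints it without running it. The helper name and the resource group, namespace, and hub names are placeholders.

```python
import shlex


def create_event_hub_command(resource_group: str, namespace: str,
                             name: str, partitions: int) -> list:
    """Build (but do not run) an Azure CLI command that creates an Event Hub."""
    return [
        "az", "eventhubs", "eventhub", "create",
        "--resource-group", resource_group,
        "--namespace-name", namespace,
        "--name", name,
        "--partition-count", str(partitions),
    ]


# Placeholder names for illustration only.
event_hub_cmd = create_event_hub_command(
    "my-resource-group", "my-namespace", "my-event-hub", 4
)
print(shlex.join(event_hub_cmd))
```

The list can be passed to `subprocess.run` on a machine where the Azure CLI is installed and signed in.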
Creating a Kafka topic:
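The original command is not reproduced here either. As a comparable sketch, this builds a `kafka-topics.sh --create` invocation for a self-managed cluster and prints it without running it. The broker address, topic name, and partition and replication values are assumptions chosen for the example.

```python
import shlex


def create_topic_command(bootstrap_server: str, topic: str,
                         partitions: int, replication_factor: int) -> list:
    """Build (but do not run) a kafka-topics.sh command that creates a topic."""
    return [
        "kafka-topics.sh", "--create",
        "--bootstrap-server", bootstrap_server,
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]


# Placeholder broker address and topic name for illustration only.
topic_cmd = create_topic_command("localhost:9092", "my-topic", 4, 3)
print(shlex.join(topic_cmd))
```

Unlike the Event Hubs case, the replication factor is set explicitly here, which reflects the extra control (and responsibility) that comes with running the brokers yourself.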
Summary Table
| Feature | Azure Event Hubs | Apache Kafka |
| --- | --- | --- |
| Scaling Unit | Throughput units (TUs) | Brokers and partitions |
| Max Partitions | 32 (Standard), 2000 (Dedicated) | No fixed limit |
| Data Retention | 7 days (Standard), up to 90 days (Dedicated) | Configurable, no upper limit |
| Pricing | Tier-based, plus feature costs | Resource usage |
| Management | Fully managed | Self-managed or managed via cloud provider |
| Integration with Azure | Native | Via connectors and additional setup |
| Protocol Support | AMQP, HTTP, Kafka protocol | Native Kafka protocol |
Both Azure Event Hubs and Apache Kafka are powerful platforms for real-time data and stream processing workloads, each with its own strengths and best-fit scenarios. The choice between them should be guided by specific project requirements, existing infrastructure, and particular scalability or integration needs.

