How to connect Kafka with Elasticsearch?

Introduction

The most common way to connect Kafka to Elasticsearch is to use Kafka Connect with an Elasticsearch sink connector. That setup lets Kafka remain the durable event log while Elasticsearch becomes the searchable projection layer for indexing and analytics.

Use Kafka Connect as the Integration Layer

Trying to write directly from every producer into Elasticsearch usually creates duplicate delivery logic, retry complexity, and operational sprawl. Kafka Connect centralizes that work in one managed path.

A basic connector configuration, in the JSON form accepted by the Kafka Connect REST API, looks like this:

json

{
  "name": "orders-to-es",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "orders",
    "connection.url": "http://elasticsearch:9200",
    "key.ignore": "false",
    "schema.ignore": "true",
    "type.name": "_doc"
  }
}

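The connector reads messages from the orders topic and writes them to an Elasticsearch index, which by default takes the topic's name. The key.ignore setting is an important decision because it affects document identity and update behavior: when it is false, the Kafka record key becomes the document id; when it is true, the connector generates an id from the topic, partition, and offset instead.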
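
To make the registration step concrete, here is a minimal Python sketch that wraps the configuration above in the POST /connectors request a distributed Connect worker expects. The worker address http://localhost:8083 is an assumption (8083 is the default REST port), and the helper name build_create_request is made up for illustration. The request is only built, not sent, so the snippet runs without a live worker.

```python
import json
import urllib.request

# Assumed address of the Kafka Connect REST API (8083 is the default port).
CONNECT_URL = "http://localhost:8083"

# Same connector definition as the JSON shown above.
connector = {
    "name": "orders-to-es",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "tasks.max": "1",
        "topics": "orders",
        "connection.url": "http://elasticsearch:9200",
        "key.ignore": "false",
        "schema.ignore": "true",
        "type.name": "_doc",
    },
}


def build_create_request(connect_url, definition):
    """Build (but do not send) the POST /connectors request. Hypothetical helper."""
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(CONNECT_URL, connector)
print(request.get_method(), request.full_url)

# To register the connector against a live worker, send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping the definition in code or version control, rather than typing it into a terminal, makes it easier to review changes to settings such as key.ignore before they reach a shared cluster.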

Stand Up the Pieces in a Predictable Order

At minimum, you need:

a running Kafka broker
a running Elasticsearch cluster
a Kafka Connect worker with the Elasticsearch sink plugin installed

A common local workflow is:

bash

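# Start Elasticsearch first so the sink has a target to write to
bin/elasticsearch

# Then start the Kafka broker (on ZooKeeper-based installs, ZooKeeper must already be running)
bin/kafka-server-start.sh config/server.properties

# Finally start a standalone Connect worker; elasticsearch-sink.properties holds the
# connector name and the same settings as the JSON "config" above, in key=value form
bin/connect-standalone.sh config/connect-standalone.properties config/elasticsearch-sink.properties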

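After that, produce a test record. Because the connector is configured with key.ignore set to false, the record needs a non-null key, so the producer below treats everything before the | separator as the key. This assumes the Connect worker reads keys with StringConverter and values with JsonConverter with schemas.enable set to false:

bash

kafka-console-producer.sh --bootstrap-server localhost:9092 --topic orders \
  --property parse.key=true --property key.separator='|'
>o-1|{"orderId":"o-1","status":"created","total":19.95}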

Then verify the document landed in Elasticsearch:

bash

curl -s 'http://localhost:9200/orders/_search?pretty'
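
If you want to script the check rather than read the curl output by eye, the following Python sketch shows one way to inspect a _search response body. The sample response is hand-written to match the general shape Elasticsearch returns, not captured from a real cluster, and the helper names hit_count and find_order are hypothetical.

```python
import json


def hit_count(search_response):
    """Return the total hit count from an Elasticsearch _search response body."""
    total = search_response["hits"]["total"]
    # Elasticsearch 7+ reports {"value": n, "relation": "eq"}; older versions a bare integer.
    return total["value"] if isinstance(total, dict) else total


def find_order(search_response, order_id):
    """Return the _source of the first hit whose orderId matches, else None."""
    for hit in search_response["hits"]["hits"]:
        if hit["_source"].get("orderId") == order_id:
            return hit["_source"]
    return None


# Hand-written sample shaped like a _search response (an assumption, not real output).
sample = json.loads("""
{
  "hits": {
    "total": {"value": 1, "relation": "eq"},
    "hits": [
      {"_index": "orders", "_id": "o-1",
       "_source": {"orderId": "o-1", "status": "created", "total": 19.95}}
    ]
  }
}
""")

print(hit_count(sample))
print(find_order(sample, "o-1"))
```

In a real check you would feed these helpers the parsed body returned by the curl command above, and allow a short delay because newly indexed documents only become searchable after an index refresh.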

Think About Document Keys Early

If Kafka messages represent updates to the same logical entity, stable document ids matter. Otherwise, every event can become a new Elasticsearch document even when you intended an upsert.

For example, if the record key is the business id, keep it:

json

{
  "key.ignore": "false",
  "write.method": "upsert"
}

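Keeping the key gives every event for the same entity the same document id, so later events update that Elasticsearch document instead of creating duplicates, and the upsert write method merges the incoming fields into the existing document rather than replacing it wholesale. If you ignore keys, search results often look inflated because every change event becomes another indexed document.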
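
The effect of these two settings can be illustrated with a toy in-memory model. This is a sketch of the behavior described here, not the connector's actual code: a plain dict stands in for the index, the function names are hypothetical, and the topic+partition+offset id format used when keys are ignored reflects my understanding of the connector rather than anything stated in this article.

```python
def document_id(record, key_ignore):
    """Pick a document id for a record under the given key.ignore setting."""
    if key_ignore:
        # No stable key: derive an id from the record's position in the log.
        return f"{record['topic']}+{record['partition']}+{record['offset']}"
    return record["key"]


def apply_events(records, key_ignore, write_method="upsert"):
    """Apply records in order to a dict that stands in for an Elasticsearch index."""
    index = {}
    for record in records:
        doc_id = document_id(record, key_ignore)
        if write_method == "upsert" and doc_id in index:
            # Upsert: merge incoming fields into the existing document.
            index[doc_id] = {**index[doc_id], **record["value"]}
        else:
            # Insert: create the document, or replace it completely.
            index[doc_id] = dict(record["value"])
    return index


# Two events for the same order: a full create and a partial status update.
events = [
    {"topic": "orders", "partition": 0, "offset": 0, "key": "o-1",
     "value": {"orderId": "o-1", "status": "created", "total": 19.95}},
    {"topic": "orders", "partition": 0, "offset": 1, "key": "o-1",
     "value": {"orderId": "o-1", "status": "shipped"}},
]

keyed = apply_events(events, key_ignore=False)
unkeyed = apply_events(events, key_ignore=True)
print(len(keyed), keyed["o-1"])
print(len(unkeyed), sorted(unkeyed))
```

With keys kept, the two events collapse into one document that still carries the total from the first event. With keys ignored, each event becomes its own document, which is the inflation described above.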

Map and Transform the Data Carefully

Elasticsearch cares about field types, and Kafka events are not always shaped for search as-is. Single Message Transforms can help normalize the event before indexing.

Example transform section:

json

{
  "transforms": "extract",
  "transforms.extract.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
  "transforms.extract.field": "payload"
}

This is useful when the message envelope contains metadata plus a nested payload and you only want the payload indexed.
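
As a rough illustration of what that transform does to a schemaless record value, the Python sketch below mimics its effect on a single message. It is a model for reasoning about the data shape, not the transform's implementation, and the envelope fields other than payload are invented for the example.

```python
def extract_field(value, field):
    """Mimic ExtractField$Value on a schemaless (map) record value."""
    if value is None:
        return None  # a null value (tombstone) stays null
    # A missing field yields None, matching a plain map lookup.
    return value.get(field)


# Hypothetical envelope: metadata plus the nested payload we want indexed.
envelope = {
    "metadata": {"source": "checkout", "traceId": "t-123"},
    "payload": {"orderId": "o-1", "status": "created", "total": 19.95},
}

indexed_document = extract_field(envelope, "payload")
print(indexed_document)
```

Only the payload object reaches Elasticsearch, so the index mapping needs to describe the payload's fields and can ignore the envelope metadata.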

You should also decide whether the index mapping is controlled manually or inferred dynamically. Dynamic mapping is convenient at first, but it can create messy field types if event shapes drift.
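
If you choose manual control, one option is to create the index with an explicit mapping before the connector starts writing. The sketch below builds such a request in Python. The Elasticsearch address, the field types, and the choice of "dynamic": "strict" (which rejects documents containing unmapped fields) are assumptions picked for the sample order event, and both helper names are hypothetical. The request is built but not sent.

```python
import json
import urllib.request

# Assumed local Elasticsearch address.
ES_URL = "http://localhost:9200"

# Explicit mapping for the sample order event; field types are illustrative choices.
ORDERS_INDEX_BODY = {
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "orderId": {"type": "keyword"},
            "status": {"type": "keyword"},
            "total": {"type": "double"},
        },
    }
}


def build_create_index_request(es_url, index, body):
    """Build (but do not send) a PUT /<index> request that creates the index."""
    return urllib.request.Request(
        url=f"{es_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def unmapped_fields(event, body):
    """List top-level event fields that the explicit mapping does not declare."""
    known = body["mappings"]["properties"]
    return sorted(field for field in event if field not in known)


request = build_create_index_request(ES_URL, "orders", ORDERS_INDEX_BODY)
print(request.get_method(), request.full_url)

# A drifted event with a field the mapping does not know about.
drifted = {"orderId": "o-2", "status": "created", "coupon": "SPRING"}
print(unmapped_fields(drifted, ORDERS_INDEX_BODY))
```

A check like unmapped_fields can run in tests or in a consumer-side validator to surface schema drift before a strict mapping starts rejecting documents.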

Add Operational Guardrails

A working connector is not the same as a production-ready connector. Add guardrails early:

dead-letter queue for malformed records
connector metrics and lag monitoring
explicit index templates or mappings
retry policy compatible with your failure model

For example, malformed JSON should not silently disappear. It should go to a dead-letter topic or fail loudly enough that operators can see it.
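
A minimal sketch of two of these guardrails follows, in Python. The errors.* keys are Kafka Connect's standard error-handling settings for sink connectors; the dead-letter topic name, the replication factor of 1 (suitable only for a single-broker local setup), and the helper names are assumptions. The status payload is a hand-written sample shaped like the response of GET /connectors/<name>/status, not real output. Exactly which failures get routed to the dead-letter topic can depend on the connector version, so treat this as a starting point.

```python
# Error-handling settings to merge into the connector's "config" object.
dlq_settings = {
    "errors.tolerance": "all",
    "errors.deadletterqueue.topic.name": "orders-to-es-dlq",  # hypothetical topic name
    "errors.deadletterqueue.topic.replication.factor": "1",   # local single-broker assumption
    "errors.deadletterqueue.context.headers.enable": "true",  # record failure details in headers
    "errors.log.enable": "true",
}


def with_guardrails(config, extra):
    """Return a new config dict with the guardrail settings applied."""
    return {**config, **extra}


def failed_tasks(status):
    """Return ids of tasks that are not RUNNING in a connector status payload."""
    return [task["id"] for task in status.get("tasks", []) if task.get("state") != "RUNNING"]


base_config = {"topics": "orders", "key.ignore": "false"}
guarded = with_guardrails(base_config, dlq_settings)
print(guarded["errors.deadletterqueue.topic.name"])

# Hand-written sample of a status payload with one failed task.
sample_status = {
    "name": "orders-to-es",
    "connector": {"state": "RUNNING"},
    "tasks": [{"id": 0, "state": "RUNNING"}, {"id": 1, "state": "FAILED"}],
}
print(failed_tasks(sample_status))
```

Polling the status endpoint and alerting when failed_tasks returns anything is a simple complement to lag monitoring, because a failed task stops consuming without necessarily producing an obvious error elsewhere.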

Common Pitfalls

Writing directly from producers to Elasticsearch instead of using Kafka Connect creates duplicated integration logic.
Ignoring Kafka keys often causes duplicate documents instead of deterministic updates.
Letting Elasticsearch dynamic mapping decide everything can produce unstable index schemas.
Skipping dead-letter handling makes bad records much harder to diagnose.
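Assuming Kafka ordering guarantees automatically carry over to Elasticsearch is usually wrong once multiple partitions are involved: Kafka orders records only within a partition, so events from different partitions can be indexed in any relative order.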

Summary

Kafka Connect is the standard way to connect Kafka topics to Elasticsearch.
Configure the Elasticsearch sink connector with clear decisions about keys and schemas.
Verify end-to-end flow with a test event and an Elasticsearch query.
Treat document identity and index mapping as first-class design choices.
Add operational controls such as DLQ handling and monitoring before calling the pipeline done.