Kafka with python How to send topic to postgreSQL?

Kafka

Python

PostgreSQL

Data Streaming

Topic Messaging

Kafka with python How to send topic to postgreSQL?

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

To move Kafka messages into PostgreSQL with Python, you usually consume records from a topic, transform them into database rows, and write them with a PostgreSQL driver. The main design choice is whether you truly need custom Python code or whether Kafka Connect with a JDBC sink would be a simpler production architecture.

The Basic Flow

The Python approach has three steps:

consume a record from Kafka
decode and validate the message
insert or upsert it into PostgreSQL

That sounds simple, but the real engineering questions are about offset handling, retries, duplicates, and transaction boundaries. If you ignore those, the pipeline works in demos and fails under load.

A Simple Consumer-to-PostgreSQL Example

The example below uses confluent-kafka for Kafka and psycopg for PostgreSQL.

python

1import json
2from confluent_kafka import Consumer
3import psycopg
4
5consumer = Consumer({
6    "bootstrap.servers": "localhost:9092",
7    "group.id": "pg-writer",
8    "auto.offset.reset": "earliest",
9    "enable.auto.commit": False,
10})
11consumer.subscribe(["orders"])
12
13conn = psycopg.connect("dbname=demo user=postgres password=secret host=localhost")
14
15with conn.cursor() as cur:
16    cur.execute(
17        """
18        create table if not exists orders (
19            order_id text primary key,
20            customer_name text not null,
21            amount numeric not null
22        )
23        """
24    )
25conn.commit()
26
27try:
28    while True:
29        msg = consumer.poll(1.0)
30        if msg is None:
31            continue
32        if msg.error():
33            raise RuntimeError(msg.error())
34
35        payload = json.loads(msg.value().decode("utf-8"))
36
37        with conn.cursor() as cur:
38            cur.execute(
39                """
40                insert into orders (order_id, customer_name, amount)
41                values (%s, %s, %s)
42                on conflict (order_id)
43                do update set customer_name = excluded.customer_name,
44                              amount = excluded.amount
45                """,
46                (payload["order_id"], payload["customer_name"], payload["amount"]),
47            )
48        conn.commit()
49        consumer.commit(msg)
50finally:
51    consumer.close()
52    conn.close()

This example keeps the logic explicit: write to the database first, then commit the Kafka offset.

Why Offset Handling Matters

If you commit the Kafka offset before the database write succeeds, you can lose data. If you write to PostgreSQL and crash before the Kafka offset commit, you may reprocess a message after restart.

That is why idempotent database writes are important. The ON CONFLICT clause in the example above makes the insert safe to retry for the same order_id.

A good baseline pattern is:

make the database write idempotent
commit the Kafka offset only after the database transaction succeeds

That gives you at-least-once delivery with manageable duplication semantics.

Use Kafka Connect When Python Adds No Business Logic

If your job is simply “copy topic records into PostgreSQL,” Kafka Connect with the JDBC sink connector is often a better operational choice. It gives you managed offset handling, retries, parallelism, and connector configuration without custom consumer code.

Use custom Python when:

you need custom transformation logic
you enrich records from other services
you apply domain-specific validation or routing
you want the consumer inside an existing Python system

Use Kafka Connect when the pipeline is mostly transport.

Batch Writes Improve Throughput

The example above writes one row at a time for clarity. In production, batching can reduce database overhead significantly.

A common refinement is to buffer several Kafka messages, insert them in one transaction, then commit the highest processed offset. That improves throughput but makes failure handling more complex, so build the simple correct version first.

Serialization Choices

Most real Kafka topics do not contain arbitrary raw strings. They carry JSON, Avro, Protobuf, or another structured format. Your Python consumer should decode the payload intentionally and reject malformed messages instead of inserting garbage into PostgreSQL.

This is where schema validation pays off. A database table should not become the first place you discover that upstream data changed shape unexpectedly.

Common Pitfalls

A common mistake is enabling automatic Kafka commits while assuming the database write is the source of truth. That can lose records if the process crashes after the commit but before the insert.

Another mistake is writing non-idempotent inserts without a unique key or upsert strategy. Reprocessing then creates duplicates.

Developers also often build custom Python consumers when Kafka Connect would have handled the use case more simply and reliably.

Finally, do not ignore backpressure. If PostgreSQL slows down, the Kafka consumer design must cope with that instead of consuming blindly and falling behind unpredictably.

Summary

Consume the Kafka topic, decode the message, and write it to PostgreSQL in that order.
Commit Kafka offsets only after the database transaction succeeds.
Make database writes idempotent so retries are safe.
Use custom Python only when you need transformation or business logic.
For pure topic-to-table movement, Kafka Connect with JDBC sink is often the better production solution.