Kafka with python How to send topic to postgreSQL?
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
To move Kafka messages into PostgreSQL with Python, you usually consume records from a topic, transform them into database rows, and write them with a PostgreSQL driver. The main design choice is whether you truly need custom Python code or whether Kafka Connect with a JDBC sink would be a simpler production architecture.
The Basic Flow
The Python approach has three steps:
- consume a record from Kafka
- decode and validate the message
- insert or upsert it into PostgreSQL
That sounds simple, but the real engineering questions are about offset handling, retries, duplicates, and transaction boundaries. If you ignore those, the pipeline works in demos and fails under load.
A Simple Consumer-to-PostgreSQL Example
The example below uses confluent-kafka for Kafka and psycopg for PostgreSQL.
This example keeps the logic explicit: write to the database first, then commit the Kafka offset.
Why Offset Handling Matters
If you commit the Kafka offset before the database write succeeds, you can lose data. If you write to PostgreSQL and crash before the Kafka offset commit, you may reprocess a message after restart.
That is why idempotent database writes are important. The ON CONFLICT clause in the example above makes the insert safe to retry for the same order_id.
A good baseline pattern is:
- make the database write idempotent
- commit the Kafka offset only after the database transaction succeeds
That gives you at-least-once delivery with manageable duplication semantics.
Use Kafka Connect When Python Adds No Business Logic
If your job is simply “copy topic records into PostgreSQL,” Kafka Connect with the JDBC sink connector is often a better operational choice. It gives you managed offset handling, retries, parallelism, and connector configuration without custom consumer code.
Use custom Python when:
- you need custom transformation logic
- you enrich records from other services
- you apply domain-specific validation or routing
- you want the consumer inside an existing Python system
Use Kafka Connect when the pipeline is mostly transport.
Batch Writes Improve Throughput
The example above writes one row at a time for clarity. In production, batching can reduce database overhead significantly.
A common refinement is to buffer several Kafka messages, insert them in one transaction, then commit the highest processed offset. That improves throughput but makes failure handling more complex, so build the simple correct version first.
Serialization Choices
Most real Kafka topics do not contain arbitrary raw strings. They carry JSON, Avro, Protobuf, or another structured format. Your Python consumer should decode the payload intentionally and reject malformed messages instead of inserting garbage into PostgreSQL.
This is where schema validation pays off. A database table should not become the first place you discover that upstream data changed shape unexpectedly.
Common Pitfalls
A common mistake is enabling automatic Kafka commits while assuming the database write is the source of truth. That can lose records if the process crashes after the commit but before the insert.
Another mistake is writing non-idempotent inserts without a unique key or upsert strategy. Reprocessing then creates duplicates.
Developers also often build custom Python consumers when Kafka Connect would have handled the use case more simply and reliably.
Finally, do not ignore backpressure. If PostgreSQL slows down, the Kafka consumer design must cope with that instead of consuming blindly and falling behind unpredictably.
Summary
- Consume the Kafka topic, decode the message, and write it to PostgreSQL in that order.
- Commit Kafka offsets only after the database transaction succeeds.
- Make database writes idempotent so retries are safe.
- Use custom Python only when you need transformation or business logic.
- For pure topic-to-table movement, Kafka Connect with JDBC sink is often the better production solution.

