Change-data-capture from Postgres SQL to kafka topics using standalone mode Kafka-connect
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Change Data Capture (CDC) is a design pattern that monitors and records the changes in data so that other software can respond to those changes. In the world of databases such as PostgreSQL, CDC can be used to capture insertions, updates, and deletions, and then propagate these changes to other systems like Apache Kafka for further processing, analytics, or data integration.
Using Kafka Connect for Postgres CDC
To capture changes from a PostgreSQL database and stream them to Kafka topics, Kafka Connect with Debezium connectors makes a robust solution. Kafka Connect is an open-source component of Apache Kafka that provides scalable and reliable way to move data between Kafka and other data systems in real or near-real time.
Standalone Mode Kafka Connect
Kafka Connect can be configured in two modes: standalone and distributed. Standalone mode runs Kafka Connect in a single process, whereas distributed mode runs it across multiple machines. For development or small scale environments, standalone mode is typically sufficient.
Setting Up Kafka Connect with Debezium for PostgreSQL
Here’s a step-by-step guide to set up CDC from PostgreSQL to Kafka topics using Kafka Connect in standalone mode:
1. Environment Preparation
Have a PostgreSQL server running with database changes that need to be captured, and an Apache Kafka broker along with Zookeeper setup.
2. Install and Configure Kafka Connect
- Install Kafka (if not already installed) which includes Kafka Connect.
- Add the Debezium PostgreSQL connector plugin to the Kafka Connect plugins directory.
3. Configure PostgreSQL
Enable logical replication:
- Set
wal_leveltological. - Ensure
max_replication_slotsandmax_wal_sendersare set to a reasonable value.
4. Create a Connector Configuration
Create a JSON file (postgres-connector.json) that defines the Debezium PostgreSQL connector configuration:
5. Start Kafka Connect in Standalone Mode
Navigate to the Kafka Connect installation directory and start Kafka Connect with the connector configuration file:
Monitoring and Managing Connectors
While Kafka Connect is running, you can manage and monitor the connectors using the Kafka Connect REST API. For example, to check the status of a connector:
Summary Table of Key Components and Configurations
| Component | Description |
| PostgreSQL | Source RDBMS where data changes happen. |
| Kafka Connect | Tool for reliably streaming data between Apache Kafka and other systems. |
| Debezium Connector | Kafka Connect plugin to capture data change events from PostgreSQL. |
| Kafka Topics | Destination in Kafka where change events are stored. |
| Zookeeper | Coordination service for Kafka brokers and Kafka Connect. |
Conclusion
Using Kafka Connect in standalone mode with a Debezium PostgreSQL connector is an effective way to perform CDC, streaming database changes into Kafka for real-time data processing and monitoring. This guide provides a comprehensive overview to get started with setting up CDC from PostgreSQL using Kafka Connect.
With this architecture, businesses can leverage real-time data streaming to enable more agile data management and analysis, enhancing operational flexibility and insight.

