How does etcd propagate writes to non-leader members?
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
etcd is a distributed key-value store that provides a reliable way to store data across a cluster of machines. It's primarily used in Kubernetes for holding and managing the critical information that distributes and schedules workloads. etcd uses the Raft consensus algorithm to manage a highly-available replicated log.
Understanding Raft and its Role in etcd
Raft is a consensus algorithm designed as an alternative to Paxos. It provides a similar guarantee of log consistency and fault tolerance but aims to be simpler to understand and implement. In Raft, all changes to the replicated log are driven through the leader node, which is elected from among the cluster's nodes.
How etcd Propagates Writes to Non-Leader Members
When a client writes data to etcd, the process typically goes through several stages:
- Client Interaction with the Leader: The client sends a request to write data to the leader of the etcd cluster. The leader node is responsible for managing the log replication across the cluster.
- Log Replication: Upon receiving a write request, the leader appends the entry to its local log. It then proceeds to replicate this log entry to the other members of the cluster (the followers).
- Consistency and Confirmation: Each follower appends the new entry to their local log and sends an acknowledgment back to the leader. Once the leader node receives acknowledgments from a majority of the nodes (i.e., it achieves a quorum), the entry is considered committed.
- Applying the Entry: The leader applies the entry to its current state machine and informs the followers to do the same. This step updates the actual state on all cluster nodes based on the committed log entries.
- Client Acknowledgment: After the entry is committed and applied across the quorum of nodes, the leader sends a response back to the client confirming that the write has been successfully replicated and stored.
Failover Handling
In case the leader node fails, Raft handles the situation by electing a new leader. During the time it takes to elect a new leader, the etcd cluster is unavailable for writes due to the loss of consensus. Once a new leader is elected, it resumes accepting write requests and log replication continues as normal.
Technical Example
Here’s a simplified example to illustrate a write operation in an etcd cluster:
Summary Table
| Stage | Description |
| Client Interaction | Client sends write request to the cluster leader. |
| Log Replication | Leader replicates the entry to follower nodes. |
| Consistency Check | Followers send acknowledgments back to the leader. |
| Entry Commit | Leader commits the entry after receiving a quorum of acknowledgments. |
| State Update | Committed entry is applied across the nodes. |
| Client Response | Client is informed about the successful write operation. |
Conclusions
etcd’s use of Raft ensures that every write operation is safely stored across a majority of nodes before it is considered committed, thus maintaining data integrity and availability even in the event of node failures. This design allows etcd to operate as a reliable, distributed system that is fundamental for managing stateful applications in distributed environments like Kubernetes.
Understanding these internal mechanisms can help administrators and developers build more reliable systems and troubleshoot issues related to data consistency and cluster performance more effectively.

