Zookeeper Architecture In-Depth

The core Zookeeper distributed service is called an ensemble. Each ensemble consists of an odd number of Zookeeper servers for resiliency purposes. Having an odd number of servers ensures that majorities can be formed to elect new leaders if needed and prevents split-brain scenarios.

Visualizing the Ensemble

Here is a visual diagram of a simple Zookeeper ensemble consisting of 3 servers forming a quorum. The servers maintain an in-memory database that represents the coordinated state managed by Zookeeper, and a majority of servers must be available for the ensemble to function.

Zookeeper Ensemble Diagram

Some key characteristics of the ensemble:

  • Servers communicate using atomic broadcast protocols to stay in sync
  • Leader server handles all write requests and synchronization
  • Followers service read requests and forward writes to leader
  • Observers passively mirror state without impacting quorum
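
To make the ensemble concrete, a client connects by listing every member; the client library picks one server and transparently fails over to another if it becomes unreachable. The snippet below is a minimal sketch assuming three hypothetical servers zk1, zk2, and zk3 listening on the default client port 2181:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooKeeper;

// Connect to the 3-server ensemble; the watcher logs session state changes.
ZooKeeper zk = new ZooKeeper(
        "zk1:2181,zk2:2181,zk3:2181",   // all ensemble members (hypothetical hosts)
        15000,                          // session timeout in milliseconds
        (WatchedEvent event) -> System.out.println("Session event: " + event.getState()));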

Observers for Added Reliability

Zookeeper supports observer nodes that help enhance reliability and data safety without affecting the quorum requirements. Observers mirror the cluster state just as followers do, but they do not participate in quorum voting.

Key observer capabilities:

  • Help ensure writes are replicated to multiple servers
  • Maintain an up-to-date copy of the data timeline without adding latency to quorum commits
  • Require fewer server resources than full voting members
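
Observers are enabled through server configuration rather than client code. A minimal zoo.cfg sketch, with hypothetical host names, might look like the following (the :observer suffix tells the rest of the ensemble not to count that server toward the quorum):

# In the observer's own zoo.cfg
peerType=observer

# In every server's zoo.cfg, the ensemble definition marks the observer
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
server.4=observer1:2888:3888:observer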

Availability and Consistency

Zookeeper guarantees sequential consistency for client operations: all requests from a given client are applied in the order they were sent, and a client never sees an older view of the system than one it has already observed, even after reconnecting to a different server.

Zookeeper prioritizes consistency over availability in failure scenarios. Its quorum protocol lets the ensemble keep operating when a minority of servers fail, but if a majority fails the ensemble becomes unavailable rather than serve potentially inconsistent data.

This consistency/availability tradeoff aligns with the coordination requirements for systems built atop Zookeeper. Maintaining correct synchronized state across the cluster is critical.
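
One practical consequence: reads are answered locally by whichever server a client is connected to, so a client that must see the very latest committed state can issue a sync before reading. A minimal sketch, assuming an existing zk handle and a hypothetical /app1/config znode:

// Ask the leader to flush any pending updates down to this server; because a
// session's requests are processed in order, the getData that follows sees
// everything committed before the sync.
zk.sync("/app1/config", (rc, path, ctx) -> { /* sync acknowledged */ }, null);
byte[] latest = zk.getData("/app1/config", false, null);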

Earlier we covered that znodes represent the fundamental data objects in Zookeeper. Here are some deeper technical details on znodes:

Znode Stat Structure

Each znode contains metadata known as the stat structure:

cZxid - Created zxid
ctime - Created timestamp
mZxid - Last modified zxid
mtime - Last modified timestamp
pZxid - Last modified children zxid
cversion - Child version number
dataVersion - Data version number
aclVersion - ACL version number
ephemeralOwner - Owner session id if ephemeral
dataLength - Length of znode data
numChildren - Number of children

These fields make it possible to track modifications and keep state consistent across Zookeeper servers as changes occur. The version numbers play a key role, enabling conditional (optimistic) updates.
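
In the Java client these fields are exposed through the Stat class. The sketch below (assuming an existing zk handle and a hypothetical /app1 znode) reads the stat and then passes the data version back to setData for an optimistic, conditional update:

import org.apache.zookeeper.data.Stat;

// Read the znode's metadata (returns null if the znode does not exist).
Stat stat = zk.exists("/app1", false);
System.out.println("czxid=" + stat.getCzxid()
        + " dataVersion=" + stat.getVersion()
        + " children=" + stat.getNumChildren());

// Conditional write: throws KeeperException.BadVersionException if another
// client modified the znode after we read its version.
zk.setData("/app1", "config=bar".getBytes(), stat.getVersion());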

Znode Creation Example

Here is sample Java code for creating a persistent znode:

import java.util.List;
import org.apache.zookeeper.*;                 // ZooKeeper, ZooDefs, CreateMode
import org.apache.zookeeper.data.ACL;

// zk is an existing org.apache.zookeeper.ZooKeeper session handle
String path = "/app1";
byte[] data = "config=foo".getBytes();
List<ACL> acls = ZooDefs.Ids.OPEN_ACL_UNSAFE;  // open permissions

zk.create(path, data, acls, CreateMode.PERSISTENT);

Key things to observe:

  • Path defines the znode's hierarchical location
  • Data holds the actual application state being coordinated
  • ACL is set to open permissions (OPEN_ACL_UNSAFE)
  • Persistent mode stores the znode durably, beyond the client session

Znode Types Comparison

Type        Description                   Durability  Use Cases
Persistent  Survives session termination  High        Configuration, status data
Ephemeral   Tied to client session        Low         Locks, presence
Sequential  Ordered by sequence number    Depends     Queuing, leader election

As you can see, the different znode types cover a variety of coordination use cases; a short sketch of creating the other types follows.
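
For comparison, the other types use the same create call with a different CreateMode. A minimal sketch, assuming the zk handle from earlier and that the parent znodes already exist:

// Ephemeral: removed automatically when this client's session ends.
zk.create("/app1/lock", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL);

// Ephemeral + sequential: the server appends a monotonically increasing
// counter, producing paths like /app1/queue/item-0000000042.
zk.create("/app1/queue/item-", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL_SEQUENTIAL);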

In addition to synchronization and naming capabilities, Zookeeper also powers service discovery. Some key capabilities:

  • Services register ephemeral znodes to represent availability
  • Clients watch service znodes and get notified of arrival/departure
  • Load balancing by registering multiple znode instances with data like host/port

For example, a replicated database service cluster could leverage Zookeeper for auto-discovery of active DB servers by clients using watches. The ensemble abstracts all the complexity of tracking membership.
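
A minimal discovery sketch, with hypothetical paths and host data and assuming a persistent /services/db parent already exists: each DB server registers an ephemeral child, and clients list the children with a watch so they are notified when members join or leave:

// On each service instance: advertise availability. The ephemeral znode
// disappears automatically if the instance crashes or loses its session.
zk.create("/services/db/node-", "db1.example.com:5432".getBytes(),
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

// On each client: read the current membership and watch for changes.
List<String> members = zk.getChildren("/services/db",
        event -> System.out.println("Membership changed: " + event.getType()));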

Integrating with Kubernetes

Given versatile capabilities like naming and configuration management, Zookeeper integrates well with infrastructure like Kubernetes:

  • Cluster operators store configs in Zookeeper znodes
  • Pods access configs and get dynamic updates
  • Naming abstraction prevents direct pod coupling

By handling coordination tasks like leader election, Zookeeper simplifies building distributed control plane services on Kubernetes itself as well.
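
For instance, a pod could read its configuration from a znode and leave a watch so it learns when an operator updates it. A rough sketch, assuming a hypothetical /myapp/configs/db znode (watches fire once, so the handler must re-read and re-register):

// Read the config and register a one-shot watch on it.
Watcher configWatcher = new Watcher() {
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            // Re-read /myapp/configs/db and re-register the watch here.
        }
    }
};
byte[] config = zk.getData("/myapp/configs/db", configWatcher, null);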

When leveraging Zookeeper's hierarchical data model, here are some best practices for organizing znodes:

  • Start broad: Have a root parent node like /myapp that owns the entire namespace
  • Modular layout: Logically group znodes into sub-hierarchies like /users, /configs
  • Ephemeral leaves: Keep ephemeral znodes at the leaves of the tree, since ephemeral znodes cannot have children
  • Sequence sparingly: Avoid overuse of sequence numbers for readability

Smart znode hierarchy design facilitates simpler and more reliable coordination workflows.
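
As a sketch of the layout suggested above (paths are illustrative), the whole hierarchy is just a few persistent creates issued once at deploy time:

// Root namespace for the application, then logical sub-hierarchies.
for (String path : new String[] {"/myapp", "/myapp/configs", "/myapp/users"}) {
    if (zk.exists(path, false) == null) {
        zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);
    }
}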

In addition to its rich capabilities, Zookeeper is optimized for the high-throughput, low-latency operations critical for powering massive-scale distributed systems. Here are some key published benchmark stats on Zookeeper performance from Pantheon:

Zookeeper Performance Benchmark

As you can see, Zookeeper delivers excellent performance across different operations even at high throughputs. The numbers highlight why companies trust Zookeeper for highly demanding use cases.

Capacity Planning

When planning out Zookeeper clusters, ensure sufficient capacity by budgeting for your request loads and tolerable latency targets. General guidance for capacity:

  • Ensemble servers: Odd number between 3-7 based on load
  • Hardware: Modern multicore servers with SSD storage
  • Clients: Target <1,000 connections per server
  • Writes: Cost more than reads, since each write must be committed by a quorum

Right-size your Zookeeper ensemble using projected volumes and performance needs.

There are other coordination and configuration management tools with some overlap to Zookeeper capabilities. For example, etcd and Consul solve some similar use cases like service discovery. Here is a quick comparison between the options:

System     Language  Strong Points                     Weak Points
Zookeeper  Java      Reliability, ordering guarantees  Operational complexity
etcd       Go        Simplicity, watch triggers        Consensus protocol limitations
Consul     Go        Service mesh integration, ACLs    External system dependencies

Evaluate technical tradeoffs like scale needs, failure scenarios, and ordering requirements when deciding between these tools.

Apache Zookeeper provides a versatile set of battle-tested distributed coordination services: configuration, naming, synchronization, queues, and leader election. Its reliability-focused design, based on quorum protocols, ordering guarantees, and replication, empowers Zookeeper to handle extremely demanding use cases such as managing Yahoo's production infrastructure spanning data centers.

This in-depth tutorial covered key architectural concepts like ensembles, znodes, watches, and access control lists. We also explored practical examples like service discovery integration, best practices for organizing znode hierarchies, and published performance benchmarks showcasing Zookeeper's response times and scalability. Whether you are building distributed systems that require coordination across services, or deploying complex infrastructure with many moving parts, Zookeeper's capabilities make it an essential tool for enabling robustness.
