ZooKeeper

This document describes the reasoning behind Ensemble’s use of ZooKeeper, and also the structure and semantics used by Ensemble in the ZooKeeper filesystem.

Ensemble & ZooKeeper

ZooKeeper offers a virtual filesystem with so-called znodes (we’ll refer to them simply as nodes in this document). The state stored in the filesystem is fully introspectable and observable, and the changes performed on it are atomic and globally ordered. Ensemble uses these features to maintain its distributed runtime state in a reliable and fault-tolerant fashion.

When some part of Ensemble wants to modify the runtime state in any way, it should not enqueue a message to a specific agent. Instead, it should perform the modification in the ZooKeeper representation of the state. The agents responsible for enforcing the requested modification should be watching the given nodes, so that they notice the changes and can act on them.

When compared to traditional message queueing, this kind of behavior enables easier global analysis, fault tolerance (through redundant agents which watch the same states), introspection, and so on.
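As a rough illustration of this pattern, the sketch below models a node store with watches entirely in memory. It does not talk to ZooKeeper: the NodeStore class, its methods, and the value written to /units/unit-0/machine are assumptions made for the example. Like real ZooKeeper watches, the watches here fire once and must be re-registered.

```python
from collections import defaultdict


class NodeStore:
    """Toy in-memory stand-in for a ZooKeeper tree (illustration only)."""

    def __init__(self):
        self.data = {}
        self.watches = defaultdict(list)

    def watch(self, path, callback):
        # One-shot: the callback is dropped after it fires once.
        self.watches[path].append(callback)

    def set(self, path, value):
        self.data[path] = value
        # Detach the pending watches before calling them, so a callback
        # may safely re-register itself.
        pending = self.watches.pop(path, [])
        for callback in pending:
            callback(path, value)


store = NodeStore()
seen = []


def on_machine_change(path, value):
    # The agent reacts to the new state and re-arms its watch.
    seen.append((path, value))
    store.watch(path, on_machine_change)


store.watch("/units/unit-0/machine", on_machine_change)

# A tool changes state by writing the node, not by messaging an agent.
store.set("/units/unit-0/machine", "machine-3")
store.set("/units/unit-0/machine", "machine-5")
```

Because the writer never addresses an agent directly, a second, redundant agent could register its own watch on the same node without the writer changing at all.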

Filesystem Organization

The semantics and structure of all nodes Ensemble uses in its ZooKeeper filesystem are described below. Each entry maps to a node, and the semantics of that node are described right below it.

Note that, unlike a traditional filesystem, nodes in ZooKeeper may hold data while still being a parent of other nodes. In some cases, information is stored as content of the node itself, in YAML format. These cases are noted in the tree below as a bulleted list. In other cases, data is stored inside a child node, noted in the tree below as an indented entry. The decision whether to use a child node or content in the parent node depends on the use case.

/topology
    Describes the current topology of machines, services, and service units. Nodes under /machines, /services, and /units should not be considered valid unless they are described in this file. The precise format of this file is an implementation detail.

/formulas
    Each formula used in this environment must have one entry inside this node.

    Readable by: Everyone

    /<namespace>:<name>-<revision>
        Represents a formula available in this environment. The node name includes the formula namespace (ubuntu, ~user, etc.), the formula name, and the formula revision.

          • sha256: The sha256 of a file in the file storage, which contains the formula bundle itself.
          • metadata: Contains the metadata for the formula itself.
          • schema: The settings accepted by this formula. The precise details of this are still unspecified.

/services
    Each formula to be deployed must be included under an entry in this tree.

    Readable by: Everyone

    /service-<0..N>
        Node with details about the configuration for one formula, which can be used to deploy one or more instances of that formula.

          • formula: The formula to be deployed. The value of this option should be the name of a child node under the /formulas parent.

        /settings
            Options for the formula provided by the user, stored internally in YAML format.

            Readable by: Formula Agent
            Writable by: Admin tools

/units
    Each node under this parent reflects an actual service agent which should be running to manage a formula.

    /unit-<0..N>
        One running service.

        Readable by: Formula Agent
        Writable by: Formula Agent

        /machine
            Contains the internal machine id this service is assigned to.
        /formula-agent-connected
            Ephemeral node which exists when a formula agent is handling this instance.

/machines

    /machine-<0..N>

        /provisioning-lock
            Lock acquired by the Machine Provisioning Agent before it provisions or terminates this machine.
        /machine-agent-connected
            Ephemeral node created when the Machine Agent is connected.
        /info
            Basic information about this machine.

              • public-dns-name: The public DNS name of this machine.
              • machine-provider-id: The id of this machine at the provider, e.g. the EC2 instance id.
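
To make the naming scheme above concrete, here is a small sketch of helpers that build node paths. The function names and the sample values (such as the "mysql" formula) are hypothetical. The sketch also assumes that /settings is a child of each service node and that /machine-agent-connected is a child of each machine node, as the descriptions above suggest.

```python
def formula_path(namespace, name, revision):
    # /formulas/<namespace>:<name>-<revision>
    return "/formulas/%s:%s-%d" % (namespace, name, revision)


def service_path(n):
    return "/services/service-%d" % n


def settings_path(n):
    # Assumes /settings is a child of the service node.
    return service_path(n) + "/settings"


def unit_path(n):
    return "/units/unit-%d" % n


def machine_path(n):
    return "/machines/machine-%d" % n


def agent_connected_path(n):
    # Assumes the ephemeral node is a child of the machine node.
    return machine_path(n) + "/machine-agent-connected"


print(formula_path("ubuntu", "mysql", 7))
print(settings_path(0))
print(agent_connected_path(2))
```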

Provisioning a new machine

When the need for a new machine is determined, the following sequence of events happens inside the ZooKeeper filesystem to deploy the new machine:

  1. A new node is created at /machines/instances/<N>.
  2. The Machine Provisioning Agent has a watcher on /machines/instances/ and gets notified about the new node.
  3. The agent acquires a provisioning lock at /machines/instances/<N>/provisioning-lock.
  4. The agent checks whether the machine still has to be provisioned by verifying whether /machines/instances/<N>/info exists.
  5. If the machine has provider launch information, then the agent schedules itself to come back to the machine after <MachineBootstrapMaxTime>.
  6. If not, the agent fires the machine via the provider, stores the provider launch info (e.g. the EC2 machine id), and schedules itself to come back to the machine after <MachineBootstrapMaxTime>.
  7. As a result of a scheduled call, the machine provisioning agent verifies the existence of the /machines/instances/<N>/machine-agent-connected node and, if it exists, sets a watch on it.
  8. If the machine agent node doesn’t exist after <MachineBootstrapMaxTime>, the agent acquires /machines/instances/<N>/provisioning-lock, terminates the instance, and goes to step 6.
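
The steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the tree is a plain dict rather than ZooKeeper, the lock is a stand-in for a real distributed lock, and FakeProvider, provision, and check_bootstrap are hypothetical names. Scheduling is left to the caller instead of waiting for <MachineBootstrapMaxTime>.

```python
from contextlib import contextmanager


class FakeProvider:
    """Hypothetical machine provider that just hands out instance ids."""

    def __init__(self):
        self.launched = []
        self.terminated = []

    def launch(self):
        instance_id = "i-%04d" % len(self.launched)
        self.launched.append(instance_id)
        return instance_id

    def terminate(self, instance_id):
        self.terminated.append(instance_id)


@contextmanager
def provisioning_lock(tree, machine):
    # Stand-in for a real distributed lock: one holder at a time.
    lock = machine + "/provisioning-lock"
    if lock in tree:
        raise RuntimeError("lock already held: " + lock)
    tree[lock] = None
    try:
        yield
    finally:
        del tree[lock]


def provision(tree, provider, machine):
    """Steps 3 to 6: launch the machine unless launch info already exists."""
    with provisioning_lock(tree, machine):
        if machine + "/info" not in tree:
            tree[machine + "/info"] = {"machine-provider-id": provider.launch()}
    # The caller is expected to schedule check_bootstrap() for later.


def check_bootstrap(tree, provider, machine):
    """Steps 7 and 8: True if the machine agent connected, else relaunch."""
    if machine + "/machine-agent-connected" in tree:
        return True  # a real agent would set a watch on this node here
    with provisioning_lock(tree, machine):
        info = tree.pop(machine + "/info")
        provider.terminate(info["machine-provider-id"])
        # Back to step 6: fire a replacement and store its launch info.
        tree[machine + "/info"] = {"machine-provider-id": provider.launch()}
    return False


machine = "/machines/instances/0"
tree = {machine: None}
provider = FakeProvider()

provision(tree, provider, machine)   # launches the first instance
provision(tree, provider, machine)   # no-op: launch info already stored
first_check = check_bootstrap(tree, provider, machine)   # agent never connected
tree[machine + "/machine-agent-connected"] = None         # replacement's agent connects
second_check = check_bootstrap(tree, provider, machine)
```

Checking for /info under the lock is what makes the second provision call harmless, which matters when redundant provisioning agents react to the same new node.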

Bootstrap Notes

This verification of the connected machine agent helps us guard against any transient errors that may exist on a given virtual node due to provider vagaries.

When a machine provisioning agent comes up, it must scan the entire instance tree to verify that all nodes are running. We need to keep some state to distinguish a node that has never come up from a node whose machine agent connection has died, so that a new provisioning agent can tell a new machine bootstrap failure from a running machine failure.

Use a one-time password (OTP) via user data to guard the machine agent’s permanent principal credentials.

TODO: we should keep a counter of how many times we’ve attempted to launch a single instance.

Connecting a Machine

When a machine is launched, we utilize cloud-init to install the requisite packages to run a machine agent (libzookeeper, twisted) and launch the machine agent.

The machine agent reads its one-time password from EC2 user data, connects to ZooKeeper, and reads its permanent principal and role information, which it adds to its connection.

The machine agent reads and sets a watch on /machines/instances/<N>/services/. When a service is placed there, the agent resolves its formula, downloads the formula, creates an LXC container, and launches a formula agent within the container, passing it the formula path.
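
A ZooKeeper child watch only reports that the list of children changed, so the agent has to re-read the list and work out what is new. The sketch below shows one way to do that in memory. MachineAgentSketch, added_children, and the child names are hypothetical, and the deploy callback stands in for resolving the formula, downloading it, creating the container, and launching the formula agent.

```python
def added_children(previous, current):
    """Return the children present now that were absent before, sorted."""
    return sorted(set(current) - set(previous))


class MachineAgentSketch:
    """Remembers the last child list and deploys only newly placed services."""

    def __init__(self, deploy):
        self.known = set()
        self.deploy = deploy

    def on_children(self, children):
        # Called with the freshly read child list each time the watch fires.
        for name in added_children(self.known, children):
            self.deploy(name)
        self.known = set(children)


deployed = []
agent = MachineAgentSketch(deployed.append)

agent.on_children(["service-0"])               # first read of the node
agent.on_children(["service-0", "service-1"])  # watch fired: one new child
agent.on_children(["service-0", "service-1"])  # nothing new, nothing deployed
```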

Starting a Formula

The formula agent connects to ZooKeeper using principal information provided by the machine agent. It reads the formula metadata, installs any package dependencies, and then starts invoking formula hooks.

The formula agent creates the ephemeral node /services/<service name>/instances/<N>/formula-agent-connected.
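
The node is ephemeral so that it disappears when the formula agent's session ends, which lets other parts of the system see that the agent is gone. The following toy model illustrates that behavior in memory. The Session class and the "wordpress" service name are assumptions for the example, not part of Ensemble.

```python
class Session:
    """Toy client session that tracks the ephemeral nodes it created."""

    def __init__(self, tree):
        self.tree = tree
        self.ephemeral = set()

    def create_ephemeral(self, path):
        self.tree.add(path)
        self.ephemeral.add(path)

    def close(self):
        # ZooKeeper removes a session's ephemeral nodes when the session ends.
        self.tree.difference_update(self.ephemeral)
        self.ephemeral.clear()


tree = set()
session = Session(tree)
path = "/services/wordpress/instances/0/formula-agent-connected"

session.create_ephemeral(path)
connected_before = path in tree   # the agent is handling the instance

session.close()                   # agent exits or loses its session
connected_after = path in tree
```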

The formula is running when....
