This is the documentation for CDH 5.1.x.

Configuring High Availability for Llama

Llama High Availability (HA) uses an Active/Standby architecture, in which the active Llama is automatically elected using the ZooKeeper-based ActiveStandbyElector. The active Llama accepts RPC/Thrift connections and communicates with YARN. The standby Llama monitors the leader information in ZooKeeper, but doesn't accept RPC/Thrift connections.

Fencing

Only one Llama should be active at a time, so that cluster resources are not partitioned between the two. Llama uses ZooKeeper Access Control Lists (ACLs) to claim exclusive ownership of the cluster when transitioning to active, and checks this ownership periodically. If another Llama takes over, the first one detects the loss of ownership within one check period.

Reclaiming Cluster Resources

To claim resources from YARN, Llama spawns YARN applications and runs unmanaged ApplicationMasters. When a Llama goes down, YARN does not reclaim the resources allocated to the applications that Llama spawned until it times out those applications (the default timeout is 10 minutes). Rather than waiting for this timeout after a Llama failure, a Llama reclaims the resources by killing any YARN applications spawned by this pair of Llamas.

Configuring HA

Configure Llama HA by modifying the following configuration properties in /etc/llama/conf/llama-site.xml. There is no need for any additional daemons.

| Property | Description | Default | Recommended |
|---|---|---|---|
| llama.am.cluster.id | Cluster ID of the Llama pair, used to differentiate between different Llamas | llama | [cluster-specific] |
| llama.am.ha.enabled* | Whether to enable Llama HA | false | true |
| llama.am.ha.zk-quorum* | ZooKeeper quorum to use for leader election and fencing | | [cluster-specific] |
| llama.am.ha.zk-base | Base znode for leader election and fencing data | /llama | [cluster-specific] |
| llama.am.ha.zk-timeout-ms | The session timeout, in milliseconds, for connections to the ZooKeeper quorum | 10000 | 10000 |
| llama.am.ha.zk-acl | ACLs to control access to ZooKeeper | world:anyone:rwcda | [cluster-specific] |
| llama.am.ha.zk-auth | Authorization information to go with the ACLs | | [cluster-acl-specific] |

*Required configurations
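
As an illustration of the properties above, here is a minimal sketch of the HA-related entries in /etc/llama/conf/llama-site.xml. It assumes the file uses the standard Hadoop-style configuration/property layout and that the quorum is given as a comma-separated list of host:port pairs. The cluster ID, ZooKeeper hostnames, and znode path are hypothetical placeholders, not values from this documentation; replace them with values for your cluster.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: HA-related entries for llama-site.xml.
     Hostnames, cluster ID, and znode path are hypothetical placeholders. -->
<configuration>
  <!-- Identifies this Llama pair (default: llama) -->
  <property>
    <name>llama.am.cluster.id</name>
    <value>llama-cluster1</value>
  </property>

  <!-- Required: turn on HA (default: false) -->
  <property>
    <name>llama.am.ha.enabled</name>
    <value>true</value>
  </property>

  <!-- Required: ZooKeeper quorum used for leader election and fencing.
       The host:port list format is an assumption. -->
  <property>
    <name>llama.am.ha.zk-quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>

  <!-- Base znode for leader election and fencing data (default: /llama) -->
  <property>
    <name>llama.am.ha.zk-base</name>
    <value>/llama-cluster1</value>
  </property>

  <!-- ZooKeeper session timeout in milliseconds (default and recommended: 10000) -->
  <property>
    <name>llama.am.ha.zk-timeout-ms</name>
    <value>10000</value>
  </property>
</configuration>
```

Because llama.am.cluster.id identifies the pair, both Llamas in the pair would presumably carry the same values for these properties. The llama.am.ha.zk-acl and llama.am.ha.zk-auth properties are omitted here, which leaves the default ACL (world:anyone:rwcda) in effect; set them if access to the ZooKeeper data needs to be restricted.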