This is the documentation for CDH 4.7.0.
Documentation for other versions is available at Cloudera Documentation.

Configuring YARN Security

If you are using MRv1, skip this section and see Configuring MRv1 Security.

If you are using YARN, do the following steps to configure, start, and test secure YARN.

  1. Configure Secure YARN.
  2. Start up the ResourceManager.
  3. Start up the NodeManager.
  4. Start up the MapReduce Job History Server.
  5. Try Running a Map/Reduce YARN Job.

Step 1: Configure Secure YARN

Before you start:

  • The Kerberos principals for the ResourceManager and NodeManager are configured in the yarn-site.xml file. The same yarn-site.xml file must be installed on every host machine in the cluster.
  • Make sure that each user who will be running YARN jobs exists on all cluster nodes (that is, on every node that hosts any YARN daemon).
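
The second prerequisite can be checked with a small script. The following is a minimal sketch, not part of the CDH tooling: the function names `user_uid` and `check_user`, the host names, and the account name `alice` are all hypothetical, and it assumes the standard `id` utility is available on each node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the numeric UID of a local account,
# or fail (non-zero exit status) if the account does not exist.
user_uid() {
  id -u "$1" 2>/dev/null
}

# Hypothetical helper: report whether an account exists on this host.
check_user() {
  local user="$1" uid
  if uid="$(user_uid "$user")"; then
    echo "$user exists (uid=$uid)"
  else
    echo "$user is missing" >&2
    return 1
  fi
}

# Example usage from an admin machine (host names and user are assumptions):
#   for h in node1 node2 node3; do
#     ssh "$h" id -u alice >/dev/null || echo "alice is missing on $h"
#   done
```

Printing the UID as well as confirming the account exists is useful because the UID is later compared against the min.user.id setting.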

To configure secure YARN:

  1. Add the following properties to the yarn-site.xml file on every machine in the cluster:
    <!-- ResourceManager security configs -->
    <property>
      <name>yarn.resourcemanager.keytab</name>
      <value>/etc/hadoop/conf/yarn.keytab</value>	<!-- path to the YARN keytab -->
    </property>
    <property>
      <name>yarn.resourcemanager.principal</name>	
      <value>yarn/_HOST@YOUR-REALM.COM</value>
    </property>
    
    <!-- NodeManager security configs -->
    <property>
      <name>yarn.nodemanager.keytab</name>
      <value>/etc/hadoop/conf/yarn.keytab</value>	<!-- path to the YARN keytab -->
    </property>
    <property>
      <name>yarn.nodemanager.principal</name>	
      <value>yarn/_HOST@YOUR-REALM.COM</value>
    </property>	
    <property>
      <name>yarn.nodemanager.container-executor.class</name>	
      <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
    </property>	
    <property>
      <name>yarn.nodemanager.linux-container-executor.group</name>
      <value>yarn</value>
    </property>	
  2. Add the following properties to the mapred-site.xml file on every machine in the cluster:
    <!-- MapReduce Job History Server security configs -->
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>host:port</value> <!-- Host and port of the MapReduce Job History Server; default port is 10020  -->
    </property>
    <property>
      <name>mapreduce.jobhistory.keytab</name>
      <value>/etc/hadoop/conf/mapred.keytab</value>	<!-- path to the MAPRED keytab for the Job History Server -->
    </property>	
    <property>
      <name>mapreduce.jobhistory.principal</name>	
      <value>mapred/_HOST@YOUR-REALM.COM</value>
    </property>	
  3. Create a file called container-executor.cfg for the Linux Container Executor program that contains the following information:
    yarn.nodemanager.local-dirs=<comma-separated list of paths to local NodeManager directories. Should be the same values specified in yarn-site.xml. Required to validate paths passed to container-executor.>
    yarn.nodemanager.linux-container-executor.group=yarn
    yarn.nodemanager.log-dirs=<comma-separated list of paths to local NodeManager log directories. Should be the same values specified in yarn-site.xml. Required to set proper permissions on the log files so that they can be written to by the user's containers and read by the NodeManager for log aggregation.>
    banned.users=hdfs,yarn,mapred,bin
    min.user.id=1000
      Note:

    In the container-executor.cfg file, the default setting for the banned.users property is hdfs, yarn, mapred, and bin, which prevents jobs from being submitted from those user accounts. The default setting for the min.user.id property is 1000, which prevents jobs from being submitted by users with an ID below 1000; such IDs are conventionally reserved for system accounts. Note that some operating systems, such as CentOS 5, assign regular user IDs starting at 500 rather than 1000. If this is the case on your system, set the min.user.id property to 500. If a user account on your cluster has a user ID lower than the value specified for min.user.id, jobs submitted by that account fail and the NodeManager returns an error code of 255.

  4. The path to the container-executor.cfg file is determined relative to the location of the container-executor binary. Specifically, the path is <dirname of container-executor binary>/../etc/hadoop/container-executor.cfg. If you installed the CDH4 package, this path will always correspond to /etc/hadoop/conf/container-executor.cfg.
      Note:

    The container-executor program requires that the directories specified in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs, and every directory on the paths leading up to them, be set to 755 permissions, as shown in the table on permissions on directories.

  5. Verify that the ownership and permissions of the container-executor program correspond to:
    ---Sr-s--- 1 root yarn 36264 May 20 15:30 container-executor
      Note:

    For more information about the Linux Container Executor program, see Appendix B - Information about Other Hadoop Security Programs.
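
The banned.users and min.user.id rules and the permission check from Step 1 can be approximated before a job is submitted. The following is a minimal sketch under stated assumptions: `cfg_get`, `user_allowed`, and `mode_of` are hypothetical helper names, the fallback of 1000 simply mirrors the default described in the note above, `stat -c` assumes GNU coreutils, and the script only imitates the checks rather than calling container-executor itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one key from a key=value file such as
# container-executor.cfg, stripping trailing whitespace from the value.
cfg_get() {
  local file="$1" key="$2"
  awk -F= -v k="$key" \
    '$1 == k { sub(/^[^=]*=/, ""); gsub(/[ \t\r]+$/, ""); print; exit }' "$file"
}

# Hypothetical helper: succeed if the user name and UID would pass the
# banned.users and min.user.id rules found in the given file.
user_allowed() {
  local file="$1" user="$2" uid="$3" banned min
  banned="$(cfg_get "$file" banned.users)"
  min="$(cfg_get "$file" min.user.id)"
  case ",$banned," in
    *",$user,"*) echo "$user is listed in banned.users"; return 1 ;;
  esac
  # Assumption: fall back to 1000 when min.user.id is not set in the file.
  if [ "$uid" -lt "${min:-1000}" ]; then
    echo "uid $uid is below min.user.id=${min:-1000}"
    return 1
  fi
  echo "$user (uid $uid) passes"
}

# Hypothetical helper: print the octal mode of a file (GNU stat).
# A mode of 6050 is displayed by ls as ---Sr-s---.
mode_of() {
  stat -c '%a' "$1"
}

# Example usage (the binary path is left as a placeholder on purpose):
#   user_allowed /etc/hadoop/conf/container-executor.cfg alice "$(id -u alice)"
#   mode_of <path to container-executor>    # compare against 6050
#   stat -c '%U:%G' <path to container-executor>    # compare against root:yarn
```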

Step 2: Start up the ResourceManager

You are now ready to start the ResourceManager.

  Note:

Make sure you always start the ResourceManager before starting the NodeManager.

If you're using the /etc/init.d/hadoop-yarn-resourcemanager script, then you can use the service command to run it now:

$ sudo service hadoop-yarn-resourcemanager start

You can verify that the ResourceManager is working properly by opening a web browser to http://host:8088/ where host is the name of the machine where the ResourceManager is running.

Step 3: Start up the NodeManager

You are now ready to start the NodeManager.

If you're using the /etc/init.d/hadoop-yarn-nodemanager script, then you can use the service command to run it now:

$ sudo service hadoop-yarn-nodemanager start

You can verify that the NodeManager is working properly by opening a web browser to http://host:8042/ where host is the name of the machine where the NodeManager is running.

Step 4: Start up the MapReduce Job History Server

You are now ready to start the MapReduce Job History Server.

If you're using the /etc/init.d/hadoop-mapreduce-historyserver script, then you can use the service command to run it now:

$ sudo service hadoop-mapreduce-historyserver start

You can verify that the MapReduce Job History Server is working properly by opening a web browser to http://host:19888/, where host is the name of the machine where the MapReduce Job History Server is running.
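
The browser checks in Steps 2 through 4 can also be scripted. The following is a minimal sketch that assumes `curl` is installed; `check_ui` is a hypothetical helper name and the host names are placeholders. It only confirms that something answers HTTP on the port, so a response that asks for authentication still counts as reachable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if http://HOST:PORT/ returns any HTTP
# response within 5 seconds; fail on connection errors or timeouts.
check_ui() {
  local host="$1" port="$2"
  curl --silent --output /dev/null --max-time 5 "http://${host}:${port}/"
}

# Example usage (host names are assumptions; ports are the ones named above):
#   check_ui rm-host  8088  && echo "ResourceManager web UI reachable"
#   check_ui nm-host  8042  && echo "NodeManager web UI reachable"
#   check_ui jhs-host 19888 && echo "Job History Server web UI reachable"
```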

Step 5: Try Running a Map/Reduce YARN Job

You should now be able to run Map/Reduce jobs. To confirm, try launching a sleep or a pi job from the provided Hadoop examples (/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar). Note that you will need Kerberos credentials to do so.

  Important:

Remember that the user who launches the job must exist on every node.

To try running a MapReduce job using YARN, set the HADOOP_MAPRED_HOME environment variable and then submit the job. For example:

$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
$ /usr/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10000
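
Because the job needs Kerberos credentials, it can help to confirm that a valid ticket is cached before submitting. The following is a minimal sketch assuming the MIT Kerberos `klist` and `kinit` client tools; `ensure_ticket` is a hypothetical helper name, the principal `alice@YOUR-REALM.COM` is a placeholder, and the `KLIST` and `KINIT` variables exist only so that the commands can be substituted.

```shell
#!/usr/bin/env bash
# The commands are overridable so the logic can be exercised without a KDC.
KLIST="${KLIST:-klist}"
KINIT="${KINIT:-kinit}"

# Hypothetical helper: make sure a valid ticket is in the credential cache,
# running kinit for the given principal if none is found.
ensure_ticket() {
  local principal="$1"
  # "klist -s" prints nothing and exits non-zero when the cache is
  # unreadable or the ticket has expired.
  if "$KLIST" -s; then
    echo "valid ticket found"
  else
    echo "no valid ticket; running kinit for $principal"
    "$KINIT" "$principal"
  fi
}

# Example usage (the principal is an assumption):
#   ensure_ticket alice@YOUR-REALM.COM &&
#     /usr/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10000
```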