Sentry enables role-based, fine-grained authorization for HiveServer2 and Cloudera Impala. It provides classic database-style authorization for Hive and Impala. Follow the instructions below to install and configure Sentry manually under the current CDH release.
- For instructions on using the latest version of Cloudera Manager to install and configure Hive Authorization with Sentry under CDH4.4.0 or later, and to enable Sentry for Impala, see Setting Up Hive Authorization with Sentry. See also Impala Security.
- For instructions for using Sentry with Search, see Configuring Sentry for Search.
- If you want to install the standalone version of Sentry that was provided with CDH4.3.0, see If you want to install the version of Sentry provided with CDH4.3.0.
- If you are using Cloudera Manager 4.5 or 4.6, see If you are using Cloudera Manager 4.5 or 4.6.
Sentry depends on an underlying authentication framework to reliably identify the requesting user. It requires:
- CDH4.3.0 or later.
- HiveServer2 with strong authentication (Kerberos or LDAP).
- A secure Hadoop cluster.
These requirements prevent a user from bypassing authorization and gaining direct access to the underlying data.
In addition, make sure that the following are true:
- The Hive warehouse directory (/user/hive/warehouse or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
- Permissions on the warehouse directory must be set as follows:
  - 770 on the directory itself (for example, /user/hive/warehouse)
  - 770 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
  - All files and subdirectories should be owned by hive:hive
  For example:
  $ sudo -u hdfs hdfs dfs -chmod -R 770 /user/hive/warehouse
  $ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
  Note: If you set hive.warehouse.subdir.inherit.perms to true in hive-site.xml, the permissions on the subdirectories will be set when you set permissions on the warehouse directory itself.
  Important: These instructions override the recommendations in the Hive section of the CDH4 Installation Guide.
- HiveServer2 impersonation must be turned off.
- The Hive user must be able to submit MapReduce jobs. You can ensure that this is true by setting the minimum user ID for job submission to 0. Set this value in Cloudera Manager under MapReduce Properties, or (if you are not using Cloudera Manager) edit the taskcontroller.cfg file and set min.user.id=0.
  - You must restart the cluster and HiveServer2 after changing this value, whether you use Cloudera Manager or edit taskcontroller.cfg.
  - These instructions override the instructions under "Configuring MRv1 Security" in the CDH4 Security Guide.
Roles and Privileges
Sentry uses a role-based privilege model. A role is a collection of rules for accessing a given Hive object. The objects supported in the current release are server, database, table, and URI. Access to each object is governed by privileges: Select, Insert, or All.
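For example, the following role grants Select on the customer and items tables in the sales database, and Insert on the sales_insights table in the reports database:
sales_reporting = \
    server=server1->db=sales->table=customer->action=Select, \
    server=server1->db=sales->table=items->action=Select, \
    server=server1->db=reports->table=sales_insights->action=Insert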
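A rule such as table=items>action=Select, with a dropped hyphen, no longer follows the key=value->key=value shape. Checking rules before deploying a policy file can therefore catch problems early. The following Python sketch splits a rule into its server, database, table, URI, and action parts and rejects malformed segments. It is not Sentry's own parser. The Privilege class, the function names, and the treatment of a rule without an action as ALL are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Keys that can appear in a rule; the set is taken from the examples in this section.
KNOWN_KEYS = ("server", "db", "table", "uri", "action")
VALID_ACTIONS = {"select", "insert", "all"}


@dataclass(frozen=True)
class Privilege:
    server: str
    db: Optional[str] = None
    table: Optional[str] = None
    uri: Optional[str] = None
    # Assumption: a rule with no action segment (e.g. server=server1->db=sales)
    # grants everything on that object, so it is recorded as "all".
    action: str = "all"


def parse_privilege(rule: str) -> Privilege:
    """Parse one rule such as 'server=server1->db=sales->table=customer->action=Select'."""
    parts = {}
    for segment in rule.strip().split("->"):
        # Every segment must be exactly one key=value pair. A rule with a dropped
        # '-' (e.g. 'table=items>action=Select') leaves two '=' in one segment.
        if segment.count("=") != 1:
            raise ValueError(f"malformed segment {segment!r} in {rule!r}")
        key, value = (s.strip() for s in segment.split("="))
        key = key.lower()
        if key not in KNOWN_KEYS or key in parts or not value:
            raise ValueError(f"unknown, repeated, or empty key {key!r} in {rule!r}")
        parts[key] = value
    if "server" not in parts:
        raise ValueError(f"rule has no server segment: {rule!r}")
    action = parts.pop("action", "all").lower()
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action {action!r} in {rule!r}")
    return Privilege(action=action, **parts)


def parse_role(value: str) -> List[Privilege]:
    """A role's value is a comma-separated list of rules."""
    return [parse_privilege(rule) for rule in value.split(",") if rule.strip()]


if __name__ == "__main__":
    sales_reporting = parse_role(
        "server=server1->db=sales->table=customer->action=Select, "
        "server=server1->db=sales->table=items->action=Select, "
        "server=server1->db=reports->table=sales_insights->action=Insert"
    )
    for privilege in sales_reporting:
        print(privilege)
```

Keys and actions are compared case-insensitively here because the examples in this document use both Select and select.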
The Sentry privilege model has the following characteristics:
- Allows any user to execute show function, desc function, and show locks
- Allows the user to see only those tables and databases for which this user has privileges
- Requires a user to have the necessary privileges on the URI to execute HiveQL operations that take in a location. Examples of such operations include LOAD, IMPORT, and EXPORT.
For more information, see Appendix: Authorization Privilege Model for Hive.
Users and Groups
- A user is an entity that is permitted by the authentication subsystem to access the Hive service. This entity can be a Kerberos principal, an LDAP userid, or an artifact of some other pluggable authentication system supported by HiveServer2.
- A group connects the authentication system with the authorization system. It is a collection of one or more users who have been granted one or more authorization roles. Sentry allows a set of roles to be configured for a group.
- A configured group provider determines a user’s affiliation with a group. The current release supports HDFS-backed groups and locally configured groups. For example,
analyst = sales_reporting, data_export, audit_report
Here the group analyst is granted the roles sales_reporting, data_export, and audit_report. The members of this group can run the HiveQL statements that are allowed by these roles. If this is an HDFS-backed group, then all the users belonging to the HDFS group analyst can run such queries.
User to Group Mapping
You can configure Sentry to use either Hadoop groups or groups defined in the policy file.
To configure Hadoop groups:
Set the hive.sentry.provider property in sentry-site.xml to org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider.
To configure local groups:
- Define local groups in a [users] section of the policy file. For example:
  [users]
  user1 = group1, group2, group3
  user2 = group2, group3
- In sentry-site.xml, set hive.sentry.provider as follows:
  <property>
    <name>hive.sentry.provider</name>
    <value>org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider</value>
  </property>
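The following Python sketch shows how a locally defined [users] section combines with [groups] to determine a user's roles: user to groups, then groups to roles. It reads the file with Python's configparser, which is an assumption for illustration; Sentry has its own policy reader. The [groups] entries and the role names role_a, role_b, and role_c are hypothetical.

```python
import configparser
from typing import List, Set

# Hypothetical policy: the [users] section is the example from the text; the
# [groups] entries and role names (role_a, role_b, role_c) are made up.
POLICY = """
[users]
user1 = group1, group2, group3
user2 = group2, group3

[groups]
group1 = role_a
group2 = role_b, role_c
"""


def load_policy(text: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep user and group names exactly as written
    parser.read_string(text)
    return parser


def split_list(value: str) -> List[str]:
    return [item.strip() for item in value.split(",") if item.strip()]


def roles_for_user(policy: configparser.ConfigParser, user: str) -> Set[str]:
    """user -> groups (from [users]) -> roles (from [groups])."""
    roles: Set[str] = set()
    for group in split_list(policy.get("users", user, fallback="")):
        # A group with no [groups] entry (group3 here) contributes no roles.
        roles.update(split_list(policy.get("groups", group, fallback="")))
    return roles


if __name__ == "__main__":
    policy = load_policy(POLICY)
    print(sorted(roles_for_user(policy, "user1")))  # ['role_a', 'role_b', 'role_c']
```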
Setup and Configuration
This release of Sentry stores the configuration as well as privilege policies in files. The sentry-site.xml file contains configuration options such as group association provider, privilege policy file location, and so on. The Policy file contains the privileges and groups. It has a .ini file format and can be stored on a local file system or HDFS.
Sentry is plugged into Hive as session hooks which you configure in hive-site.xml. The sentry package must be installed; it contains the required JAR files. You must also configure properties in the Sentry Configuration File.
If you have not already done so, install Cloudera's yum, zypper/YaST or apt repository before using the following commands. For instructions, see CDH4 Installation.
- Install Sentry as follows, depending on your operating system:
- On Red Hat and similar systems:
$ sudo yum install sentry
- On SLES systems:
$ sudo zypper install sentry
- On Ubuntu and Debian systems:
$ sudo apt-get update; sudo apt-get install sentry
The sections that follow contain notes on creating and maintaining the policy file.
Storing the Policy File
Considerations for storing the policy file(s) in HDFS include:
- Replication count - Because the file is read for each query in Hive and read once every five minutes by all Impala daemons, you should increase this value; since it is a small file, setting the replication count equal to the number of slave nodes in the cluster is reasonable.
- Updating the file - Updates to the file are reflected immediately, so you should write them to a temporary copy of the file first, and then replace the existing file with the temporary one after all the updates are complete. This avoids race conditions caused by reads on an incomplete file.
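The following minimal Python sketch shows the write-then-replace pattern for a policy file on a local file system (a file:// location). It writes the new contents to a temporary file in the same directory, then renames it over the old file with os.replace. Readers therefore see either the complete old file or the complete new one. The function name replace_policy_file is hypothetical. The HDFS equivalent, which would use HDFS tools rather than local file calls, is not shown.

```python
import os
import tempfile


def replace_policy_file(path: str, new_contents: str) -> None:
    """Write new_contents next to path, then swap it in with one rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same file system for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".policy-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())
        # mkstemp creates the file with mode 0600; keep the old file's mode so
        # the services that read the policy file can still read it.
        if os.path.exists(path):
            os.chmod(tmp_path, os.stat(path).st_mode & 0o777)
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The sketch copies the old file's permission bits but not its owner. Check ownership after replacing a file that other services need to read.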
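Defining Roles
Role definitions are not cumulative; a later definition replaces an earlier one. For example, the following results in role1 having privilege2 only, not privilege1 and privilege2:
role1 = privilege1
role1 = privilege2
Role names are scoped to a specific file. For example, if you give role1 the ALL privilege on db1 in the global policy file and give role1 ALL on db2 in the per-db db2 policy file, the user will be given both privileges.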
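Within one file, a later definition of the same role name replaces the earlier one, as in the role1 example. The following Python sketch mimics that last-definition-wins behavior using configparser with strict=False; by default configparser rejects duplicate keys. It is an illustration only, not how Sentry reads policy files. It also does not handle the backslash line continuations used in the sample policy files.

```python
import configparser
from typing import Dict


def roles_section(text: str) -> Dict[str, str]:
    """Return the [roles] section of one policy file as {role: value}."""
    # strict=False lets a repeated key through; configparser then keeps the last
    # value, which mirrors the "later definition replaces the earlier one" rule.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str
    parser.read_string(text)
    if not parser.has_section("roles"):
        return {}
    return dict(parser.items("roles"))


if __name__ == "__main__":
    policy = """
[roles]
role1 = server=server1->db=db1
role1 = server=server1->db=db2
"""
    print(roles_section(policy))  # {'role1': 'server=server1->db=db2'}
```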
URIs must start with either hdfs:// or file://. If a URI starts with anything else, it will cause an exception and the policy file will be invalid.
For example, the following role grants privileges on a local directory and on an HDFS directory:
data_read = server=server1->uri=file:///path/to/dir, \
    server=server1->uri=hdfs://namenode:port/path/to/dir
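The following Python sketch checks the URI rule above. It pulls every uri= segment out of a role's comma-separated value and reports any that do not start with hdfs:// or file://. The function names are hypothetical, and the check is a simple string-prefix test that mirrors the rule as stated here.

```python
from typing import List

VALID_PREFIXES = ("hdfs://", "file://")


def uris_in_role(value: str) -> List[str]:
    """Collect the uri=... segments from a comma-separated role value."""
    uris = []
    for rule in value.split(","):
        for segment in rule.strip().split("->"):
            key, _, uri = segment.partition("=")
            if key.strip().lower() == "uri":
                uris.append(uri.strip())
    return uris


def invalid_uris(value: str) -> List[str]:
    """URIs that would make the policy file invalid under the rule above."""
    return [uri for uri in uris_in_role(value) if not uri.startswith(VALID_PREFIXES)]


if __name__ == "__main__":
    data_read = ("server=server1->uri=file:///path/to/dir, "
                 "server=server1->uri=hdfs://namenode:port/path/to/dir")
    print(invalid_uris(data_read))                            # []
    print(invalid_uris("server=server1->uri=/path/to/dir"))  # ['/path/to/dir']
```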
This section provides a sample configuration.
The following example shows a global policy file that references a per-DB policy file. In this example, the global policy file, sentry-provider.ini, would exist in HDFS; hdfs://ha-nn-uri/etc/sentry/sentry-provider.ini might be an appropriate location. The per-DB policy file is for the customers database. It is located at hdfs://ha-nn-uri/etc/sentry/customers.ini.
[databases]
# Defines the location of the per DB policy file for the customers DB/schema
customers = hdfs://ha-nn-uri/etc/sentry/customers.ini

[groups]
# Assigns each Hadoop group to its set of roles
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
customers_admin = customers_admin_role
admin = admin_role

[roles]
# The uris below define a landing skid which
# the user can use to import or export data from the system.
# Since the server runs as the user "hive" files in that directory
# must either have the group hive and read/write set or
# be world read/write.
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select, \
    server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
junior_analyst_role = server=server1->db=jranalyst1, \
    server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1

# Implies everything on server1 -> customers. Privileges for
# customers can be defined in the global policy file even though
# customers has its own policy file. Note that the privileges from
# both the global policy file and the per-DB policy file
# are merged. There is no overriding.
customers_admin_role = server=server1->db=customers

# Implies everything on server1.
admin_role = server=server1
The per-DB policy file, customers.ini, contains the following:
[groups]
manager = customers_insert_role, customers_select_role
analyst = customers_select_role

[roles]
customers_insert_role = server=server1->db=customers->table=*->action=insert
customers_select_role = server=server1->db=customers->table=*->action=select
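The following Python sketch shows how the two sample files combine when resolving a group's privileges. Each file maps the group to roles and the roles to rules on its own. The results are then merged, with neither file overriding the other. The reader is a minimal, hypothetical one written for this example; it handles sections, # comments, and backslash continuations. It does not follow the [databases] section to locate per-DB files, and the sample text is an abridged copy of the files above.

```python
from typing import Dict, List, Set


def parse_policy(text: str) -> Dict[str, Dict[str, str]]:
    """Minimal reader: [sections], key = value, '#' comment lines, and values
    continued onto the next line with a trailing backslash."""
    sections: Dict[str, Dict[str, str]] = {}
    current = None
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not pending and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            pending += line[:-1].strip() + " "
            continue
        line, pending = pending + line, ""
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip(), {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()  # a repeated key overwrites
    return sections


def split_list(value: str) -> List[str]:
    return [item.strip() for item in value.split(",") if item.strip()]


def privileges_for_group(group: str, *policy_texts: str) -> Set[str]:
    """Resolve group -> roles -> rules in each file separately, then union."""
    result: Set[str] = set()
    for text in policy_texts:
        policy = parse_policy(text)
        roles = policy.get("roles", {})  # role names are looked up in their own file only
        for role in split_list(policy.get("groups", {}).get(group, "")):
            result.update(split_list(roles.get(role, "")))
    return result


# Abridged copies of the sample global and per-DB policy files.
GLOBAL = r"""
[groups]
manager = analyst_role, junior_analyst_role
analyst = analyst_role
[roles]
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select, \
    server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
junior_analyst_role = server=server1->db=jranalyst1, \
    server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1
"""

CUSTOMERS = """
[groups]
manager = customers_insert_role, customers_select_role
analyst = customers_select_role
[roles]
customers_insert_role = server=server1->db=customers->table=*->action=insert
customers_select_role = server=server1->db=customers->table=*->action=select
"""

if __name__ == "__main__":
    for rule in sorted(privileges_for_group("analyst", GLOBAL, CUSTOMERS)):
        print(rule)
```

With these abridged samples, the analyst group resolves to the three rules of analyst_role from the global file plus the customers_select_role rule from customers.ini.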
Sentry Configuration File
The following is an example of a sentry-site.xml file.
<configuration>
  <property>
    <name>hive.sentry.provider</name>
    <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
  </property>
  <property>
    <name>hive.sentry.provider.resource</name>
    <value>/path/to/authz-provider.ini</value>
    <!--
      If the hdfs-site.xml points to HDFS, the path will be in HDFS;
      alternatively you could specify a full path, e.g.:
      hdfs://namenode:port/path/to/authz-provider.ini
      file:///path/to/authz-provider.ini
    -->
  </property>
  <property>
    <name>hive.sentry.server</name>
    <value>server1</value>
  </property>
</configuration>
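A configuration mistake such as an empty <value> is easy to miss in XML. The following Python sketch reads a Hadoop-style configuration file with the standard library's xml.etree.ElementTree and lists expected properties that are missing or empty. The expected names are simply the three properties in the example above, not an authoritative list of required settings. The function names are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Dict, List, Sequence

# The three properties used in the sentry-site.xml example above.
EXPECTED = ("hive.sentry.provider", "hive.sentry.provider.resource", "hive.sentry.server")


def read_properties(xml_text: str) -> Dict[str, str]:
    """Map <name> to <value> for each <property> under <configuration>."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def missing_properties(xml_text: str, expected: Sequence[str] = EXPECTED) -> List[str]:
    """Names from `expected` that are absent or have an empty <value>."""
    props = read_properties(xml_text)
    return [name for name in expected if not props.get(name)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:  # path to a sentry-site.xml file
        print(missing_properties(f.read()) or "all expected properties are set")
```

The same function can check hive-site.xml if you pass the property names used there. For example, missing_properties(text, ("hive.server2.session.hook", "hive.sentry.conf.url")) flags hive.sentry.conf.url while its value is still empty.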
Enabling Sentry in HiveServer2
Add the following properties to hive-site.xml, setting hive.sentry.conf.url to the location of your sentry-site.xml file:
<property>
  <name>hive.server2.session.hook</name>
  <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
  <name>hive.sentry.conf.url</name>
  <value></value>
  <description>sentry-site.xml file location</description>
</property>
Securing the Hive Metastore
It's important that the Hive metastore be secured. Do this by turning on Hive metastore security, using the instructions in the CDH4 Security Guide.
Debugging Failed Sentry Authorization Requests
To debug a failed authorization request, first enable debug logging for Sentry:
- In Cloudera Manager, add log4j.logger.org.apache.sentry=DEBUG to the logging settings for your service through the corresponding Logging Safety Valve field for the Impala, HiveServer2, or Solr Server services.
- On systems not managed by Cloudera Manager, add log4j.logger.org.apache.sentry=DEBUG to the log4j.properties file on each host in the cluster, in the appropriate configuration directory for each service.
With debug logging enabled, Sentry writes log lines such as:
FilePermission server..., RequestPermission server...., result [true|false]
These lines indicate each evaluation Sentry makes. The FilePermission comes from the policy file, while the RequestPermission is the privilege required for the query. A RequestPermission is checked against all appropriate FilePermission settings until a match is found. If no matching privilege is found, Sentry returns false, indicating that access is denied.
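When a query fails authorization, the debug output can be long. The following Python sketch groups these evaluation lines by RequestPermission and reports the requests for which no evaluation returned true, matching the behavior described above. The regular expression assumes each line contains "FilePermission <privilege>, RequestPermission <privilege>, result <true|false>" as shown. Adjust it if your log4j layout or Sentry version formats the line differently. The function name is hypothetical.

```python
import re
import sys
from typing import Dict, Iterable, List

# Assumed shape of a Sentry DEBUG line, based on the fields described above. Any
# log4j prefix (timestamp, level, class) is skipped because re.search is used.
EVAL_RE = re.compile(
    r"FilePermission (?P<file>.+?), RequestPermission (?P<request>.+?), "
    r"result (?P<result>true|false)"
)


def denied_requests(lines: Iterable[str]) -> List[str]:
    """RequestPermissions for which no FilePermission evaluation returned true."""
    granted: Dict[str, bool] = {}
    for line in lines:
        match = EVAL_RE.search(line)
        if match:
            request = match.group("request")
            hit = match.group("result") == "true"
            granted[request] = granted.get(request, False) or hit
    return sorted(request for request, ok in granted.items() if not ok)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:  # path to a service log with Sentry DEBUG output
        for request in denied_requests(log):
            print("denied:", request)
```

Because results are grouped by the request string, the same privilege requested by several queries in one log is reported once.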
Appendix: Authorization Privilege Model for Hive
Privileges can be granted on different objects in the Hive warehouse. Any privilege that can be granted is associated with a level in the object hierarchy. If a privilege is granted on a container object in the hierarchy, the base object automatically inherits it. For instance, if a user has ALL privileges on the database scope, then they have ALL privileges on all of the base objects contained within that scope.
Object hierarchy in Hive
Server
    Database
        Table
            Partition
            Columns
        View
        Index
    Function/Routine
    Lock
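The inheritance rule can be expressed as a prefix check on object paths. In the following Python sketch, an object is identified by its path from the server down (for example server, database, table). A grant implies a request when the granted path is a prefix of the requested path. Treating * as a wildcard (as in table=* in the sample policy) and letting ALL cover any action are assumptions of this sketch, not a description of Sentry's implementation.

```python
from typing import Sequence


def implies(granted: Sequence[str], granted_action: str,
            requested: Sequence[str], requested_action: str) -> bool:
    """Paths list object names from the top of the hierarchy, e.g.
    ("server1", "sales", "customer") for Server -> Database -> Table."""
    # A grant on a container covers every object beneath it, so the granted
    # path only has to be a prefix of the requested one.
    if len(granted) > len(requested):
        return False
    for granted_name, requested_name in zip(granted, requested):
        if granted_name != "*" and granted_name != requested_name:  # '*' as in table=*
            return False
    # Assumption: ALL covers every action; otherwise the actions must match.
    return granted_action.lower() in ("all", requested_action.lower())


if __name__ == "__main__":
    # ALL on the sales database is inherited by the customer table.
    print(implies(("server1", "sales"), "ALL", ("server1", "sales", "customer"), "select"))  # True
    # SELECT on one table says nothing about its containing database.
    print(implies(("server1", "sales", "customer"), "SELECT", ("server1", "sales"), "select"))  # False
```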
|Privilege||Object|
|SELECT||TABLE, VIEW, URI|
|ALL||SERVER, DB, URI|
|Base Object||Granular privileges on object||Container object that contains the base object||Privileges on container object that implies privileges on the base object|
|Hive SQL Operation||Privileges||Scope|
|SWITCHDATABASE||Any||Any Table, View in the DB|
|DESCTABLE||SELECT or INSERT||Table|
|ALTER TABLE ADD COLS||ALL||Database|
|ALTER TABLE REPLACE COLS||ALL||Database|
|ALTER TABLE RENAME COL||ALL||Database|
|ALTER TABLE RENAME PART||ALL||Database|
|ALTER TABLE RENAME||ALL||Database|
|ALTER TABLE DROP PART||ALL||Database|
|ALTER TABLE ADD PART||ALL, ALL@URI||Database|
|ALTER TABLE ARCHIVE||ALL||Database|
|ALTER TABLE UNARCHIVE||ALL||Database|
|ALTER TABLE PROPERTIES||ALL||Database|
|ALTER TABLE SERIALIZER||ALL||Database|
|ALTER PARTITION SERIALIZER||ALL||Database|
|ALTER TABLE SERDEPROPS||ALL||Database|
|ALTER PARTITION SERDEPROPS||ALL||Database|
|ALTER TABLE CLUSTER SORT||ALL||Database|
|SHOW DATABASE||Any Privilege||Any object in the DB|
|SHOW TABLES||SELECT or INSERT||Table|
|SHOW COLUMNS||SELECT or INSERT||Table|
|SHOW TABLE STATUS||SELECT or INSERT||Table|
|SHOW TABLE PROPERTIES||SELECT or INSERT||Table|
|SHOW CREATE TABLE||SELECT or INSERT||Table|
|SHOW FUNCTIONS*||Not restricted|
|SHOW PARTITIONS||SELECT or INSERT||Table|
|SHOW INDEXES||SELECT or INSERT||Table|
|SHOW LOCKS||Not Restricted|
|DROP FUNCTION||Any Privilege||Any Object|
|ALTER INDEX REBUILD||ALL||Database|
|ALTER VIEW PROPERTIES||ALL||Database|
|GRANT PRIVILEGE||Allowed, but has no effect on HS2 auth|
|REVOKE PRIVILEGE||Allowed, but has no effect on HS2 auth|
|SHOW GRANTS||Allowed, but has no effect on HS2 auth|
|ALTER TABLE PROTECT MODE||ALL||Database|
|ALTER TABLE FILE FORMAT||ALL||Database|
|ALTER TABLE LOCATION*||ALL||Server|
|ALTER PARTITION PROTECT MODE||ALL||Database|
|ALTER PARTITION FILE FORMAT||ALL||Database|
|ALTER PARTITION LOCATION||ALL||Server|
|CREATE TABLE EXTERNAL||ALL, SELECT@URI||Database|
|CREATE TABLE AS SELECT||ALL, SELECT||Database, Table/View|
|ALTER INDEX PROP||ALL||Database|
|ALTER TABLE MERGE FILE||ALL||Database|
|ALTER PARTITION MERGE FILE||ALL||Database|
|ALTER TABLE SKEWED||ALL||Database|
|ALTER TABLE PARTITION SKEWED LOCATION||ALL||Database|
|ADD JAR||Restricted unless hive.server2.authorization.external.exec = true|