Cloudera Data Platform (CDP) uses Apache Atlas and Apache Ranger for data security and governance. Administrators can define security policies based on Atlas metadata tags and apply a policy in real time to an entire hierarchy of entities, including databases, tables, and columns.
You will learn how to classify your data, control who can access it, and mask it.
Our environment consists of
Let's begin:
Select Data Warehouse from the Cloudera Data Platform (CDP) home page.
Open DAS by first locating your virtual warehouse, then:
From Data Analytics Studio (DAS):
CREATE DATABASE IF NOT EXISTS dbgr;

CREATE TABLE IF NOT EXISTS dbgr.employee_data (
  id INT,
  first_name STRING,
  last_name STRING,
  email STRING,
  title STRING,
  salary DECIMAL(10,2)
);

INSERT INTO dbgr.employee_data
SELECT INLINE(array(
  struct(1, "Patty", "Harvison", "PattyHarvison@somewhere.com", "Accountant I", 48532.04),
  struct(2, "Abbey", "Ledingham", "AbbeyLedingham@somewhere.com", "Marketing Assistant", 58700.35),
  struct(3, "Tricia", "Budgey", "TriciaBudgey@somewhere.com", "Nuclear Power Engineer", 48081.25),
  struct(4, "Saraann", "Corwin", "SaraannCorwin@somewhere.com", "Professor", 49246.32),
  struct(5, "Reese", "Bownes", "ReeseBownes@somewhere.com", "Marketing Manager", 70615.84),
  struct(6, "Jennee", "Hawson", "JenneeHawson@somewhere.com", "Clinical Specialist", 61017.10),
  struct(7, "Malinde", "Kabsch", "MalindeKabsch@somewhere.com", "Developer I", 48767.52),
  struct(8, "Darline", "Wagstaffe", "DarlineWagstaffe@somewhere.com", "Quality Engineer", 61330.88),
  struct(9, "Rhona", "Damarell", "RhonaDamarell@somewhere.com", "Legal Assistant", 42030.92),
  struct(10, "Dagmar", "Sandom", "DagmarSandom@somewhere.com", "Staff Scientist", 74302.82),
  struct(11, "Debora", "Bielfelt", "DeboraBielfelt@somewhere.com", "Assistant Media Planner", 59329.91),
  struct(12, "Yule", "Morigan", "YuleMorigan@somewhere.com", "Systems Administrator II", 72053.94),
  struct(13, "Clarette", "Naptine", "ClaretteNaptine@somewhere.com", "GIS Technical Architect", 74593.99),
  struct(14, "Leonard", "Petrik", "LeonardPetrik@somewhere.com", "Financial Analyst", 49876.08),
  struct(15, "Colver", "Scudamore", "ColverScudamore@somewhere.com", "Media Manager IV", 55048.58)
));
Have each user (gdeleon, joe_analyst and ivanna_eu_hr) run the query below. It should succeed for everyone.
SELECT * FROM dbgr.employee_data;
Open Atlas for your tenant:
Beginning from the CDP home page > Data Warehouse:
Let's create a new classification:
Create a new classification, sensitive, with the following attributes:
Name: sensitive
Description: holds sensitive data
Search for the table we want to assign this new classification.
Use the following search criteria:
Search By Type: hive_table
Search By Text: employee_data
Click on Search
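The same search criteria can also be expressed against the Atlas REST API, whose v2 basic search endpoint accepts typeName and query parameters. The sketch below only builds the request URL and sends nothing. The base URL is a hypothetical placeholder; in a CDP deployment the host and any gateway path prefix will differ, and authentication is not shown.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; replace with your tenant's Atlas address.
ATLAS_BASE_URL = "https://atlas.example.com"


def basic_search_url(type_name, text, base_url=ATLAS_BASE_URL):
    """Build a basic-search URL equivalent to the UI search criteria."""
    params = urlencode({"typeName": type_name, "query": text})
    return f"{base_url}/api/atlas/v2/search/basic?{params}"


# Same criteria as in the UI: type hive_table, text employee_data.
print(basic_search_url("hive_table", "employee_data"))
```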
Let's assign our new classification, sensitive, to column salary:
Open Ranger for your tenant:
Beginning from the CDP home page > Data Warehouse:
Let's create a tag-based policy, a form of attribute-based access control (ABAC).
Note: Your service name may be different from ours.
Access policies allow us to place restrictions on data columns that are specially marked. In this example, we will restrict access to columns classified as sensitive to members of group cdp_sandbox-default and to user joe_analyst. No one else should be able to access or read data marked as sensitive.
Select Access tab, then Add New Policy.
Add a new policy using:
Policy Name: sensitive_access
TAG: sensitive
Description: access to sensitive classified columns
Have each user (gdeleon, joe_analyst and ivanna_eu_hr) re-run the query below.
SELECT * FROM dbgr.employee_data;
User gdeleon belongs to group cdp_sandbox-default, so the query ran successfully.
User joe_analyst was explicitly given select access, so the query ran successfully.
The query failed for ivanna_eu_hr with the error: Permission denied: user [ivanna_eu_hr] does not have [SELECT] privilege. This user neither belongs to group cdp_sandbox-default nor was given select access.
Now modify the query to omit the sensitive column (salary), as in the statement below. The statement now runs successfully.
SELECT id, first_name, last_name, email, title FROM dbgr.employee_data;
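To make the three outcomes above concrete, here is a toy model of the sensitive_access policy. It is an illustrative sketch, not how Ranger evaluates policies internally. The names COLUMN_TAGS, USER_GROUPS, SENSITIVE_ACCESS and can_select are hypothetical, and the empty group sets for joe_analyst and ivanna_eu_hr are assumptions (the tutorial only says that ivanna_eu_hr is not in cdp_sandbox-default). The model also assumes untagged columns stay readable by everyone, as they were before the policy existed.

```python
# Toy model of the tag-based access policy; not Ranger's implementation.

ALL_COLUMNS = ["id", "first_name", "last_name", "email", "title", "salary"]

# Atlas classification from the tutorial: only salary is tagged.
COLUMN_TAGS = {"salary": {"sensitive"}}

# Assumed group membership, based on the results described in the tutorial.
USER_GROUPS = {
    "gdeleon": {"cdp_sandbox-default"},
    "joe_analyst": set(),
    "ivanna_eu_hr": set(),
}

# The sensitive_access policy: who may select columns tagged "sensitive".
SENSITIVE_ACCESS = {
    "tag": "sensitive",
    "groups": {"cdp_sandbox-default"},
    "users": {"joe_analyst"},
}


def can_select(user, columns, policy=SENSITIVE_ACCESS):
    """Return True if the user may select every requested column."""
    in_allowed_group = bool(USER_GROUPS.get(user, set()) & policy["groups"])
    is_allowed = user in policy["users"] or in_allowed_group
    touches_tag = any(policy["tag"] in COLUMN_TAGS.get(c, set()) for c in columns)
    # Only queries that touch a tagged column are restricted.
    return is_allowed or not touches_tag


for user in USER_GROUPS:
    print(user, "SELECT *:", can_select(user, ALL_COLUMNS))

# Dropping the tagged column lets the restricted user through.
non_sensitive = [c for c in ALL_COLUMNS if c != "salary"]
print("ivanna_eu_hr without salary:", can_select("ivanna_eu_hr", non_sensitive))
```

A query is rejected as a whole when it touches any tagged column the user is not allowed to read, which is why removing salary from the select list is enough for the statement to succeed.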
Knowledge growth questions/problems:
We are going to place viewing restrictions on our sensitive classified columns. Although a user may have access to the sensitive data, we may want to mask the real data.
Only users in group cdp_sandbox-default should see real data. All others should see masked data.
Select Masking tab, then Add New Policy.
Add a new policy using:
Policy Name: sensitive_masking
TAG: sensitive
Description: mask sensitive data
Have user gdeleon re-run the query below. It runs successfully and shows all data, unmasked.
SELECT * FROM dbgr.employee_data;
Have user joe_analyst re-run the query below. It runs successfully, but the salary data is masked with nulls.
SELECT * FROM dbgr.employee_data;
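The effect of the masking policy on a single row can be sketched as follows. This is a toy model rather than Ranger's masking implementation. The names SENSITIVE_COLUMNS, UNMASKED_GROUPS, USER_GROUPS and apply_masking are hypothetical, and the empty group set for joe_analyst is an assumption consistent with the results above.

```python
from decimal import Decimal

# Toy model of the sensitive_masking policy; not Ranger's implementation.

# Columns carrying the "sensitive" classification in this tutorial.
SENSITIVE_COLUMNS = {"salary"}

# Only members of these groups see real values.
UNMASKED_GROUPS = {"cdp_sandbox-default"}

# Assumed group membership, based on the results described in the tutorial.
USER_GROUPS = {
    "gdeleon": {"cdp_sandbox-default"},
    "joe_analyst": set(),
}


def apply_masking(user, row):
    """Return the row as the user would see it, nulling masked columns."""
    if USER_GROUPS.get(user, set()) & UNMASKED_GROUPS:
        return dict(row)
    return {
        col: (None if col in SENSITIVE_COLUMNS else value)
        for col, value in row.items()
    }


row = {
    "id": 1,
    "first_name": "Patty",
    "last_name": "Harvison",
    "email": "PattyHarvison@somewhere.com",
    "title": "Accountant I",
    "salary": Decimal("48532.04"),
}

print(apply_masking("gdeleon", row)["salary"])      # real value
print(apply_masking("joe_analyst", row)["salary"])  # None
```

Unlike the access policy, masking does not reject the query: joe_analyst still gets every row and column, with only the tagged values replaced by nulls.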
Knowledge growth questions/problems:
Visit Cloudera's Collections-SDX library of videos. They provide a great overview of Cloudera's Shared Data Experience (SDX). Here are two that relate to this tutorial:
Cloudera OnDemand provides world-class training - anywhere, anytime.