CCB-400 Study Guide

Begin Your Journey to HBase Certification

The best way to study for the certification tests is to take a Cloudera training class. There is a high degree of correlation between Cloudera training classes and Cloudera certification tests. This is partly by design and partly the result of research and analysis: we train and test on the skills, tasks, and knowledge we believe are critical to the daily work of an HBase specialist. Of course, every training class is slightly different, shaped by the needs of the students and the approach of the individual instructor. Every class also occurs at a moment in time, and the CDH/Hadoop ecosystem is dynamic. So if you have taken a Cloudera training class, study the course materials and notes, but also use the resources below to check your understanding and update your skills.

This resource page is an ongoing work in progress. Please contribute: if you have exam preparation suggestions or ideas, email them to certification@cloudera.com.


Recommended Cloudera Training Course

Cloudera Training for Apache HBase

Practice Test

CCB-400 Practice Test Subscription

Main Topics

CCB-400 is designed to test a candidate’s fluency with the concepts and skills in the following areas:

Core HBase Concepts
Recognize the fundamental characteristics of Apache HBase and its role in a big data ecosystem. Identify differences between Apache HBase and a traditional RDBMS. Describe the relationship between Apache HBase and HDFS. Given a scenario, identify application characteristics that make the scenario an appropriate application for Apache HBase.

Data Model
Describe how an Apache HBase table is physically stored on disk. Identify the differences between a Column Family and a Column Qualifier. Given a data loading scenario, identify how Apache HBase will version the rows. Describe how Apache HBase cells store data. Detail what happens to data when it is deleted.
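
The versioning and delete behavior described above is easiest to see from the Java client. The following sketch uses the pre-1.0 HTable API (the same API that appears in the sample question below); the table name "clicks", column family "info", and qualifier "url" are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "clicks");   // hypothetical table name

    byte[] row  = Bytes.toBytes("row1");
    byte[] cf   = Bytes.toBytes("info");         // hypothetical column family
    byte[] qual = Bytes.toBytes("url");          // hypothetical column qualifier

    // Each put creates a new cell version, keyed by timestamp.
    Put p1 = new Put(row);
    p1.add(cf, qual, 1000L, Bytes.toBytes("http://a.example"));
    Put p2 = new Put(row);
    p2.add(cf, qual, 2000L, Bytes.toBytes("http://b.example"));
    table.put(p1);
    table.put(p2);

    // Ask for several versions; HBase returns them newest first.
    Get get = new Get(row);
    get.setMaxVersions(3);
    Result result = table.get(get);
    for (KeyValue kv : result.getColumn(cf, qual)) {
      System.out.println(kv.getTimestamp() + " = " + Bytes.toString(kv.getValue()));
    }

    // A delete only writes a tombstone marker; the data is physically
    // removed when a major compaction rewrites the HFiles.
    Delete delete = new Delete(row);
    delete.deleteColumn(cf, qual);   // masks the newest version of this cell
    table.delete(delete);

    table.close();
  }
}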

Architecture
Identify the major components of an Apache HBase cluster. Recognize how regions work and their benefits under various scenarios. Describe how a client finds a row in an HBase table. Understand the function and purpose of minor and major compactions. Given a region server crash scenario, describe how Apache HBase fails over to another region server. Describe RegionServer splits.
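
Compactions and splits can also be requested explicitly through the administrative API, which is a convenient way to observe this behavior on a test cluster. A minimal sketch, assuming the pre-1.0 HBaseAdmin/HTable API and a hypothetical table named "clicks":

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class RegionExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // A major compaction rewrites all of a store's HFiles into one file
    // and drops deleted (tombstoned) and expired cells.
    admin.majorCompact("clicks");            // hypothetical table name

    // Ask HBase to split the table's regions; normally splits happen
    // automatically when a region grows past the configured maximum size.
    admin.split("clicks");

    // A client finds a row by consulting the catalog (.META.) table and
    // then caching the resulting region location.
    HTable table = new HTable(conf, "clicks");
    HRegionLocation location = table.getRegionLocation("row1");
    System.out.println("row1 is served by " + location);

    table.close();
    admin.close();
  }
}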

Schema Design
Describe the factors to consider when creating Column Families. Given an access pattern, define the row keys for optimal read performance. Given an access pattern, define the row keys for locality.
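
Row key design is easiest to see with a small example. The sketch below, which assumes a hypothetical "clicks" table with column family "info", builds the composite key <source_id><Long.MAX_VALUE - timestamp> so that all rows for a source are stored together and sort newest first (the pattern behind Question 1 below).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyExample {
  // Composite key: <source_id><Long.MAX_VALUE - timestamp>. The source id
  // prefix gives locality; the reversed timestamp sorts newest clicks first.
  static byte[] makeKey(String sourceId, long timestamp) {
    return Bytes.add(Bytes.toBytes(sourceId),
                     Bytes.toBytes(Long.MAX_VALUE - timestamp));
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "clicks");          // hypothetical table

    Put put = new Put(makeKey("source42", System.currentTimeMillis()));
    put.add(Bytes.toBytes("info"), Bytes.toBytes("url"),
            Bytes.toBytes("http://example.com/page"));
    table.put(put);

    // Scanning the source id prefix returns only that source's clicks,
    // most recent first, without touching other sources' rows.
    Scan scan = new Scan(Bytes.toBytes("source42"), Bytes.toBytes("source43"));
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      System.out.println(Bytes.toStringBinary(r.getRow()));
    }
    scanner.close();
    table.close();
  }
}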

API
Describe the functions and purpose of the HBaseAdmin class. Given a table and rowkey, use the get() operation to return specific versions of that row. Describe the behavior of the checkAndPut() method.
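
A short sketch of these calls, using the pre-1.0 HTable client API; the table "clicks" and the column coordinates are hypothetical. It fetches a bounded number of versions with get() and then performs the atomic checkAndPut() that Question 3 below examines.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ApiExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "clicks");          // hypothetical table

    byte[] row  = Bytes.toBytes("rowkey");
    byte[] cf   = Bytes.toBytes("colfam");
    byte[] qual = Bytes.toBytes("qualifier");

    // Return up to two versions of each cell in the row.
    Get get = new Get(row);
    get.setMaxVersions(2);
    Result result = table.get(get);

    // checkAndPut is atomic: the Put is applied only if the current value
    // at rowkey/colfam/qualifier equals "barvalue". It returns true when
    // the Put was written and false otherwise.
    Put newrow = new Put(row);
    newrow.add(cf, qual, Bytes.toBytes("newvalue"));
    boolean applied = table.checkAndPut(row, cf, qual,
        Bytes.toBytes("barvalue"), newrow);
    System.out.println("Put applied: " + applied);

    table.close();
  }
}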

Administration
Recognize how to create, describe, and access data in tables from the shell. Describe how to bulk load data into Apache HBase. Recognize the benefits of managed region splits.
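
The same tasks can be scripted against the administrative API. A minimal sketch, assuming a hypothetical "weblog" table, that creates a pre-split table (one way to keep splits managed rather than left to grow on their own) and then prints its descriptor, the programmatic equivalent of describe in the shell:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateTableExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // One column family that keeps up to three versions per cell.
    HTableDescriptor desc = new HTableDescriptor("weblog");   // hypothetical table
    HColumnDescriptor family = new HColumnDescriptor("info");
    family.setMaxVersions(3);
    desc.addFamily(family);

    // Pre-splitting at creation time spreads regions across the cluster
    // immediately instead of letting one region grow and split on its own.
    byte[][] splitKeys = new byte[][] {
        Bytes.toBytes("100"), Bytes.toBytes("200")
    };
    admin.createTable(desc, splitKeys);

    // Programmatic equivalent of: describe 'weblog'
    System.out.println(admin.getTableDescriptor(Bytes.toBytes("weblog")));

    admin.close();
  }
}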


Sample Questions

Question 1

You want to store clickstream data in HBase. Your data consists of the following: the source id, the name of the cluster, the URL of the click, and the timestamp of each click.

Which rowkey would you use if you wanted to retrieve the source ids with a scan and sorted with the most recent first?

A. <(Long)timestamp>
B. <source_id><Long.MAX_VALUE - (Long)timestamp>
C. <timestamp><Long.MAX_VALUE>
D. <Long.MAX_VALUE><timestamp>

Question 2

Your application needs to retrieve 200 to 300 non-sequential rows from a table with one billion rows. You know the rowkey of each of the rows you need to retrieve. Which does your application need to implement?

A. Scan without range
B. Scan with start and stop row
C. HTable.get(Get get)
D. HTable.get(List<Get> gets)

Question 3

You perform a check and put operation from within an HBase application using the following:

table.checkAndPut(Bytes.toBytes("rowkey"),
    Bytes.toBytes("colfam"),
    Bytes.toBytes("qualifier"),
    Bytes.toBytes("barvalue"), newrow);

Which describes this check and put operation?

A. Check if rowkey/colfam/qualifier exists and the cell value "barvalue" is equal to newrow, then return "true".
B. Check if the cell value "barvalue" at rowkey/colfam/qualifier is NOT equal to newrow, then return "true".
C. Check if rowkey/colfam/qualifier has the cell value "barvalue". If so, put the values in newrow and return "false".
D. Check if rowkey/colfam/qualifier has the cell value "barvalue". If so, put the values in newrow and return "true".

Question 4

What is the advantage of using the bulk load API over individual Puts for bulk insert operations?

A. Writes bypass the HLog/MemStore, reducing load on the RegionServer.
B. Users doing bulk Writes may disable writing to the WAL, which results in possible data loss.
C. HFiles created by the bulk load API are guaranteed to be co-located with the RegionServer hosting the region.
D. HFiles written out via the bulk load API are more space efficient than those written out of RegionServers.

Question 5

You have a “WebLog” table in HBase. The Row Keys are the IP Addresses. You want to retrieve all entries that have an IP Address of 75.67.12.146. The shell command you would use is:

A. get 'WebLog', '75.67.12.146'
B. scan 'WebLog', '75.67.12.146'
C. get 'WebLog', {FILTER => '75.67.12.146'}
D. scan 'WebLog', {COLFAM => 'IP', FILTER => '75.67.12.146'}

Answers

Question 1: B
Question 2: D
Question 3: D
Question 4: A
Question 5: A


Disclaimer: These exam preparation pages are intended to provide information about the objectives covered by each exam, related resources, and recommended reading and courses. The material contained within these pages is not intended to guarantee a passing score on any exam. Cloudera recommends that candidates thoroughly understand the objectives for each exam and use the resources and training courses recommended on these pages to gain a thorough understanding of the domain of knowledge related to the role the exam evaluates.