This tutorial walks you through the process of creating, resizing, and terminating Data Hubs on the Cloudera Data Platform (CDP) Public Cloud.
Because Data Hubs can be provisioned, resized, or disposed of quickly in response to rapidly changing workloads, they save end users both time and cost.
The following concepts are key to understanding Data Hub:
Workload clusters
All Data Hub clusters are workload clusters. These clusters are created for running specific workloads such as data engineering or data analytics.
Cluster definitions
A cluster definition is a reusable cluster template in JSON format that can be used for creating multiple Data Hub clusters with identical cloud provider settings.
Cluster templates
Data Hub uses cluster templates to define cluster topology: the host groups and the components installed on each host group.
Recipes
A recipe is a script that runs on all nodes of a selected host group at a specific time. You can use recipes for tasks such as installing additional software or performing advanced cluster configuration. For example, you can use a recipe to put a JAR file on the Hadoop classpath.
Custom properties
Custom properties are Cloudera Runtime cluster configuration properties that Data Hub lets you set conveniently during cluster creation.
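The difference between a cluster definition (reusable cloud provider settings) and a cluster template (topology: host groups and the components on each) can be sketched with plain Python data. Everything below is a hypothetical illustration. The field names, host group names, and component names are placeholders, not the real CDP JSON schema or the contents of any shipped definition.

```python
import copy
import json

# Hypothetical cluster definition: cloud provider settings shared by every
# cluster created from it. Field names are placeholders, not the CDP schema.
DEFINITION_JSON = """
{
  "cloudPlatform": "AWS",
  "instanceGroups": [
    {"hostGroup": "master", "instanceType": "m5.2xlarge", "nodeCount": 1},
    {"hostGroup": "worker", "instanceType": "m5.xlarge", "nodeCount": 3}
  ]
}
"""

# Hypothetical cluster template: topology, i.e. the components installed
# on each host group. Component names are illustrative only.
TEMPLATE = {
    "master": ["ZOOKEEPER_SERVER", "NIFI_REGISTRY_SERVER"],
    "worker": ["NIFI_NODE", "ZOOKEEPER_SERVER"],
}


def clusters_from_definition(definition_json: str, names: list[str]) -> list[dict]:
    """Build one cluster spec per name; all share the definition's settings."""
    definition = json.loads(definition_json)
    specs = []
    for name in names:
        spec = copy.deepcopy(definition)  # independent copy per cluster
        spec["clusterName"] = name        # the only per-cluster difference
        specs.append(spec)
    return specs


def host_groups_running(component: str, template: dict[str, list[str]]) -> list[str]:
    """Return, sorted, the host groups on which `component` is installed."""
    return sorted(group for group, comps in template.items() if component in comps)


if __name__ == "__main__":
    for spec in clusters_from_definition(DEFINITION_JSON, ["demo-a", "demo-b"]):
        print(spec["clusterName"], spec["cloudPlatform"])
    print(host_groups_running("ZOOKEEPER_SERVER", TEMPLATE))
```

The deep copy matters. Each cluster starts from identical settings, but changing one cluster's spec later does not leak into the others.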
In the Environments section, search for the environment in which you want to create a Data Hub, and click its name.
You can choose between two (2) options for provisioning this Data Hub:
Choose a previously created, custom cluster definition
In this tutorial, we will choose Cluster Definition, which provides a large selection of predefined cluster definitions:
Note: Your CDP environment may have different cluster definitions.
Let's complete the Data Hub provisioning form:
Environment: usermarketing
Cluster Definition: 7.2.0 - Flow Management Light Duty for AWS
Cluster Name: um-nifi-demo
At this point, you could go directly to step 5, Provision Cluster, to complete the provisioning. However, we suggest reviewing the Advanced Options before you provision.
Recipes are scripts that run on all nodes of a selected host group at a specific time. Available recipe execution times are:
All registered recipes are located in Environments > Shared Resources > Recipes.
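For example, a recipe that puts a JAR file on the Hadoop classpath might look like the minimal sketch below. Recipes are most often shell scripts. This Python version assumes that your image provides `python3` and that your environment accepts a script with a shebang line, so check the Cloudera recipe documentation before relying on it. Both paths are hypothetical placeholders, and the destination directory is assumed to already be on the Hadoop classpath.

```python
#!/usr/bin/env python3
"""Hypothetical recipe: copy a JAR into a directory on the Hadoop classpath.

Both paths are placeholders; the destination is ASSUMED to already be on
the Hadoop classpath of the nodes this recipe runs on.
"""
import shutil
import sys
from pathlib import Path

SOURCE_JAR = Path("/tmp/custom-udf.jar")        # hypothetical source JAR
CLASSPATH_DIR = Path("/opt/hadoop-extra-jars")  # hypothetical target dir


def install_jar(source: Path, dest_dir: Path) -> Path:
    """Copy `source` into `dest_dir`, creating the directory if needed.

    Safe to run more than once: an existing copy is overwritten.
    """
    if not source.is_file():
        raise FileNotFoundError(f"JAR not found: {source}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / source.name
    shutil.copy2(source, target)
    target.chmod(0o644)  # readable by every service user on the node
    return target


if __name__ == "__main__":
    try:
        installed = install_jar(SOURCE_JAR, CLASSPATH_DIR)
    except FileNotFoundError as err:
        # Exit non-zero so the script is reported as failed
        print(err, file=sys.stderr)
        sys.exit(1)
    print(f"Installed {installed}")
```

Because a recipe runs on every node of the selected host group, the script is written to be idempotent. Running it twice on the same node leaves the same result.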
Custom Properties: Lets you set Cloudera Runtime cluster configuration properties during cluster creation, instead of configuring them on the running cluster afterwards.
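Conceptually, custom properties act as overrides applied on top of the configuration that the cluster would otherwise receive. The sketch below models only that override behavior with plain dictionaries. It is not the syntax Data Hub expects, and the property names are hypothetical placeholders, not real Cloudera Runtime keys.

```python
# Hypothetical defaults the cluster would otherwise receive.
# Property names are placeholders, not real Cloudera Runtime keys.
DEFAULTS = {
    "example.heap.size": "1g",
    "example.max.connections": "60",
}


def effective_config(defaults: dict[str, str], custom: dict[str, str]) -> dict[str, str]:
    """Apply creation-time custom properties on top of the defaults.

    Custom values win on conflict; defaults that are not overridden are
    kept. Neither input dict is modified.
    """
    return {**defaults, **custom}


if __name__ == "__main__":
    print(effective_config(DEFAULTS, {"example.heap.size": "4g"}))
```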
The list of available services depends on the cluster definition chosen. Recall that we chose the cluster definition 7.2.0 - Flow Management Light Duty for AWS, which provides three (3) services:
There are nine (9) tabs to explore:
Event History: Shows events logged for the cluster, with the most recent event at the top. The Download option lets you download the event history.
Hardware: Displays information about your cluster instances: instance names, instance IDs, instance types, their status, fully qualified domain names (FQDNs), and private and public IPs.
Cloud Storage: Displays the cloud storage locations for certain properties.
Tags: Displays the key-value pairs of user-defined tags.
Endpoints: Displays the URLs of all cluster API endpoints.
Recipes: Displays recipe-related information. For each recipe, you can see the host group on which it was executed, the recipe name, and the recipe type.
Repository Details: Displays the Cloudera Manager and Cloudera Runtime repository information provided when the cluster was created.
Image Details: Displays information about the image catalog used and its location.
Network: Displays the names of the network and subnet in which the cluster is running, with links to the related cloud provider console.
When you no longer need this Data Hub, you can either STOP or DELETE the cluster:
When you are ready to restart a stopped cluster, click START.
Once a delete is initiated, it cannot be undone.
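The STOP, START, and DELETE actions can be summarized as a small state machine. This is a simplified, hypothetical model. Real clusters also pass through transitional states such as stopping and starting, and the state and action names here are placeholders. It does, however, capture the key rule: nothing leads out of the deleted state.

```python
from enum import Enum


class ClusterState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    DELETED = "deleted"


# Allowed transitions in this simplified, hypothetical model.
# DELETED has no outgoing transitions: a delete cannot be undone.
TRANSITIONS: dict[tuple[ClusterState, str], ClusterState] = {
    (ClusterState.RUNNING, "stop"): ClusterState.STOPPED,
    (ClusterState.STOPPED, "start"): ClusterState.RUNNING,
    (ClusterState.RUNNING, "delete"): ClusterState.DELETED,
    (ClusterState.STOPPED, "delete"): ClusterState.DELETED,
}


def apply_action(state: ClusterState, action: str) -> ClusterState:
    """Return the next state, or raise ValueError for an invalid action."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} a cluster that is {state.value}") from None
```

For example, `apply_action(ClusterState.RUNNING, "stop")` returns `ClusterState.STOPPED`, while any action on `ClusterState.DELETED` raises `ValueError`.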
This may have been caused by one of the following: