Fast-moving data and real-time analysis present us with some amazing opportunities. Don’t blink, or you’ll miss it! Every organization has some data that arrives in real time, whether it is understanding what our users are doing on our websites or watching our systems and equipment as they perform mission-critical tasks for us. This real-time data, when captured and analyzed in a timely manner, may deliver tremendous business value. For example:
By capitalizing on the business value of fast-moving, real-time analytics, we can do some game-changing things. We can reduce costs, eliminate unnecessary work, improve customer satisfaction and experience, and reduce churn. We can get to root-cause analysis faster and become proactive instead of reactive to changes in markets, business operations, and customer behavior. We can get the jump on the competition, reduce disruptive surprises, improve organizational and operational health, and cut waste across the business.
Making real-time analytics a practical, applied reality, however, requires some key capabilities. What we need is:
On top of these core critical capabilities, we also need the following:
And all of this should ideally be delivered in a data platform that is easy to deploy and administer and that works in any cloud.
Cloudera Data Platform (CDP) offers Apache Kudu as part of our Data Hub cloud service, providing a consistent, dependable way to ingest data streams into our analytics environment, in real time and at any scale. CDP also offers the Cloudera Data Warehouse (CDW) as a containerized service with the flexibility to scale up and down as needed. Multiple CDW instances can be configured against the same data, each with its own configuration and scaling options, to optimize for workload performance and cost. This also achieves workload isolation, so we can run mission-critical workloads independently of experimental and exploratory ones, and nobody steps on anyone’s toes by accident.
Support for Apache NiFi, Spark Streaming, and Flink comes pre-integrated and out of the box. Kudu also has native C++, Java, and Python APIs for capturing data streams from applications and components written in those languages. With such a wide range of ingest options, Kudu can take in data from a broad variety of real-time sources.
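To make the Python route a little more concrete, here is a minimal sketch of writing a stream of events into Kudu. It assumes the `kudu-python` client package is installed and that the target table already exists with columns matching the event keys. The function names `write_events` and `connect_and_write`, the batch size, and the table layout are illustrative assumptions, not part of CDP or Kudu.

```python
from typing import Any, Dict, Iterable


def write_events(client: Any, table_name: str,
                 events: Iterable[Dict[str, Any]],
                 batch_size: int = 1000) -> int:
    """Insert event dicts into an existing Kudu table, flushing in batches.

    `client` is assumed to behave like the object returned by kudu.connect().
    Returns the number of rows handed to the session.
    """
    table = client.table(table_name)
    session = client.new_session()
    applied = 0
    for event in events:
        # Each event dict maps column names to values for one row.
        session.apply(table.new_insert(event))
        applied += 1
        if applied % batch_size == 0:
            session.flush()  # send the buffered batch to the cluster
    session.flush()  # push any rows left over after the last full batch
    return applied


def connect_and_write(host: str, port: int, table_name: str,
                      events: Iterable[Dict[str, Any]]) -> int:
    """Connect to a Kudu master and write the events (hypothetical helper)."""
    import kudu  # imported lazily so write_events has no hard dependency
    client = kudu.connect(host=host, port=port)
    return write_events(client, table_name, events)
```

A call might look like `connect_and_write("kudu-master.example.com", 7051, "events", rows)`, where the host name and table name are placeholders. Batching the flushes is one simple way to trade a little latency for throughput; a production pipeline would also want error handling around each flush.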
CDW integrates Kudu in Data Hub services with containerized Impala to offer flexible real-time analytics that are easy to deploy and administer. With this unique architecture, we support stable and consistent ingestion of huge volumes of fast-moving data, together with flexible, workload-isolated data warehousing services. We get optimized price/performance on complex workloads over massive-scale data.
Let’s take a closer look at how to get started with CDP, Kudu, CDW, and Impala and develop a game-changing real-time analytics platform.
Check out our recent blog on integrating Apache Kudu on Cloudera Data Hub and Apache Impala on Cloudera Data Warehouse to learn how to implement this in your Cloudera Data Platform environment.