Overview
What is Stream Processing?
Cloudera Stream Processing enables customers to turn streams into data products by providing capabilities to analyze streaming data for complex patterns and gain actionable intelligence.
Stream Processing is powered by Apache Flink and Kafka and provides a complete, enterprise-grade stream management and stateful processing solution. The combination of Kafka as the streaming storage substrate, Flink as the core in-stream processing engine, and first-class support for industry standard interfaces like SQL and REST allows developers, data analysts, and data scientists to easily build hybrid streaming data pipelines that power real-time data products, dashboards, business intelligence apps, microservices, and data science notebooks.
Use cases like fraud detection, network threat analysis, manufacturing intelligence, commerce optimization, real-time offers, instantaneous loan approvals, and more are now possible by moving the data processing components up the stream to address these real-time needs.
Figure: Hybrid streaming data pipelines powered by Stream Processing
Stream Processing use cases
Fraud detection
Prevent millions of dollars in loss due to financial fraud by detecting it proactively.
Enterprises across retail, financial services, and other sectors struggle to protect customer data and prevent financial fraud. Cloudera Stream Processing can process real-time streams of customer transactions, identify patterns, create predictive alerts, and uncover actionable intelligence to prevent potential fraud.
Customer analytics
Real-time customer analytics improves engagement, retention, and satisfaction.
Every organization needs real-time analytics to improve customer engagement but struggles to implement it due to an excessive volume of data. Cloudera Stream Processing enables customer analytics by processing massive amounts of data with subsecond latencies while detecting customer interactions and recommending better offerings in real time.
Market monitoring
Handle millions of trades a second and scale to petabytes of financial information.
Financial stock exchanges face challenges with customer demands for real-time reporting and faster SLA requirements. Yet, petabytes of data must be processed to deliver these services. Cloudera Streams Messaging can easily stream high volumes of data so stock exchanges can quickly create market-driven real-time analytics and meet the increasingly demanding SLAs.
Log analytics
Modernize your logging infrastructure to get real-time analytics.
Log data is increasingly valuable to enterprises. But IT organizations are struggling with effective log collection processes, distributing relevant information upstream, and generating key metrics. Cloudera Stream Processing's capabilities help scale up log processing, deliver real-time insights across the firm, and significantly reduce operating costs.
Stream Processing capabilities
- Streaming Analytics powered by Apache Flink
- Streams Messaging powered by Apache Kafka
Streaming Analytics
Powered by Apache Flink with SQL Stream Builder, Cloudera Streaming Analytics provides:
- Low-latency stream processing capabilities
- Simplified development, with streaming applications written in industry standard SQL and exposed as APIs via REST endpoints
- Advanced windowing techniques to build sophisticated event-driven analytics
- Support for multi-cloud and hybrid cloud models
Cloudera SQL Stream Builder is a comprehensive interactive UI for creating stateful stream processing jobs using SQL, which is converted into optimized Flink jobs. By using SQL, you can simply and easily declare expressions that filter, aggregate, route, and otherwise mutate streams of data. SQL Stream Builder is also a job management interface that you can use to compose and run SQL on streams and to create durable data APIs for the results.
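To make the idea of filtering and aggregating a stream concrete, the following is a minimal plain-Python sketch of what a filter plus a tumbling-window aggregate computes. It is not SQL Stream Builder or Flink code; the function name `tumbling_window_totals`, the event layout, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_totals(events, window_seconds, min_amount=0.0):
    """Sum amounts per (window start, key) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key, amount) tuples.
    Events below `min_amount` are filtered out before aggregation.
    """
    totals = defaultdict(float)
    for ts, key, amount in events:
        if amount < min_amount:
            continue  # the "filter" step
        # Align the timestamp to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        totals[(window_start, key)] += amount  # the "aggregate" step
    return dict(totals)


# Hypothetical card payments: (event time in seconds, card id, amount).
events = [
    (0, "card-1", 20.0),
    (30, "card-1", 5.0),
    (59, "card-2", 100.0),
    (60, "card-1", 40.0),
    (61, "card-2", 1.0),
]
result = tumbling_window_totals(events, window_seconds=60, min_amount=2.0)
```

In a streaming engine the same logic runs continuously and emits each window's result as the window closes, rather than after reading a finite list.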
Ensure that data is processed exactly once at all times even during errors and retries. For example, a financial services company needs to use stream processing to coordinate hundreds of back-office transactions systems when consumers pay their home mortgage.
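One common way to get an exactly-once effect in the presence of retries is idempotent processing: remember which transaction IDs have already been applied and ignore repeats. The sketch below shows that technique only. Flink's own guarantee relies on its checkpointing machinery, which this example does not reproduce, and the `IdempotentLedger` class and sample data are hypothetical.

```python
class IdempotentLedger:
    """Applies each transaction at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.balances = {}        # account -> running balance
        self.applied_ids = set()  # transaction IDs already processed

    def apply(self, txn_id, account, amount):
        """Apply a transaction; return False if it was a duplicate delivery."""
        if txn_id in self.applied_ids:
            return False
        self.balances[account] = self.balances.get(account, 0) + amount
        self.applied_ids.add(txn_id)
        return True


ledger = IdempotentLedger()
# "t1" is delivered twice, as can happen when a sender retries after a timeout.
deliveries = [
    ("t1", "mortgage-42", 1500),
    ("t2", "mortgage-42", -200),
    ("t1", "mortgage-42", 1500),
]
applied = [ledger.apply(*delivery) for delivery in deliveries]
```

In a real deployment the set of applied IDs and the balances would need to be persisted together atomically, otherwise a crash between the two updates could still cause a double apply.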
Detect and deal with streaming events that arrive out of order. For example, real-time fraud detection services need to ensure data is processed in the right order even if data arrives late.
Achieve in-memory, one-at-a-time stream processing performance. For example, process requests from 30 million active users making credit card payments, transfers, and balance lookups with millisecond latency.
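Out-of-order handling is usually built on event-time watermarks: events are buffered until the system is reasonably sure nothing earlier is still in flight. The following is a simplified sketch of that idea in plain Python, assuming a fixed lateness bound. The function `reorder_with_watermark` and its parameters are illustrative names, not a Flink API.

```python
import heapq


def reorder_with_watermark(events, max_lateness):
    """Emit events in event-time order, tolerating bounded out-of-orderness.

    `events` is an iterable of (timestamp, payload) in arrival order.
    The watermark trails the largest timestamp seen by `max_lateness`.
    Returns (ordered_events, too_late_events).
    """
    buffer = []  # min-heap keyed by event timestamp
    ordered, too_late = [], []
    watermark = float("-inf")
    for seq, (ts, payload) in enumerate(events):
        if ts < watermark:
            # Older than the watermark: results up to here were already emitted.
            too_late.append((ts, payload))
            continue
        # `seq` breaks ties so payloads never need to be comparable.
        heapq.heappush(buffer, (ts, seq, payload))
        watermark = max(watermark, ts - max_lateness)
        # Release everything the watermark has passed.
        while buffer and buffer[0][0] <= watermark:
            t, _, p = heapq.heappop(buffer)
            ordered.append((t, p))
    # End of input: flush whatever is still buffered.
    while buffer:
        t, _, p = heapq.heappop(buffer)
        ordered.append((t, p))
    return ordered, too_late


arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "f"), (4, "d"), (1, "late")]
ordered, too_late = reorder_with_watermark(arrivals, max_lateness=2)
```

A larger `max_lateness` tolerates more disorder at the cost of higher latency, which is the basic trade-off behind any watermark setting.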
Trigger events when dealing with hundreds of streaming sources and millions of events per second per stream. For example, when a patient checks into the ER, the system reaches out to external systems to pull patient-specific data from hundreds of sources so it’s available in an EMR by the time the patient arrives in the exam room.
Streaming data has little value unless you can easily integrate, join, and mesh those streams with other at-rest data sources, including warehouses, relational databases, and data lakes. Configure data providers using out-of-the-box connectors or your own connector to any data source. Once the data providers are created, you can easily create virtual tables using DDL. Complex integration between multiple streams and batch data sources becomes easier with well-known SQL constructs such as joins and aggregations.
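As a rough illustration of joining a stream with at-rest data, the sketch below enriches each streaming record with attributes from a static lookup table, similar in spirit to a SQL left join. It is plain Python, not connector or DDL code, and the `enrich` function, field names, and sample records are assumptions.

```python
def enrich(stream, reference, key):
    """Left-join each streaming record against a static lookup table.

    `stream` yields dict records, `reference` maps a key value to a dict of
    extra attributes, and `key` names the join field in each record.
    Records without a match pass through unchanged.
    """
    for record in stream:
        match = reference.get(record[key], {})
        yield {**record, **match}


# Hypothetical at-rest customer table and live payment stream.
customers = {"c1": {"tier": "gold"}}
payments = [
    {"customer_id": "c1", "amount": 10},
    {"customer_id": "c9", "amount": 5},
]
enriched = list(enrich(payments, customers, key="customer_id"))
```

Because `enrich` is a generator, it processes one record at a time and never needs the whole stream in memory.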
Streams Messaging
Powered by Apache Kafka, Cloudera Streams Messaging provides:
- Streams Messaging Manager to monitor/operate clusters
- Streams Replication Manager for HA/DR deployments
- Schema Registry for centralized schema management
- Kafka Connect for simple data movement and change data capture and Cruise Control for intelligent rebalancing and self healing
- Support for multi-cloud and hybrid cloud models
Supports millions of messages per second with low latency and high throughput, scaling elastically and transparently without downtime. Addresses a wide range of streaming data initiatives, enabling enterprises to keep up with customer demand, provide better services, and proactively manage risk.
Streams Messaging Manager provides a single pane of glass view with end-to-end visibility into how data moves across Kafka clusters—among producers, brokers, topics, and consumers—allowing you to track data lineage and governance from edge to cloud. It also simplifies troubleshooting of Kafka environments with intelligent filtering and sorting.
Streams Replication Manager, based on MirrorMaker 2, offers fault-tolerant, scalable, and robust cross-cluster Kafka topic replication, as well as replication monitoring and metrics at the cluster and topic levels. It supports high availability, disaster recovery, cloud migration, geo-proximity, and many other use cases.
Schema Registry lets you manage, share, and support the evolution of all producer and consumer schemas in a shared schema repository that allows applications to flexibly interact with each other across the Kafka landscape. Safely mitigate interruptions that occur due to schema mismatches.
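Schema evolution checks typically ask whether consumers using a new schema can still read data written with the old one. The sketch below applies a deliberately simplified backward-compatibility rule: added fields need a default, and shared fields must keep their type. It is not the Schema Registry API, and the function `is_backward_compatible` and the schema layout are hypothetical.

```python
def is_backward_compatible(old_fields, new_fields):
    """Check whether a reader using `new_fields` can read `old_fields` data.

    Each schema is a dict of field name -> {"type": ..., optional "default": ...}.
    Returns (ok, problems), where `problems` lists human-readable violations.
    """
    problems = []
    for name, spec in new_fields.items():
        if name not in old_fields:
            # Old records lack this field, so the reader needs a fallback value.
            if "default" not in spec:
                problems.append(f"new field '{name}' has no default")
        elif old_fields[name]["type"] != spec["type"]:
            problems.append(f"field '{name}' changed type")
    return (not problems, problems)


v1 = {"id": {"type": "string"}, "amount": {"type": "double"}}
v2_ok = {**v1, "currency": {"type": "string", "default": "USD"}}
v2_bad = {"id": {"type": "long"}, "amount": {"type": "double"},
          "currency": {"type": "string"}}

ok_result = is_backward_compatible(v1, v2_ok)
bad_result = is_backward_compatible(v1, v2_bad)
```

Running a check like this before a producer or consumer is deployed is what allows schema mismatches to be caught ahead of time rather than as runtime failures.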
Cruise Control lets you manage and load-balance large Kafka installations, as well as automatically detect and remediate anomalies. Address hard problems such as frequent hardware/virtual machine failures, cluster expansion/reduction, and load skew among brokers.
Cloudera SDX offers centralized security, control policies, governance, and data lineage across all components. Policies are set once, enforced automatically, and vendor-agnostic, allowing you to confidently embrace multi-cloud and hybrid cloud strategies. SDX supports the four main pillars of security: identity, access, data protection, and visibility.
Any data, anywhere, with flexible deployment options.
Stream Processing in the cloud
Cloudera features a complete set of integrated stream processing capabilities that can be deployed in the public cloud to scale efficiently.
Cloudera Stream Processing is built on Apache Kafka and Apache Flink engines with enterprise-grade tooling to simplify deployment and management.
Streams Messaging Manager extends Apache Kafka with a set of capabilities to address schema governance and monitoring, disaster recovery, intelligent rebalancing, and robust access control and audit.
SQL Stream Builder extends Apache Flink with a powerful SQL Console that lets SQL analysts query streaming data as well as collaborate and version control processing logic for downstream applications.
Stream Processing on premises
Cloudera can be deployed on premises with streaming data to control costs and minimize latency for real-time pipelines and applications. Cloudera Stream Processing integrates Apache Kafka and Apache Flink with enterprise tooling needed to manage these deployments.
Cloudera Streaming - Kubernetes Operators
Cloudera Stream Processing capabilities are also available as Kubernetes Operators that can be deployed independently on existing Kubernetes clusters, making it even easier to deploy and scale Kafka across the enterprise. The Kubernetes operator ships with Kafka, Cruise Control, and ZooKeeper, enabling streaming use cases on Kubernetes with a robust message broker service, and with Flink and SQL Stream Builder, providing a modern distributed stream processing engine for building real-time streaming applications that run natively on containers.
Cloudera Stream Processing Community Edition
Stream Processing Community Edition makes developing stream processors easy and can be done right from your desktop or any other development node.
Analysts, data scientists, and developers can now evaluate new features, develop SQL-based stream processors locally, and develop Kafka Consumers/Producers and Kafka Connect Connectors, all locally before moving to production.
Get up and running in 5 minutes with the Stream Processing Community Edition.
GigaOm Radar for Streaming Data Platforms
Cloudera named a 2024 market leader for streaming data platforms.