We taught the world the value of big data with open source, and our strong belief in the value of open source, open standards, and open markets is driving the next wave of innovation.
Innovating in open source
Some vendors consume the open source community’s activity; others help drive it. Cloudera leads in influencing Hadoop platform evolution by creating, contributing, and supporting new capabilities that meet your requirements for security, scale, and usability.
Curation of open standards
Cloudera has a long and proven track record of identifying, curating, and supporting open standards (including Apache HBase, Apache Spark, and Apache Kafka) that provide the mainstream, long-term architecture upon which new customer use cases are built.
Highest enterprise requirements
To ensure the best customer experience, Cloudera invests significant resources in multi-dimensional testing on real workloads before releases, as well as in supportability of the entire platform via extensive involvement in the open source community.
Our contributions to the open source community ensure we receive the latest innovations in return
200+ Apache committer seats
65 PMC seats across 22 projects
>35 projects
Our open source ecosystem
Apache Hadoop is an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. Hadoop services are foundational to data storage, data processing, data access, data governance, security, and operations.
Apache Accumulo
A sorted, distributed key-value store with cell-based access control.
Apache Atlas
Agile enterprise regulatory compliance through metadata.
Apache Flink
A real-time stream processing framework for big data analytics and applications.
Apache Hadoop
A distributed storage and processing framework for large-scale data processing tasks.
Apache HBase
A non-relational (NoSQL) database that runs on top of HDFS.
Apache Hive
The de facto standard for SQL queries in Hadoop.
Apache Iceberg
An open table format for large-scale analytics, delivering the reliability and simplicity of SQL tables.
Apache Impala
The open source, analytic MPP database for Apache Hadoop that provides the fastest time-to-insight.
Apache Kafka
A fast, scalable, fault-tolerant messaging system.
Apache Knox Gateway
A secure entry point for Hadoop clusters.
Apache Kudu
Storage for use cases that require fast analytics on rapidly changing data.
Apache NiFi
A real-time integrated data logistics and simple event processing platform.
Apache Oozie
A workflow scheduler for managing Apache Hadoop jobs.
Apache Phoenix
A massively parallel relational database engine supporting OLTP for Hadoop using Apache HBase.
Apache Ranger
Comprehensive security for enterprise Hadoop.
Apache Solr
Rapid indexing and search on Hadoop.
Apache Spark
Adds in-memory compute to Hadoop for ETL, machine learning, and data science workloads.
Apache Sqoop
Efficiently transfers bulk data between Apache Hadoop and structured datastores.
Apache Tez
A framework for YARN-based data processing applications in Hadoop.
Apache YARN
The architectural center of enterprise Hadoop.
Apache Zeppelin
A completely open web-based notebook that enables interactive data analytics.
Apache ZooKeeper
An open source server that reliably coordinates distributed processes.
HDFS
A distributed file system designed for storing and managing very large volumes of data.
Hue
An open source SQL workbench for data warehouses.