Cloudera Announces New Distribution for Hadoop to Bring Data Processing Power to Enterprises


First Distribution for Hadoop Adds Easy Installation, Simple Configuration and Commercial Support to the Open Source Technology Powering the World’s Largest Web Companies

BURLINGAME, CA–(Marketwire – March 16, 2009) – Cloudera, the commercial Hadoop™ company, today announced the general availability of the Cloudera Distribution for Hadoop, an open source product used to store and process complex, large-scale data: petabytes of information, often distributed across thousands of servers. Hadoop is in production use at most of the world’s largest Web companies, including Facebook, Google, and Yahoo!. Cloudera, with the financial backing of Accel Partners, is the first company to develop technology to bring Hadoop into enterprise data centers.

“After working with large Hadoop deployments at companies like Facebook, Google and Yahoo!, we came to realize that people needed Hadoop installation, configuration, and management to be much easier,” said Christophe Bisciglia, Cloudera founder and former manager of Google’s Hadoop cluster. “Cloudera is advancing Hadoop technology to make it easier for everyone to store and process the same types of complex, large-scale data that large Web companies are successfully using in their businesses.”

The Cloudera Distribution for Hadoop is freely available for download and immediate use. The product is distributed as a pre-packaged RPM bundle for Red Hat Linux systems or as an Amazon EC2 image. To make Hadoop easy to install and use, Cloudera is launching a new portal where people can use a Web-based configuration tool to create custom packages that are optimized for their specific needs. Settings for the cluster can also be saved on the portal to enable automatic updates. There is no charge to use the portal. The RPM packages and EC2 images are freely distributed under the Apache 2 software license.

“Since we use Hadoop to help run our business, we are excited that Cloudera is offering commercial support for Hadoop and is making the technology more accessible to businesses,” said David C Peterson, SVP Technology at ContextWeb, Inc., a leading contextual advertising company and operator of the ADSDAQ Exchange. “Businesses need to feel confident that there is a company like Cloudera to stand behind Hadoop in order for this great open source technology to become widely used by companies.”

Cloudera is also making a pre-configured VMware image freely available for evaluation and for use with its free online training. People who want to test the Cloudera Distribution for Hadoop or learn more about Hadoop and Cloudera’s online training can download the image and run it on their Linux, Mac or Windows desktop. The image ships with example code and all the components needed to use the Cloudera Distribution for Hadoop, including a master server and a single node.

The Cloudera Distribution for Hadoop is a complete system to handle the processing and storage of big data. Major components include:

– HDFS – Hadoop Distributed File System, a distributed and fault-tolerant file system designed to run on commodity hardware. HDFS assumes that hardware failure is normal and provides quick detection and automatic recovery. HDFS can support tens of millions of files in a single instance;

– MapReduce – an implementation that divides applications into many small blocks of work for automatic parallelization and execution on large clusters. Cloudera’s implementation of MapReduce takes care of partitioning input data, scheduling program execution across distributed machines, and handling machine failure;

– Hive – a data warehousing infrastructure built on top of Hadoop that provides tools for easy data summary generation, ad hoc querying, and analysis. Hive comes with Hive QL, a simple query language based on SQL;

– Pig – a platform for analyzing large data sets in Hadoop using PigLatin, a high-level language for expressing data analysis programs.
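
As a rough illustration of the MapReduce model described above, the following single-process Python sketch counts words across two input chunks. It is not Hadoop's or Cloudera's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample data are hypothetical, and the partitioning, scheduling and failure handling that Hadoop performs across machines are only simulated here by a plain loop.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key into one result."""
    return key, sum(values)


def word_count(chunks):
    # On a real cluster each chunk could be mapped on a different machine;
    # in this sketch the chunks are processed one after another.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input, already split into two chunks.
chunks = ["big data big clusters", "data on commodity hardware"]
counts = word_count(chunks)
```

Because each chunk is mapped independently and each key is reduced independently, both phases can in principle be spread over many machines, which is the property the MapReduce component relies on.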

Additional information:

– Cloudera Distribution for Hadoop, with free access to a web configuration system, downloadable software, VMware image and documentation;

– The Story of the Cloudera Distribution for Hadoop – video featuring Cloudera’s CEO and founder;

– Screencast on configuring the Cloudera Distribution for Hadoop.

About Cloudera

Cloudera, the leader in Apache Hadoop-based software and services, enables data-driven enterprises to easily derive business value from all their structured and unstructured data. Cloudera's Distribution including Apache Hadoop (CDH), available to download for free, is the most comprehensive, tested, stable and widely deployed distribution of Hadoop in commercial and non-commercial environments. For the fastest path to reliably using this completely open source technology in production for Big Data analytics and answering previously unaddressable big questions, organizations can subscribe to Cloudera Enterprise, which comprises Cloudera Manager software and Cloudera Support. Cloudera also offers training and certification on Apache technologies, as well as consulting services. As the top contributor to the Apache open source community and with tens of thousands of nodes under management across customers in financial services, government, telecommunications, media, web, advertising, retail, energy, bioinformatics, pharma/healthcare, university research, oil and gas and gaming, Cloudera's depth of experience and commitment to sharing expertise are unrivaled.
