CDH 4.5.0

Cloudera’s 100% Open Source Hadoop Platform

CDH is Cloudera's open source software distribution and consists of Apache Hadoop and additional key open source projects to ensure you get the most out of Hadoop and your data.

It is the only Hadoop solution to offer unified querying options (including batch processing, interactive SQL, text search, and machine learning) and necessary enterprise security features (such as role-based access controls).

Please note: CDH requires manual installation from the command line.
For a faster, automated installation, download Cloudera Manager.

CDH Packaging and Tarball Information

Each CDH release series is made up of a collection of CDH project packages that are known to work together. The package version numbers of the CDH projects in each CDH release are listed in the following table.


CDH Version 4.5.0 Packaging and Tarballs

The overall release notes for CDH Version 4.5.0 (CDH4.5.0) are available separately.

Component                   Package Version            Tarball   Release Notes   Changes File
DataFu                      pig-udf-datafu-0.0.4+23    Tarball   Release notes   Changes
Apache Flume                flume-ng-1.4.0+56          Tarball   Release notes   Changes
Apache Hadoop               hadoop-2.0.0+1518          Tarball   Release notes   Changes
Apache HBase                hbase-0.94.6+165           Tarball   Release notes   Changes
Apache HCatalog             hcatalog-0.5.0+14          Tarball   Release notes   Changes
Apache Hive                 hive-0.10.0+214            Tarball   Release notes   Changes
Hue                         hue-2.5.0+182              Tarball   Release notes   Changes
Apache Mahout               mahout-0.7+22              Tarball   Release notes   Changes
Apache Oozie                oozie-3.3.2+97             Tarball   Release notes   Changes
Parquet                     parquet-1.2.5+0            Tarball   Release notes   Changes
Apache Pig                  pig-0.11.0+36              Tarball   Release notes   Changes
Apache Sentry (incubating)  sentry-1.1.0+16            Tarball   Release notes   Changes
Apache Sqoop                sqoop-1.4.3+81             Tarball   Release notes   Changes
Apache Sqoop2               sqoop2-1.99.2+98           Tarball   Release notes   Changes
Apache Whirr                whirr-0.8.2+14             Tarball   Release notes   Changes
Apache ZooKeeper            zookeeper-3.4.5+24         Tarball   Release notes   Changes
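Each package version above follows a name-version+suffix pattern. As a sketch of how a script might split these strings (for example, to compare an installed package against this table), the hypothetical helper parse_package_version below uses a regular expression that assumes the upstream version starts with a digit. It treats whatever follows the + as an opaque CDH-specific string rather than interpreting it.

```python
import re
from typing import NamedTuple


class CdhPackage(NamedTuple):
    name: str        # package name, e.g. "hbase" or "pig-udf-datafu"
    upstream: str    # upstream version, e.g. "0.94.6"
    cdh_suffix: str  # text after "+", kept as an opaque string


# The name is the shortest prefix followed by "-<digit>"; the suffix may
# contain dots (e.g. "25.52" in flume-0.9.4+25.52).
_PACKAGE_RE = re.compile(r"^(?P<name>.+?)-(?P<upstream>\d[\w.]*)\+(?P<suffix>[\d.]+)$")


def parse_package_version(text: str) -> CdhPackage:
    """Split a 'name-version+suffix' package string into its three parts."""
    match = _PACKAGE_RE.match(text)
    if match is None:
        raise ValueError(f"not a name-version+suffix string: {text!r}")
    return CdhPackage(match["name"], match["upstream"], match["suffix"])


if __name__ == "__main__":
    for text in ("hadoop-2.0.0+1518", "pig-udf-datafu-0.0.4+23", "flume-0.9.4+25.52"):
        print(parse_package_version(text))
```

The lazy name group is what lets multi-part names such as pig-udf-datafu and flume-ng parse correctly: the name stops at the first hyphen that is followed by a digit.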

Flume 0.9.4 is not included in the CDH4 distribution. It has been replaced by Flume 1.x, and you are encouraged to transition to this new version. However, for a limited time, a CDH4-compatible Flume 0.9.4 package is available:

CDH4 Project  Package Version     Tarball Version               Release Notes   Changes File
Apache Flume  flume-0.9.4+25.52   flume-0.9.4-cdh4.1.3.tar.gz   Release notes   Changes

What's New in CDH4.5.0

Apache Flume

New Features:

  • FLUME-2190 - Included a new Twitter Source that feeds off the Twitter firehose.
  • FLUME-2109 - HTTP Source now supports HTTPS.
  • FLUME-1666 - Syslog TCP Source can now keep timestamp and process fields in the event body.
  • FLUME-2202 - AsyncHBaseSink can now coalesce increments to the same row and column per transaction to reduce the number of RPC calls.
  • FLUME-2189 - Avro Source can now accept events from a restricted set of peers.
  • FLUME-2052 - Spooling Directory Source can now ignore or replace malformed characters.
  • Flume auto-detects Cloudera Search dependencies.
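The coalescing idea behind FLUME-2202 can be modeled in a few lines: increments that target the same row and column within one transaction are summed, so the sink issues one increment per cell instead of one per event. This is a plain-Python illustration with hypothetical row and column names; it does not use Flume or the asynchbase client, and the sink's actual configuration option is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[bytes, bytes]            # (row key, column)
Increment = Tuple[bytes, bytes, int]  # (row key, column, delta)


def coalesce_increments(increments: Iterable[Increment]) -> Dict[Cell, int]:
    """Sum deltas per (row, column) so each cell needs a single increment call."""
    totals: Counter = Counter()
    for row, column, delta in increments:
        totals[(row, column)] += delta
    return dict(totals)


if __name__ == "__main__":
    batch = [
        (b"page:/home", b"s:hits", 1),   # hypothetical rows and columns
        (b"page:/home", b"s:hits", 1),
        (b"page:/about", b"s:hits", 1),
        (b"page:/home", b"s:hits", 3),
    ]
    merged = coalesce_increments(batch)
    print(len(batch), "increments ->", len(merged), "calls")
    print(merged)
```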

Changed Feature:

  • FLUME-2233 - Memory Channel calculates byte capacity usage on transaction commits instead of puts to improve performance.
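A simplified model of the FLUME-2233 change follows. Each transaction counts its own bytes during put() and reserves byte capacity from the channel once, at commit, so puts no longer contend on a shared counter. ToyMemoryChannel and ToyTransaction are hypothetical classes; the real Memory Channel also handles takes, rollbacks, event-count capacity, and header overhead, none of which is modeled here.

```python
import threading
from typing import List


class ToyMemoryChannel:
    """Simplified model, not Flume's MemoryChannel: byte capacity is checked once per commit."""

    def __init__(self, byte_capacity: int) -> None:
        self._lock = threading.Lock()
        self._bytes_remaining = byte_capacity
        self._queue: List[bytes] = []

    @property
    def bytes_remaining(self) -> int:
        return self._bytes_remaining

    def begin(self) -> "ToyTransaction":
        return ToyTransaction(self)

    def _commit(self, events: List[bytes], size: int) -> bool:
        # One lock acquisition per transaction instead of one per put().
        with self._lock:
            if size > self._bytes_remaining:
                return False  # not enough byte capacity: the caller should roll back
            self._bytes_remaining -= size
            self._queue.extend(events)
            return True


class ToyTransaction:
    def __init__(self, channel: ToyMemoryChannel) -> None:
        self._channel = channel
        self._events: List[bytes] = []
        self._size = 0  # bytes put in this transaction, tracked locally

    def put(self, event: bytes) -> None:
        # put() touches only transaction-local state; no shared counter or lock.
        self._events.append(event)
        self._size += len(event)

    def commit(self) -> bool:
        return self._channel._commit(self._events, self._size)


if __name__ == "__main__":
    channel = ToyMemoryChannel(byte_capacity=10)
    tx = channel.begin()
    tx.put(b"abcd")
    tx.put(b"efg")
    print(tx.commit(), channel.bytes_remaining)   # 7 bytes accepted
    tx2 = channel.begin()
    tx2.put(b"12345")
    print(tx2.commit(), channel.bytes_remaining)  # rejected: only 3 bytes left
```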

Apache Hive

New Feature:

Hue

New Features:

  • Added a SAML authentication backend, along with other security fixes.

Changed Features:

  • HUE-1609 - [core] LDAP backend and import should be case insensitive.
  • HUE-1632 - [oozie] Workflow with & in a property fails to submit.
  • HUE-1555 - [hbase] Python 2.4 support.
  • HUE-1521 - [core] Improve JobTracker HA.
  • [search] Default template should display all the fields.
  • [core] Make search bind authentication optional for LDAP.

Apache MapReduce v1 (MRv1)

New Features:

  • Track HDFS accesses: When mapreduce.job.token.tracking.ids is set to true, an MRv1 job keeps track of the HDFS tokens it uses to access HDFS data. In addition, the HDFS audit logs capture information about which jobs accessed the data.
  • Stack traces on task-timeout: For easy debugging, MRv1 tasks dump their stack traces on timeout.
  • KeyOnlyTextInputWriter and KeyOnlyTextOutputReader enable streaming jobs to write/read text without separators.
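The difference between the default streaming I/O and the new key-only classes can be illustrated with plain string handling. The functions below are hypothetical Python stand-ins for the Java classes, assuming the default tab separator; they only mimic the line format a streaming process sees and produces.

```python
from typing import Optional, Tuple

SEP = "\t"  # Hadoop Streaming's default key/value separator


def text_input_writer(key: str, value: str) -> str:
    """Default style: the streaming process receives 'key<TAB>value'."""
    return f"{key}{SEP}{value}\n"


def key_only_input_writer(key: str, value: str) -> str:
    """Key-only style: only the key is written; the value is ignored, no separator."""
    return f"{key}\n"


def text_output_reader(line: str) -> Tuple[str, Optional[str]]:
    """Default style: split an output line at the first tab; no tab means a null value."""
    line = line.rstrip("\n")
    key, sep, value = line.partition(SEP)
    return (key, value) if sep else (line, None)


def key_only_output_reader(line: str) -> Tuple[str, Optional[str]]:
    """Key-only style: the whole line, tabs included, becomes the key."""
    return (line.rstrip("\n"), None)


if __name__ == "__main__":
    print(repr(text_input_writer("user42", "clicks=3")))
    print(repr(key_only_input_writer("user42", "clicks=3")))
    print(text_output_reader("a\tb\tc\n"))
    print(key_only_output_reader("a\tb\tc\n"))
```

The key-only reader matters when output lines legitimately contain tabs: the default reader would cut such a line at the first tab, while the key-only reader keeps it intact.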

Changed Feature:

  • Users no longer need to set environment variables differently when using the scripts under the bin-mapreduce1 directory in MRv1 tarballs.

Apache MapReduce v2 (YARN)

New Features:

  • Track HDFS accesses: When mapreduce.job.token.tracking.ids is set to true, a job keeps track of the HDFS tokens it uses to access HDFS data. In addition, the HDFS audit logs capture information about which jobs accessed the data.
  • KeyOnlyTextInputWriter and KeyOnlyTextOutputReader enable streaming jobs to write/read text without separators.
  • The Fair Scheduler can now be configured to decouple decisions from node heartbeats, resulting in faster scheduling.
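The mapreduce.job.token.tracking.ids property can be set per job like any other job configuration property. The sketch below builds a hadoop jar command line that passes it as a generic -D option; the jar name, driver class, and paths are hypothetical, and -D is only honored when the driver runs through ToolRunner.

```python
import shlex
from typing import Dict, List, Sequence


def hadoop_jar_command(jar: str, main_class: str, conf: Dict[str, str],
                       args: Sequence[str]) -> List[str]:
    """Build a 'hadoop jar' argv that passes job configuration as generic -D options.

    Hadoop only parses -D options for drivers that run through ToolRunner
    (GenericOptionsParser); otherwise set the property in the job configuration.
    """
    cmd = ["hadoop", "jar", jar, main_class]
    for key, value in conf.items():
        cmd += ["-D", f"{key}={value}"]
    cmd += list(args)  # application arguments come after the generic options
    return cmd


if __name__ == "__main__":
    cmd = hadoop_jar_command(
        "my-job.jar",          # hypothetical job jar
        "com.example.MyJob",   # hypothetical ToolRunner-based driver class
        {"mapreduce.job.token.tracking.ids": "true"},
        ["/user/alice/input", "/user/alice/output"],  # hypothetical paths
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```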

Apache Oozie

New Feature:

  • The Pig and Hive actions can now access Parquet files with no manual steps or configuration needed.

Apache Sentry (incubating)

New Features:

  • Access to the Hive Metastore Service can now be secured without IPTables. To restrict access to the Hive Metastore Service to only the users that HiveServer2 and the Impala daemon (impalad) run as, add those users to a proxy-user property in core-site.xml.

    In the example below, hivemetastore is the user that the Hive Metastore Service runs as, and hive and impala are the users that HiveServer2 and impalad run as, respectively. These users are then allowed to connect to the Hive Metastore Service.

    <property>
      <name>hadoop.proxyuser.hivemetastore.groups</name>
      <value>hive,impala</value>
    </property>

  • Sentry is now integrated with Cloudera Search. See Configuring Sentry for Search for more information.
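To sanity-check a core-site.xml edit like the one in the Sentry example, a small script can read the proxy-user property back and list its entries. proxyuser_groups is a hypothetical helper that only parses the file; it does not reproduce Hadoop's authorization logic. It trims whitespace around each entry, but because it is not certain that every Hadoop release does the same, writing the list without spaces (hive,impala) is the safer choice.

```python
import xml.etree.ElementTree as ET
from typing import Set

CORE_SITE = """<configuration>
  <property>
    <name>hadoop.proxyuser.hivemetastore.groups</name>
    <value>hive,impala</value>
  </property>
</configuration>"""


def proxyuser_groups(core_site_xml: str, service_user: str) -> Set[str]:
    """Return the entries of hadoop.proxyuser.<service_user>.groups, whitespace-trimmed."""
    wanted = f"hadoop.proxyuser.{service_user}.groups"
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == wanted:
            value = prop.findtext("value") or ""
            return {entry.strip() for entry in value.split(",") if entry.strip()}
    return set()  # property not present


if __name__ == "__main__":
    print(sorted(proxyuser_groups(CORE_SITE, "hivemetastore")))
```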

CDH 4.x Requirements and Supported Versions

Supported Operating Systems

CDH4 provides packages for Red-Hat-compatible, SLES, Ubuntu, and Debian systems as described below.

Operating System                                   Version                                     Packages

Red Hat compatible
  Red Hat Enterprise Linux (RHEL)                  5.7                                         64-bit
                                                   6.2                                         64-bit, 32-bit
                                                   6.4                                         64-bit
  CentOS                                           5.7                                         64-bit
                                                   6.2                                         64-bit, 32-bit
                                                   6.4                                         64-bit
  Oracle Linux with Unbreakable Enterprise Kernel  5.6                                         64-bit
                                                   6.4                                         64-bit
SLES
  SUSE Linux Enterprise Server (SLES)              11 with Service Pack 1 or later             64-bit
Ubuntu/Debian
  Ubuntu                                           Lucid (10.04) - Long-Term Support (LTS)     64-bit
                                                   Precise (12.04) - Long-Term Support (LTS)   64-bit
  Debian                                           Squeeze (6.0.3)                             64-bit

  Note:
  • For production environments, 64-bit packages are recommended. Except as noted above, CDH4 provides only 64-bit packages.
  • Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
  • If you are using an operating system that is not supported by Cloudera's packages, you can also download source tarballs from Downloads.
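For scripted checks, the Supported Operating Systems table can be transcribed into a lookup. The dictionary keys below (such as "Oracle Linux UEK") are abbreviations chosen for this sketch, cdh4_packages_available is a hypothetical helper, and the SLES Service Pack requirement is noted but not enforced.

```python
from typing import Dict, Set, Tuple

# (operating system, version) -> package architectures, transcribed from the
# Supported Operating Systems table. SLES 11 requires Service Pack 1 or later,
# which this lookup does not check.
SUPPORTED: Dict[Tuple[str, str], Set[str]] = {
    ("RHEL", "5.7"): {"64-bit"},
    ("RHEL", "6.2"): {"64-bit", "32-bit"},
    ("RHEL", "6.4"): {"64-bit"},
    ("CentOS", "5.7"): {"64-bit"},
    ("CentOS", "6.2"): {"64-bit", "32-bit"},
    ("CentOS", "6.4"): {"64-bit"},
    ("Oracle Linux UEK", "5.6"): {"64-bit"},
    ("Oracle Linux UEK", "6.4"): {"64-bit"},
    ("SLES", "11"): {"64-bit"},
    ("Ubuntu", "10.04"): {"64-bit"},
    ("Ubuntu", "12.04"): {"64-bit"},
    ("Debian", "6.0.3"): {"64-bit"},
}


def cdh4_packages_available(os_name: str, version: str, arch: str) -> bool:
    """True if the table lists CDH4 packages for this OS, version, and architecture."""
    return arch in SUPPORTED.get((os_name, version), set())


if __name__ == "__main__":
    print(cdh4_packages_available("CentOS", "6.2", "32-bit"))  # 32-bit listed for 6.2
    print(cdh4_packages_available("CentOS", "6.4", "32-bit"))  # only 64-bit listed
    print(cdh4_packages_available("Fedora", "19", "64-bit"))   # not listed
```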

Supported Databases

Supported JDK versions

Supported Internet Protocol