CDH 4.6.0

Cloudera’s 100% Open Source Hadoop Platform

CDH is Cloudera's open source software distribution and consists of Apache Hadoop and additional key open source projects to ensure you get the most out of Hadoop and your data.

It is the only Hadoop solution to offer unified querying options (including batch processing, interactive SQL, text search, and machine learning) and necessary enterprise security features (such as role-based access controls).

Please note: CDH requires manual installation from the command line.
For a faster, automated installation, download Cloudera Manager.

CDH Packaging and Tarball Information

Each CDH release series is made up of a collection of CDH project packages that are known to work together. The package version numbers of the CDH projects in each CDH release are listed in the following table.
  Note: To see the details of all the changes and bug fixes for a given component in a given release, read the Changes file as well as the Release Notes, following the links in the table below.


CDH Version 4.6.0 Packaging and Tarballs

The overall release notes for CDH Version 4.6.0 (CDH4.6.0) are published separately; the table below links to the per-component notes.

Component                   Package Version          Tarball  Release Notes  Changes File
DataFu                      pig-udf-datafu-0.0.4+11  Tarball  Release notes  Changes
Apache Flume                flume-ng-1.4.0+96        Tarball  Release notes  Changes
Apache Hadoop               hadoop-2.0.0+1554        Tarball  Release notes  Changes
Apache HBase                hbase-0.94.15+86         Tarball  Release notes  Changes
Apache HCatalog             hcatalog-0.5.0+13        Tarball  Release notes  Changes
Apache Hive                 hive-0.10.0+237          Tarball  Release notes  Changes
Hue                         hue-2.5.0+217            Tarball  Release notes  Changes
Apache Mahout               mahout-0.7+15            Tarball  Release notes  Changes
Apache Oozie                oozie-3.3.2+100          Tarball  Release notes  Changes
Parquet                     parquet-1.2.5+7          Tarball  Release notes  Changes
Apache Pig                  pig-0.11.0+42            Tarball  Release notes  Changes
Apache Sentry (incubating)  sentry-1.1.0+20          Tarball  Release notes  Changes
Apache Sqoop                sqoop-1.4.3+92           Tarball  Release notes  Changes
Apache Sqoop2               sqoop2-1.99.2+99         Tarball  Release notes  Changes
Apache Whirr                whirr-0.8.2+15           Tarball  Release notes  Changes
Apache ZooKeeper            zookeeper-3.4.5+25       Tarball  Release notes  Changes
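
The package version strings in the table can be split mechanically, for example when comparing an installed package against this release. The sketch below assumes each string has the form name-upstream+N. It also assumes that the number after the plus sign counts the patches applied on top of the upstream release; the table does not state this, so treat that reading as an assumption. The names PackageVersion and parse_package_version are illustrative only.

```python
import re
from typing import NamedTuple


class PackageVersion(NamedTuple):
    name: str      # package name, e.g. "hbase"
    upstream: str  # upstream project version, e.g. "0.94.15"
    patches: int   # number after "+"; assumed here to be a patch count


# A name, a hyphen, a dotted numeric version, "+", then an integer.
_PATTERN = re.compile(
    r"^(?P<name>.+)-(?P<upstream>\d+(?:\.\d+)*)\+(?P<patches>\d+)$"
)


def parse_package_version(text: str) -> PackageVersion:
    """Split a string such as 'hbase-0.94.15+86' into its three parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized package version: {text!r}")
    return PackageVersion(
        match["name"], match["upstream"], int(match["patches"])
    )


if __name__ == "__main__":
    for entry in ("hbase-0.94.15+86", "pig-udf-datafu-0.0.4+11"):
        print(parse_package_version(entry))
```

Because the name part is matched greedily up to the last hyphen that precedes a numeric version, package names that themselves contain hyphens (such as flume-ng) stay intact.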

What's New in CDH4.6.0

The following topics describe new features introduced in CDH4.6.0:


Apache HBase

For CDH4.6.0, HBase has been rebased to 0.94.15, differing from Apache HBase 0.94.15 in the following ways:

  • Reverted HBASE-8521 (cell cannot be overwritten with bulk loaded HFiles) because it changes semantics relative to HBase in releases prior to CDH4.6.0.
  • Reverted HBASE-9097 (set HBASE_CLASSPATH before rest of classpath) because it is incompatible with HBase in releases prior to CDH4.6.0.
  • Reverted HBASE-8352 (change .snapshot to .hbase-snapshot dir) because it is incompatible with HBase in releases prior to CDH4.6.0.

New Feature:

  • HBASE-9047 - Added ReplicationSyncUp tool to finish replication when cluster is offline.

Apache HDFS

Upstream fixes:

  • HADOOP-10326 - MapReduce jobs cannot access S3 if Kerberos is enabled
  • HDFS-5031 - BlockScanner scans the block multiple times and on restart scans

Apache Flume

New Features:

  • FLUME-2155 - File Channel is now indexed during replay, improving replay speed
  • FLUME-2130 - Syslog UDP source can now handle larger messages
  • FLUME-2217 - Syslog headers can now be optionally added to the message body

Apache Oozie

New Feature:

  • Secure HBase Table Copy between two HBase servers from Oozie now works.

Apache MapReduce v1 (MRv1)

New Features:

  • Fair Scheduler placement policies allow placing jobs into pools based on the secondary group.
  • Combiners allow custom grouping comparators.

Apache MapReduce v2 (YARN)

New Feature:

  • Combiners allow custom grouping comparators.

CDH 4.6.0 Requirements and Supported Versions

Supported Operating Systems

CDH4 provides packages for Red-Hat-compatible, SLES, Ubuntu, and Debian systems as described below.

Operating System                                   Version                                    Packages
Red Hat compatible
  Red Hat Enterprise Linux (RHEL)                  5.7                                        64-bit
                                                   6.2                                        64-bit, 32-bit
                                                   6.4                                        64-bit
  CentOS                                           5.7                                        64-bit
                                                   6.2                                        64-bit, 32-bit
                                                   6.4                                        64-bit
  Oracle Linux with Unbreakable Enterprise Kernel  5.6                                        64-bit
                                                   6.4                                        64-bit
SLES
  SUSE Linux Enterprise Server (SLES)              11 with Service Pack 1 or later            64-bit
Ubuntu/Debian
  Ubuntu                                           Lucid (10.04) - Long-Term Support (LTS)    64-bit
                                                   Precise (12.04) - Long-Term Support (LTS)  64-bit
  Debian                                           Squeeze (6.0.3)                            64-bit

  Note:
  • For production environments, 64-bit packages are recommended. Except as noted above, CDH4 provides only 64-bit packages.
  • Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
  • If you are using an operating system that is not supported by Cloudera's packages, you can also download source tarballs from Downloads.

Supported Databases

Component  MySQL        SQLite    PostgreSQL   Oracle        Derby
Oozie      5.5                    8.4          10.2, 11gR2   Default
Flume                                                        Default (for the JDBC Channel only)
Hue        5.5          Default   8.4          11gR2
Hive       5.5                    8.4          10.2, 11gR2   Default
Sqoop      See Note 2   –         See Note 2   See Note 2
Sqoop 2    See Note 3   –         See Note 3   See Note 3    Default

Notes

  1. Cloudera's recommendations are:
    • For Red Hat and similar systems:
      • Use MySQL server version 5.0 (or higher) and version 5.0 client shared libraries on Red Hat 5 and similar systems.
      • Use MySQL server version 5.1 (or higher) and version 5.1 client shared libraries on Red Hat 6 and similar systems.

      If you use a higher server version than recommended here (for example, if you use 5.5) make sure you install the corresponding client libraries.

    • For SLES systems, use MySQL server version 5.0 (or higher) and version 5.0 client shared libraries.
    • For Ubuntu systems:
      • Use MySQL server version 5.1 (or higher) and version 5.0 client shared libraries on Lucid (10.04).
      • Use MySQL server version 5.5 (or higher) and version 5.0 client shared libraries on Precise (12.04).
  2. For connectivity purposes only, Sqoop supports MySQL 5.1, PostgreSQL 9.1.4, Oracle 10.2, Teradata 13.1, and Netezza TwinFin 5.0. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
  3. Sqoop 2 can transport data to and from MySQL 5.1, PostgreSQL 9.1.4, Oracle 10.2, and Microsoft SQL Server 2012. The Sqoop 2 repository is supported only on Derby.

Supported JDK versions

CDH4 is supported with Oracle JDK.
  Important: JDK 1.7
As of Cloudera Manager 4.7 and CDH4.4, Cloudera supports users running applications compiled with Oracle JDK 7 (JDK 1.7), with the following restrictions:
  • All CDH components must be running the same major version (that is, all deployed on JDK 6 or all deployed on JDK 7). For example, you cannot run Hadoop on JDK 6 while running Sqoop on JDK 7.
  • All nodes in the cluster must be running the same major JDK version: Cloudera does not support mixed environments (some nodes on JDK 6 and others on JDK 7).
To make sure everything works correctly, symbolically link the directory where you install the JDK to /usr/java/default on Red Hat and similar systems, or to /usr/lib/jvm/default-java on Ubuntu and Debian systems.
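
As a minimal sketch of the linking step above, the following Python function points a default path at a JDK installation directory. The function name link_default_jdk and the example install path /usr/java/jdk1.7.0_15 are hypothetical; only the two default paths come from the text. Writing under /usr normally requires root privileges.

```python
from pathlib import Path


def link_default_jdk(jdk_home: str, default_path: str) -> Path:
    """Make default_path a symbolic link to the directory jdk_home."""
    target = Path(jdk_home)
    link = Path(default_path)
    if not target.is_dir():
        raise FileNotFoundError(f"JDK directory not found: {target}")
    if link.is_symlink():
        # Replace a link left over from an earlier JDK installation.
        link.unlink()
    elif link.exists():
        # Refuse to overwrite a real file or directory.
        raise FileExistsError(f"{link} exists and is not a symbolic link")
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(target, target_is_directory=True)
    return link
```

On a Red Hat compatible system this might be called as link_default_jdk("/usr/java/jdk1.7.0_15", "/usr/java/default"); on Ubuntu or Debian the second argument would be "/usr/lib/jvm/default-java".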
  • For JDK 1.6, CDH4 is certified with 1.6.0_31, but any later maintenance (_xx) release should be acceptable for production, following Oracle's release notes and restrictions. The minimum supported version is 1.6.0_8.
  • For JDK 1.7, CDH4.2 and later are certified with 1.7.0_15, but any later maintenance (_xx) release should be acceptable for production, following Oracle's release notes and restrictions.
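
The version rules above lend themselves to an automated check. The sketch below parses version strings of the form 1.6.0_31, compares them against the versions named in the text, and reports whether a set of nodes mixes major JDK versions. The function names and host names are hypothetical. The text gives an explicit minimum only for JDK 6 (1.6.0_8); using the certified 1.7.0_15 as the floor for JDK 7 is a conservative assumption of this sketch, not a stated requirement.

```python
import re
from typing import Dict, Tuple

MIN_JDK6 = (1, 6, 0, 8)         # minimum supported JDK 6, per the text
CERTIFIED_JDK7 = (1, 7, 0, 15)  # certified JDK 7; used as a floor by assumption


def parse_jdk_version(text: str) -> Tuple[int, ...]:
    """Parse a string such as '1.7.0_15' into (1, 7, 0, 15)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)_(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized JDK version: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(text: str) -> bool:
    """Check a JDK 6 or JDK 7 version against the floors defined above."""
    version = parse_jdk_version(text)
    if version[:2] == (1, 6):
        return version >= MIN_JDK6
    if version[:2] == (1, 7):
        return version >= CERTIFIED_JDK7
    return False  # other major versions are not covered by the text


def mixed_major_versions(node_versions: Dict[str, str]) -> bool:
    """Return True if the nodes do not all share one major JDK version."""
    majors = {parse_jdk_version(v)[1] for v in node_versions.values()}
    return len(majors) > 1


if __name__ == "__main__":
    cluster = {"node1.example.com": "1.6.0_31", "node2.example.com": "1.7.0_15"}
    print("mixed JDK majors:", mixed_major_versions(cluster))
```

The version strings would typically be collected from each node, for instance from the output of java -version.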

Supported Internet Protocol

CDH requires IPv4. IPv6 is not supported.
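
One way to audit a host list against this requirement is to classify each configured address. The sketch below uses Python's standard ipaddress module; the function name non_ipv4_addresses and the sample addresses are illustrative only.

```python
import ipaddress
from typing import Iterable, List


def non_ipv4_addresses(addresses: Iterable[str]) -> List[str]:
    """Return the entries that are not valid IPv4 addresses."""
    rejected = []
    for address in addresses:
        try:
            ipaddress.IPv4Address(address)
        except ipaddress.AddressValueError:
            # IPv6 literals and malformed strings both end up here.
            rejected.append(address)
    return rejected


if __name__ == "__main__":
    configured = ["10.0.0.5", "fe80::1", "192.168.1.20"]
    print(non_ipv4_addresses(configured))
```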