CDH 5.0.0

Cloudera’s 100% Open Source Hadoop Platform

CDH is Cloudera's open source software distribution and consists of Apache Hadoop and additional key open source projects to ensure you get the most out of Hadoop and your data.

It is the only Hadoop solution to offer unified querying options (including batch processing, interactive SQL, text search, and machine learning) and necessary enterprise security features (such as role-based access controls).

Please note: CDH requires manual installation from the command line.
For a faster, automated installation, download Cloudera Manager.

CDH 5.0.0 Packaging and Tarballs

To view the overall release notes for CDH 5, see the CDH 5 Release Notes.

Component                   Package Version                   Tarball   Release Notes   Changes File
Apache Avro                 avro-1.7.5+cdh5.0.0+16            Tarball   Release notes   Changes
Apache Crunch               crunch-0.9.0+cdh5.0.0+21          Tarball   Release notes   Changes
DataFu                      pig-udf-datafu-1.1.0+cdh5.0.0+7   Tarball   Release notes   Changes
Apache Flume                flume-ng-1.4.0+cdh5.0.0+111       Tarball   Release notes   Changes
Apache Hadoop               hadoop-2.3.0+cdh5.0.0+548         Tarball   Release notes   Changes
Apache HBase                hbase-0.96.1.1+cdh5.0.0+60        Tarball   Release notes   Changes
HBase-Solr                  hbase-solr-1.3+cdh5.0.0+38        Tarball   Release notes   Changes
Apache Hive                 hive-0.12.0+cdh5.0.0+308          Tarball   Release notes   Changes
Hue                         hue-3.5.0+cdh5.0.0+365            Tarball   Release notes   Changes
Cloudera Impala             impala-1.3.0+cdh5.0.0+0           (none)    Release notes   Changes
Kite SDK                    kite-0.10.0+cdh5.0.0+79           Tarball   Release notes   Changes
Llama                       llama-1.0.0+cdh5.0.0+0            Tarball   Release notes   Changes
Apache Mahout               mahout-0.8+cdh5.0.0+27            Tarball   Release notes   Changes
Apache Oozie                oozie-4.0.0+cdh5.0.0+174          Tarball   Release notes   Changes
Parquet                     parquet-1.2.5+cdh5.0.0+91         Tarball   Release notes   Changes
Parquet-format              parquet-format-1.0.0+cdh5.0.0+3   Tarball   Release notes   Changes
Apache Pig                  pig-0.12.0+cdh5.0.0+27            Tarball   Release notes   Changes
Cloudera Search             search-1.0.0+cdh5.0.0+0           Tarball   Release notes   Changes
Apache Sentry (incubating)  sentry-1.2.0+cdh5.0.0+71          Tarball   Release notes   Changes
Apache Solr                 solr-4.4.0+cdh5.0.0+178           Tarball   Release notes   Changes
Apache Spark                spark-0.9.0+cdh5.0.0+31           Tarball   Release notes   Changes
Apache Sqoop                sqoop-1.4.4+cdh5.0.0+43           Tarball   Release notes   Changes
Apache Sqoop2               sqoop2-1.99.3+cdh5.0.0+26         Tarball   Release notes   Changes
Apache Whirr                whirr-0.9.0+cdh5.0.0+4            Tarball   Release notes   Changes
Apache ZooKeeper            zookeeper-3.4.5+cdh5.0.0+28       Tarball   Release notes   Changes
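
The package version strings listed above appear to follow the pattern name, upstream version, "+cdh" plus the CDH release, and a trailing number. The following is a minimal Python sketch for splitting such a string into its parts. The names CdhPackage and parse_cdh_package are hypothetical, and reading the trailing number as a patch count is an assumption that this page does not state.

```python
import re
from typing import NamedTuple


class CdhPackage(NamedTuple):
    name: str      # component package name, e.g. "pig-udf-datafu"
    upstream: str  # upstream project version, e.g. "1.1.0"
    cdh: str       # CDH release, e.g. "5.0.0"
    build: int     # trailing number (assumed here to be a patch count)


# <name>-<upstream version>+cdh<CDH release>+<number>
_PATTERN = re.compile(
    r"^(?P<name>.+)-(?P<upstream>\d[\d.]*)\+cdh(?P<cdh>[\d.]+)\+(?P<build>\d+)$"
)


def parse_cdh_package(version_string: str) -> CdhPackage:
    """Split a CDH package version string into its components."""
    match = _PATTERN.match(version_string)
    if match is None:
        raise ValueError(f"unrecognized package version: {version_string!r}")
    return CdhPackage(
        match["name"], match["upstream"], match["cdh"], int(match["build"])
    )


if __name__ == "__main__":
    for s in ("hadoop-2.3.0+cdh5.0.0+548", "pig-udf-datafu-1.1.0+cdh5.0.0+7"):
        print(parse_cdh_package(s))
```

Because the package name itself may contain hyphens (for example flume-ng or parquet-format), the pattern anchors the upstream version on the last hyphen that is followed by a digit and eventually by "+cdh".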

What's New in CDH 5.0.0

The following topics describe new features introduced in CDH 5.0.0.


Apache Hadoop

HDFS

New Features:
  • HDFS-5776 - Hedged reads in HDFS for improved HBase MTTR.
  • HDFS-4685 - Implementation of extended file access control lists in HDFS.
Notable Bug Fixes:
  • HDFS-5339 - WebHDFS URI does not accept logical nameservices when security is enabled.
  • HDFS-5898 - Allow NFS gateway to login/relogin from its Kerberos keytab.
  • HDFS-5921 - "Browse filesystem" on the Namenode UI doesn't work if any directory has the sticky bit set.
  • HDFS and Hive replication between different Kerberos realms now works.
  • HDFS-5922 - DataNode heartbeat thread can get stuck in a tight loop.

MapReduce & YARN

New Feature:
  • FairScheduler supports moving running applications between queues.
Notable Bug Fixes:
  • Several critical fixes to stabilize ResourceManager HA - Web UI, unmanaged ApplicationMasters and secure-cluster support.
  • Support for large values of mapreduce.task.io.sort.mb.
  • JobHistory Server has information on failed MapReduce jobs.

Apache HBase

New Features:
  • HBASE-10436 - Restore RegionServer lists removed from HBase 0.96.0 JMX.

    Many of the metrics exposed in CDH 4/0.94 were removed when metrics were refactored in CDH 5/0.96. This patch restores the lists of live and dead RegionServers. In 0.94 this was a large nested structure, as shown below, which included the RegionServer lists and metrics from each region.

    {
        "name" : "hadoop:service=Master,name=Master",
        "modelerType" : "org.apache.hadoop.hbase.master.MXBeanImpl",
        "ZookeeperQuorum" : "localhost:2181",
        ....
        "RegionsInTransition" : [ ],
        "RegionServers" : [ {
            "key" : "localhost,48346,1390857257246",
            "value" : {
                "load" : 2,
        ....

    CDH 5 Beta 1 and Beta 2 did not contain these lists; they displayed only the counts of live and dead RegionServers. As of CDH 5.0.0, the lists are presented as semicolon-separated fields, as follows:

    {
        "name" : "Hadoop:service=HBase,name=Master,sub=Server",
        "modelerType" : "Master,sub=Server",
        "tag.Context" : "master",
        "tag.liveRegionServers" : "localhost,56196,1391992019130",
        "tag.deadRegionServers" : "localhost,40010,1391035309673;localhost,41408,1391990380724;localhost,38682,1390950017735",
        ...
    }
  • Assorted usability and compatibility improvements as well as improvements to exporting snapshots.
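
The RegionServer list fields described under HBASE-10436 can be split back into individual servers with a few lines of code. The sketch below is illustrative only: it uses a trimmed copy of the CDH 5.0.0 sample output rather than a live JMX endpoint, the names RegionServer and parse_region_servers are hypothetical, and it assumes each entry has the form host,port,startcode.

```python
import json
from typing import List, NamedTuple


class RegionServer(NamedTuple):
    host: str
    port: int
    start_code: int  # third field of each entry (assumed to be the start code)


def parse_region_servers(field: str) -> List[RegionServer]:
    """Split a semicolon-separated list of 'host,port,startcode' entries."""
    servers = []
    for entry in field.split(";"):
        entry = entry.strip()
        if not entry:  # an empty field yields no servers
            continue
        host, port, start_code = entry.split(",")
        servers.append(RegionServer(host, int(port), int(start_code)))
    return servers


# Trimmed copy of the sample bean from the CDH 5.0.0 output.
SAMPLE_BEAN = json.loads("""
{
  "name": "Hadoop:service=HBase,name=Master,sub=Server",
  "tag.liveRegionServers": "localhost,56196,1391992019130",
  "tag.deadRegionServers": "localhost,40010,1391035309673;localhost,41408,1391990380724;localhost,38682,1390950017735"
}
""")

if __name__ == "__main__":
    live = parse_region_servers(SAMPLE_BEAN["tag.liveRegionServers"])
    dead = parse_region_servers(SAMPLE_BEAN["tag.deadRegionServers"])
    for server in dead:
        print("dead:", server.host, server.port, server.start_code)
    print("live count:", len(live))
```

Monitoring scripts written against the nested 0.94 structure would need a step like this, because the 5.0.0 fields are plain strings rather than JSON arrays.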

Apache Flume

New Feature:
  • The HBase Sink now supports coalescing multiple Increment RPCs into one (FLUME-2338).
Changed Behavior:
  • File Channel Write timeout has been removed and the configuration parameter is now ignored (FLUME-2307).
  • Syslog UDP source can now accept larger messages (FLUME-2130).
  • AsyncHBase Sink is now fully functional (FLUME-2334).
  • Use standard lookup to find queue/topic in JMS Source (FLUME-2311).
Notable Bug Fixes:
  • Deadlock fixed in Dataset sink (FLUME-2320).
  • FileChannel Dual Checkpoint Backup Thread is now released on application stop (FLUME-2328).
  • Spool Dir source now checks interrupt flag before writing to channel (FLUME-2283).
  • Morphline sink increments eventDrainAttemptCount when it takes an event from the channel (FLUME-2323).
  • BucketWriter is now permanently closed only on idle and roll timeouts (FLUME-2325).
  • BucketWriter#close now cancels idleFuture (FLUME-2305).

Cloudera Search

CDH 5.x System Requirements:

Supported Operating Systems

CDH 5 provides packages for Red-Hat-compatible, SLES, Ubuntu, and Debian systems as described below.


Operating System                                     Version                                     Packages
Red Hat compatible
  Red Hat Enterprise Linux (RHEL)                    5.7                                         64-bit
  Red Hat Enterprise Linux (RHEL)                    6.2                                         64-bit
  Red Hat Enterprise Linux (RHEL)                    6.4                                         64-bit
  CentOS                                             5.7                                         64-bit
  CentOS                                             6.2                                         64-bit
  CentOS                                             6.4                                         64-bit
  Oracle Linux with Unbreakable Enterprise Kernel    5.6                                         64-bit
  Oracle Linux with Unbreakable Enterprise Kernel    6.4                                         64-bit
SLES
  SUSE Linux Enterprise Server (SLES)                11 with Service Pack 1 or later             64-bit
Ubuntu/Debian
  Ubuntu                                             Precise (12.04) - Long-Term Support (LTS)   64-bit
  Debian                                             Wheezy (7.0, 7.1)                           64-bit


  Note:
  • CDH 5 provides only 64-bit packages.
  • Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
  • If you are using an operating system that is not supported by Cloudera's packages, you can also download source tarballs from Downloads.

Supported JDK Versions

CDH 5 is supported with Oracle JDK 1.7.

Table 1. Supported JDK 1.7 Versions
Latest Certified Version   Minimum Supported Version   Exceptions
1.7.0_45                   1.7.0_25                    None
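
As an illustration of how the minimum supported version in Table 1 might be enforced in a pre-installation check, here is a minimal Python sketch. The function names are hypothetical, the version string is assumed to use the legacy 1.7.0_NN format, and treating any non-1.7 JDK as unsupported is an assumption based on the statement that CDH 5 is supported with Oracle JDK 1.7.

```python
import re
from typing import Tuple

MINIMUM_SUPPORTED = "1.7.0_25"  # from Table 1
LATEST_CERTIFIED = "1.7.0_45"   # from Table 1


def parse_jdk_version(version: str) -> Tuple[int, int, int, int]:
    """Turn a legacy JDK version such as '1.7.0_45' into (1, 7, 0, 45)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:_(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized JDK version: {version!r}")
    # A missing update number (for example '1.7.0') is treated as update 0.
    major, minor, micro, update = match.groups(default="0")
    return int(major), int(minor), int(micro), int(update)


def is_supported_jdk(version: str) -> bool:
    """True for a JDK 1.7 release at or above the minimum supported update."""
    parsed = parse_jdk_version(version)
    return parsed[:2] == (1, 7) and parsed >= parse_jdk_version(MINIMUM_SUPPORTED)


if __name__ == "__main__":
    for candidate in ("1.7.0_21", "1.7.0_25", "1.7.0_45", "1.6.0_45"):
        print(candidate, is_supported_jdk(candidate))
```

Comparing parsed integer tuples rather than raw strings avoids ordering mistakes such as "1.7.0_9" sorting after "1.7.0_25".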

Supported Databases

Component   MySQL               SQLite    PostgreSQL   Oracle        Derby (see Note 4)
Oozie       5.5                 –         8.4          10.2, 11gR2   Default
Flume       –                   –         –            –             Default (for the JDBC Channel only)
Hue         5.0+ (see Note 1)   Default   8.4          11gR2         –
Hive        5.5                 –         8.4          10.2, 11gR2   Default
Sqoop 1     See Note 2          –         See Note 2   See Note 2    –
Sqoop 2     See Note 3          –         See Note 3   See Note 3    Default

Notes

  1. Cloudera's recommendations are:
    • For Red Hat and similar systems:
      • Use MySQL server version 5.0 (or higher) and version 5.0 client shared libraries on Red Hat 5 and similar systems.
      • Use MySQL server version 5.1 (or higher) and version 5.1 client shared libraries on Red Hat 6 and similar systems.

      If you use a higher server version than recommended here (for example, if you use 5.5) make sure you install the corresponding client libraries.

    • For SLES systems, use MySQL server version 5.0 (or higher) and version 5.0 client shared libraries.
    • For Ubuntu systems:
      • Use MySQL server version 5.5 (or higher) and version 5.0 client shared libraries on Precise (12.04).
  2. For connectivity purposes only, Sqoop 1 supports MySQL 5.1, PostgreSQL 9.1.4, Oracle 10.2, Teradata 13.1, and Netezza TwinFin 5.0. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
  3. Sqoop 2 can transport data to and from MySQL 5.1, PostgreSQL 9.1.4, Oracle 10.2, and Microsoft SQL Server 2012. The Sqoop 2 repository is supported only on Derby.
  4. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the CDH 5 Installation Guide for recommendations.