Developer Resources: Application Development
Cloudera Development Kit (CDK)
The Cloudera Development Kit, or CDK for short, is a set of libraries, tools, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.
- CDK home
- CDK Discussion Forum
- Presentation: Building Apps on Hadoop with the CDK (from InfoQ NYC 2013)
Several connectors and drivers provide ODBC or JDBC access to Hadoop data.
- Cloudera Connectors for Teradata, Netezza, MicroStrategy, Tableau, and Oracle
- Apache ODBC driver for Apache Hive
- Apache JDBC driver for Apache Hive
- Progress DataDirect Hadoop Apache Hive ODBC Driver
- Simba's Apache Hive ODBC Driver
- Configuring Impala to Work with ODBC
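As a rough illustration of what ODBC access looks like from application code, the sketch below queries Hive or Impala through a pre-configured ODBC data source. It assumes the third-party `pyodbc` package is installed and that one of the drivers listed above has been registered with the ODBC driver manager under a DSN. The DSN name, user, and query in the usage note are hypothetical, and the exact connection-string keys a given driver accepts may differ.

```python
def build_odbc_connection_string(dsn, user=None, password=None):
    """Assemble a DSN-based ODBC connection string.

    The DSN must already be defined in the ODBC driver manager and
    point at a Hive or Impala ODBC driver (an assumption of this sketch).
    """
    parts = ["DSN=" + dsn]
    if user is not None:
        parts.append("UID=" + user)
    if password is not None:
        parts.append("PWD=" + password)
    return ";".join(parts)


def fetch_rows(connection_string, sql, max_rows=10):
    """Run a query over ODBC and return at most max_rows rows."""
    # Third-party dependency, imported lazily so the helper above
    # can be used (and tested) without it.
    import pyodbc

    # Hive ODBC drivers commonly lack transaction support, so
    # autocommit is enabled here as a precaution.
    connection = pyodbc.connect(connection_string, autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchmany(max_rows)
    finally:
        connection.close()
```

With a DSN named `hive_dsn` (hypothetical), a call might look like `fetch_rows(build_odbc_connection_string("hive_dsn", "alice"), "SHOW TABLES")`.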
YARN stands for “Yet Another Resource Negotiator”. It provides the daemons and APIs needed to develop generic distributed applications of any kind (MRv2 is one such application), schedules the resource requests (such as memory and CPU) those applications make, and supervises their execution. (Note: Developing a new YARN application is necessary only if MapReduce, Pig, Hive, Impala, Crunch, etc. do not meet your needs.)
Use these REST APIs for integrating Hadoop components with external tools and apps.
|Component|REST API|
|MapReduce|MapReduce Application Master REST APIs|
|Apache Oozie|Oozie Web Services API|
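To make the integration pattern concrete, here is a minimal sketch of how an external tool might address the two REST APIs listed above using only the Python standard library. The URL layouts follow the documented forms (`/v1/jobs` under the Oozie base URL, and `/proxy/<application id>/ws/v1/mapreduce/jobs` under the web proxy address), but the host names, ports, and application ID in the usage note are placeholders, and the Oozie API version path may differ in your deployment.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def oozie_jobs_url(base_url, status=None, offset=1, length=50):
    """Build the URL that lists jobs via the Oozie Web Services API (v1)."""
    params = {"offset": offset, "len": length}
    if status is not None:
        # Oozie expects filters as name=value pairs inside one parameter.
        params["filter"] = "status=" + status
    return base_url.rstrip("/") + "/v1/jobs?" + urlencode(params)


def mapreduce_jobs_url(proxy_url, application_id):
    """Build the URL of the jobs resource of a running MapReduce Application Master."""
    return "{}/proxy/{}/ws/v1/mapreduce/jobs".format(
        proxy_url.rstrip("/"), quote(application_id, safe="")
    )


def get_json(url, timeout=10):
    """Issue a GET request and decode the JSON response body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `get_json(oozie_jobs_url("http://oozie.example.com:11000/oozie", status="RUNNING"))` would request the running jobs from a hypothetical Oozie server. Keeping URL construction separate from the network call makes the former easy to check without a cluster.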
Apache Thrift APIs
HDFS and HBase expose their data through Apache Thrift services, so applications written in any non-JVM language that has Thrift bindings can read and write Hadoop data.
- Using Perl and Thrift to access HDFS (via Gino Ledesma)
- Using the HBase Thrift Gateway from Python (Sample Chapter from HBase in Action)
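As a small illustration of the idea, the following sketch writes and reads one HBase row through the HBase Thrift gateway from Python. It uses the third-party `happybase` client, which is one choice among several and is not named by the resources above. It also assumes a Thrift gateway is running on the given host (9090 is the usual default port). The table, column family, and row key in the usage note are hypothetical.

```python
def to_cells(column_family, values):
    """Convert {qualifier: value} into the {b"family:qualifier": b"value"}
    mapping that HBase Thrift clients expect."""
    return {
        "{}:{}".format(column_family, qualifier).encode("utf-8"):
            str(value).encode("utf-8")
        for qualifier, value in values.items()
    }


def put_and_get(host, table_name, row_key, cells, port=9090):
    """Store one row through the Thrift gateway, then read it back."""
    # Third-party dependency, imported lazily so to_cells() works without it.
    import happybase

    connection = happybase.Connection(host, port=port)
    try:
        table = connection.table(table_name)
        table.put(row_key, cells)   # row_key is bytes, e.g. b"user-1"
        return table.row(row_key)   # dict of bytes column -> bytes value
    finally:
        connection.close()
```

A call such as `put_and_get("hbase-thrift.example.com", "users", b"user-1", to_cells("info", {"name": "Ada"}))` would round-trip a single cell, assuming the `users` table and `info` column family already exist.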