New to Hadoop
Knowing where to start can be difficult, but this curated list of resources will help.
1. Read Up on Background
Getting a bit of background information first is always a good idea.
- Hadoop: What It Is, How It Works, and What It Can Do
- Hadoop FAQ - Getting Started Version
- MOOC: Intro to Hadoop and MapReduce
- Cloudera Glossary
- Video: Overview of Hadoop Platform Components
- Presentation: Apache Hadoop in Theory and Practice (via Adam Kawa)
CDH is Cloudera's 100% open-source, enterprise-ready distro of Apache Hadoop and related projects. Install it directly for the best Hadoop experience, or test-drive it first, either on demand via Cloudera Live or in a VM.
- Explore CDH components (project homepages, docs, blogs, Q&A, downloads)
- Try Cloudera Live - Hadoop on demand
- Download the QuickStart VM
- How-to: Install CDH and Impala on EC2 using Cloudera Manager Free Edition
- How-to: Create a Hadoop Cluster POC using CDH on EC2 (via Randy Zwitch)
Working through tutorials is the quickest way to become dangerous.
- See Hadoop Tutorial
- See all How-tos
- See Online Learning
- Read Tom White's "How to Hadoop" series in Dr. Dobb's
When "being dangerous" isn't good enough, it's time to train with Cloudera University.
Reading books, or at least keeping them around for reference, is the best way to progressively deepen your knowledge.
- Hadoop: The Definitive Guide - by Tom White
- Hadoop Operations - by Eric Sammer
- HBase: The Definitive Guide - by Lars George
- Apache Sqoop Cookbook - by Kathleen Ting & Jarek Cecho
- HBase in Action - by Nick Dimiduk & Amandeep Khurana
- Cloudera Impala (e-book) - by John Russell
Make an impact on the quality and direction of the Hadoop stack by reporting bugs or becoming an active contributor to a project.
- See the How to Contribute page