Groupon Gets a Great Deal for Its Data in Cloudera’s Distribution for Apache Hadoop


Daily deal giant chooses Cloudera to get maximum insight from its data deluge

Palo Alto, Calif. – April 19, 2011 – Cloudera, the leading provider of Hadoop-based data management software and services, today announced that Groupon, the pioneering daily deal site, has implemented Cloudera’s Distribution for Apache Hadoop (CDH) to get more value from the massive amounts of data and information it collects and generates.

With more than 70 million registered users in more than 500 global markets, Groupon has been dubbed the “fastest growing company ever” by Forbes magazine. Data is one of Groupon’s most strategic assets; Groupon relies on information from both vendors and customers to make daily deal transactions run smoothly. Prior to deploying CDH, Groupon realized that it needed better long-term ways to organize and make sense of the data generated by its massive user base.

Groupon first approached the Hadoop experts at Cloudera to assist in laying the foundation for a large-scale data system. The goal was to build an IT infrastructure that could keep up with the rate at which Groupon amasses data without impeding the expansion of the business. Groupon worked closely with the Cloudera team to capture its ever-growing collection of data in Hadoop, take advantage of the system's ease of scaling, and ultimately be prepared for future growth while consistently gaining new insights into its customers and business.

“We were eager to try Hadoop based on the technology’s promise to make sense of massive amounts of data, and it hasn’t disappointed,” said Mark Johnson, chief data officer, Groupon. “Cloudera’s distribution and support have been instrumental in helping Groupon deliver on our goal to be a technology leader.”

Groupon will use Hadoop as a staging area for all of its extreme data. Savvy analysts will be able to go directly to the finest level of detail in the data before it has been through the cleansing process. Data that has been refined and processed in Hadoop will go into an analytic DBMS for additional analysis. The company plans to leverage CDH beyond core Hadoop to include other projects such as Flume, Pig, Hive, Oozie and HBase.

“Cloudera is committed to companies with large amounts of complex data like Groupon by providing the Hadoop-based platform industry-standard in CDH along with unsurpassed Hadoop-related support and services,” said Amr Awadallah, CTO, Cloudera. “Groupon is a perfect example of how enterprises can best make use of Hadoop to get the most insight out of their data and we’re proud to be working with such a trail-blazing company.”

Groupon’s goal of building a world-class infrastructure has encouraged many talented engineers to join its teams in Palo Alto and Chicago. The data team at Groupon is growing rapidly, which is indicative of the heightened interest in data management that both Cloudera and Groupon are seeing in the IT industry.

About Groupon

Groupon, launched in November 2008 in Chicago, features a daily deal on the best stuff to do, eat, see and buy in more than 500 markets around the world. Groupon uses collective buying power to offer unbeatable prices and provide a win-win for businesses and consumers, delivering more than 900 daily deals globally.

About Cloudera

Cloudera, the leader in Apache Hadoop-based software and services, enables data-driven enterprises to easily derive business value from all their structured and unstructured data. Cloudera's Distribution including Apache Hadoop (CDH), available to download for free, is the most comprehensive, tested, stable and widely deployed distribution of Hadoop in commercial and non-commercial environments. For the fastest path to reliably using this completely open source technology in production for Big Data analytics and answering previously unaddressable big questions, organizations can subscribe to Cloudera Enterprise, which comprises Cloudera Manager software and Cloudera Support. Cloudera also offers training and certification on Apache technologies, as well as consulting services. As the top contributor to the Apache open source community and with tens of thousands of nodes under management across customers in financial services, government, telecommunications, media, web, advertising, retail, energy, bioinformatics, pharma/healthcare, university research, oil and gas and gaming, Cloudera's depth of experience and commitment to sharing expertise are unrivaled.
