Cloudera Impala

Open Source, Interactive SQL for Hadoop

Cloudera Impala is the industry’s leading massively parallel processing (MPP) SQL query engine that runs natively in Apache Hadoop. The Apache-licensed, open source Impala project combines modern, scalable parallel database technology with the power of Hadoop, enabling users to directly query data stored in HDFS and Apache HBase without requiring data movement or transformation. Impala is designed from the ground up as part of the Hadoop ecosystem and shares the same flexible file and data formats, metadata, security and resource management frameworks used by MapReduce, Apache Hive, Apache Pig and other components of the Hadoop stack.

Now You Have a Choice

Before Impala, if your relational database was at capacity, you may have had no choice but to expand that system to maintain the performance you expected. If you were using Hadoop to affordably analyze any amount or kind of data but wanted interactive performance, you had to move that data into a fast relational database. You then had to accept the cost and effort of duplicate storage and data synchronization, the rigidity of fixed schemas, the certainty that moving and transforming data would leave something behind, and analysis options limited by that target database.

Now you have a choice. With Impala, you can:

  • Enable analysts and data scientists to directly interact with any data stored in Hadoop, using their existing business intelligence (BI) tools and skills through an industry-standard SQL interface.
  • Offload self-service business intelligence to Hadoop, relieving the burden on existing analytical databases and reducing your “BI backlog”.

Impala delivers:

  • Performance equivalent to leading MPP databases, and 10-100x faster than Apache Hive/Stinger.
  • Faster time-to-insight than traditional databases by performing interactive analytics directly on data stored in Hadoop without data movement or predefined schemas.
  • Cost savings through reduced data movement, modeling, and storage.
  • More complete analysis of full raw and historical data, without information loss from aggregations or conforming to fixed schemas.
  • Familiarity of existing business intelligence tools and SQL skills to reduce barriers to adoption.
  • Security with Kerberos authentication, and role-based authorization through the Apache Sentry project.
  • Freedom from vendor lock-in through the open source Apache license.

Key Features of Impala

  • Apache-licensed and 100% open source
  • Massively parallel processing (MPP) architecture for performance, with Hadoop scalability
  • Interactive analysis of any data stored in HDFS and HBase
  • Built with native Hadoop security: integrated with Kerberos for authentication and Apache Sentry for fine-grained, role-based authorization
  • ANSI-92 SQL support with user-defined functions (UDFs)
  • Supports common Hadoop file formats: text, SequenceFiles, Avro, RCFile, LZO and Parquet
  • Shares workload management, metadata, ODBC driver, SQL syntax and user interface with Apache Hive

Get Support for Impala with Cloudera Enterprise

Cloudera Enterprise is the best way to leverage the power of Impala in production environments. When you deploy Impala with Cloudera Enterprise Flex Edition or Data Hub Edition as part of an enterprise data hub, you can rely on our unique ability to support Impala and to actively influence the future of the project.
