This is the documentation for Cloudera Impala 1.4.0.
Documentation for other versions is available at Cloudera.com.

Known Issues and Workarounds in Impala

The following sections describe known issues and workarounds in Impala.

For issues fixed in various Impala releases, see Fixed Issues in Impala.

Known Issues in the Current Production Release (1.4.0)

These known issues affect the current release. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and whether a fix is in the pipeline.

Impala Alternatives must be added back after RPM upgrade

After upgrading Impala RPMs, the alternatives symbolic links are deleted. These symlinks, which determine which configuration files and executables are used, must be restored before Impala can start.

Severity: High

Resolution: Apply the workaround when upgrading. The problem is fixed in CDH 5.0.3, but not in the equivalent Impala 1.3.1 for CDH 4. It is also fixed in CDH 5.1 and in Impala 1.4.0 for CDH 4.
  Note: Users who upgrade to a fixed version will still have to do this once, but future upgrades will then work.

Workaround: Execute the following commands after upgrading the RPMs, then start the services again:

alternatives --install /etc/impala/conf impala-conf /etc/impala/conf.dist      30
alternatives --install /usr/lib/impala/sbin impala /usr/lib/impala/sbin-retail 20
alternatives --install /usr/lib/impala/sbin impala /usr/lib/impala/sbin-debug  10

If you have installed the impala-udf-devel package, also replace these alternatives:

alternatives --install /usr/lib64/libImpalaUdf.a libImpalaUdf /usr/lib64/libImpalaUdf-retail.a 20
alternatives --install /usr/lib64/libImpalaUdf.a libImpalaUdf /usr/lib64/libImpalaUdf-debug.a  10

On SLES, issue the command update-alternatives rather than alternatives.

If you have previously manually activated one of the symlinks (as opposed to just going with the default priorities), you will need to repeat that selection.

DECIMAL type not supported on CDH 4

When the CREATE TABLE statement creates a table on CDH 4 containing a DECIMAL column, a warning message is intended to be displayed that the table might not be compatible with other CDH 4 components. The message is not present in the initial Impala 1.4.0 release.

Severity: Low

Workaround: Use the DECIMAL data type only on CDH 5.1 and higher. This data type is not supported on CDH 4. See DECIMAL Data Type (CDH 5 Only) for details.

Cancelled query occasionally returns with OK query status (expected status "Cancelled")

Intermittently, a query that was cancelled returns with Query Status=OK instead of Query Status=Cancelled as expected. The Impala logs show the correct cancellation status.

Bug: IMPALA-1047

Severity: Minor

ORDER BY rand() does not work

Because the value for rand() is computed early in a query, using an ORDER BY expression involving a call to rand() does not actually randomize the results.

Bug: IMPALA-397

Severity: High

Loading metadata for an extremely wide table (10k+ columns) takes too long

The first access to a table could take substantial time if the table has thousands of columns.

Bug: IMPALA-428

Severity: Minor

Workaround: Use tables with fewer columns, and join where necessary.

Impala BE cannot parse Avro schema that contains a trailing semi-colon

If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.

Bug: IMPALA-1024

Severity: High

Process mem limit does not account for the JVM's memory usage

Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the impalad daemon.

Bug: IMPALA-691

Severity: High

Workaround: To monitor overall memory usage, use the top command, or add the memory figures in the Impala web UI /memz tab to JVM memory usage shown on the /metrics tab.

Impala Parser issue when using fully qualified table names that start with a number.

A fully qualified table name starting with a number could cause a parsing error. In a name such as db.571_market, the decimal point followed by digits is interpreted as a floating-point number.

Bug: IMPALA-941

Severity: High

Workaround: Surround each part of the fully qualified name with backticks (``).
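As an illustration of the quoting rule, here is a small hypothetical helper (not part of Impala) that backtick-quotes each dot-separated part of a fully qualified name:

```python
def quote_qualified_name(name):
    """Surround each dot-separated part of a fully qualified table
    name with backticks, so that a name such as db.571_market is not
    misparsed as containing a floating-point literal."""
    return ".".join("`%s`" % part for part in name.split("."))

print(quote_qualified_name("db.571_market"))  # `db`.`571_market`
```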

CatalogServer should not require HBase to be up to reload its metadata

If HBase is unavailable during Impala startup or after an INVALIDATE METADATA statement, the catalogd daemon could go into an error loop, making Impala unresponsive.

Bug: IMPALA-788

Severity: High

Workaround: For systems not managed by Cloudera Manager, add the following settings to /etc/impala/conf/hbase-site.xml:

<property>
  <name>hbase.client.retries.number</name>
  <value>3</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>3000</value>
</property>

Currently, Cloudera Manager does not have an Impala-only override for HBase settings, so any HBase configuration change you make through Cloudera Manager takes effect for all HBase applications. Therefore, this change is not recommended on systems managed by Cloudera Manager.

Excessively long query plan serialization time in FE when querying huge tables

For tables with many HDFS data blocks, whether due to the number of files or the number of partitions, overall query time could be slower than necessary because of the overhead of analyzing the table metadata.

Bug: IMPALA-958

Severity: High

Impala cannot read data written by using the LazyBinaryColumnarSerDe

The addition of a new Hive SerDe, LazyBinaryColumnarSerDe, for RCFile data means that RCFile tables created in Hive 0.12 could be unreadable by Impala, or Impala queries could return incorrect results. The symptoms could include unexpected NULL values, error messages about incorrect conversion, or more serious errors due to the unexpected binary data format.

Bug: IMPALA-781

Severity: High

Workaround:
  • If you use the CREATE TABLE ... STORED AS RCFILE statement in Impala, you will sidestep this problem. (The Impala CREATE TABLE statement always creates a table with an Impala-compatible SerDe.)
  • Most levels of CDH that you would use with Impala come with earlier levels of Hive that do not write these incompatible files. This issue could occur with the CDH 5 beta, which does include Hive 0.12 but keeps the original default SerDe for RCFile tables. This issue is more likely to occur with files created in other Hadoop distributions, which might use this new SerDe by default for RCFiles.
  • If you have a problematic Hive table, create one with a similar structure using a file format or an RCFile SerDe that Impala can read, and use Hive to copy the data into the new table. If you create the new table in Hive, use the ColumnarSerDe rather than LazyBinaryColumnarSerDe. If you create the new table with the Impala syntax CREATE TABLE ... STORED AS RCFILE, Impala automatically uses compatible properties for the table.

Kerberos tickets must be renewable

In a Kerberos environment, the impalad daemon might not start if Kerberos tickets are not renewable.

Workaround: Configure your KDC to allow tickets to be renewed, and configure krb5.conf to request renewable tickets.
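For example, renewable tickets can typically be requested with settings along the following lines; the realm, principal names, and lifetimes shown here are placeholders, and your KDC-side policy must also permit renewal:

```
# /etc/krb5.conf (client side): request renewable tickets
[libdefaults]
  renew_lifetime = 7d

# On an MIT Kerberos KDC, the ticket-granting and service principals
# must also allow renewal (illustrative principal names):
#   kadmin: modprinc -maxrenewlife 90day krbtgt/EXAMPLE.COM@EXAMPLE.COM
#   kadmin: modprinc -maxrenewlife 90day impala/host.example.com@EXAMPLE.COM
```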

Avro Scanner fails to parse some schemas

Querying certain Avro tables could cause a crash or return no rows, even though Impala could DESCRIBE the table.

Bug: IMPALA-635

Severity: High

Workaround: Swap the order of the fields in the schema specification. For example, ["null", "string"] instead of ["string", "null"].
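As a sketch of that workaround, the following hypothetical helper (not part of Impala or Avro) reorders union types in a schema, already loaded as JSON, so that "null" comes first:

```python
import json

def null_first(schema):
    """Recursively reorder Avro union types so that "null" appears
    first, e.g. ["string", "null"] becomes ["null", "string"]."""
    if isinstance(schema, list):  # a union type
        reordered = sorted(schema, key=lambda t: t != "null")
        return [null_first(t) for t in reordered]
    if isinstance(schema, dict):  # a record, field, or other object
        return {k: null_first(v) for k, v in schema.items()}
    return schema  # a primitive type name

field = json.loads('{"name": "c1", "type": ["string", "null"]}')
print(json.dumps(null_first(field)))
# {"name": "c1", "type": ["null", "string"]}
```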

Resolution: Disallowing this syntax agrees with the Avro specification, so it may still cause an error even after the crashing issue is resolved.

Configuration needed for Flume to be compatible with Impala

For compatibility with Impala, the value for the Flume HDFS Sink hdfs.writeFormat must be set to Text, rather than its default value of Writable. The hdfs.writeFormat setting must be changed to Text before creating data files with Flume; otherwise, those files cannot be read by either Impala or Hive.
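For example, in a Flume agent properties file (the agent and sink names here are placeholders):

```
# flume.conf: write plain text rather than the default Writable records
agent1.sinks.hdfs-sink1.type = hdfs
agent1.sinks.hdfs-sink1.hdfs.writeFormat = Text
```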

Severity: High

Resolution: A request has been filed to add this information to the upstream Flume documentation.

Impala does not support running on clusters with federated namespaces

Impala does not support running on clusters with federated namespaces. The impalad process will not start on a node where the filesystem is based on the org.apache.hadoop.fs.viewfs.ViewFs class.

Bug: IMPALA-77

Severity: Undetermined

Anticipated Resolution: Limitation

Workaround: Use standard HDFS on all Impala nodes.

Impala INSERT OVERWRITE ... SELECT behavior differs from Hive in that partitions are only deleted/re-written if the SELECT statement returns data.

Impala INSERT OVERWRITE ... SELECT behavior differs from Hive in that the partitions are only deleted or rewritten if the SELECT statement returns data. Hive always deletes the data.

Bug: IMPALA-89

Severity: Medium

Workaround: None

Deviation from Hive behavior: Out-of-range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)

Impala behavior differs from Hive with respect to out-of-range float/double values. Out-of-range values are returned as the maximum allowed value of the type, whereas Hive returns NULL.

Severity: Low

Workaround: None

Deviation from Hive behavior: Impala does not do implicit casts between string and numeric or boolean types.

Severity: Low

Anticipated Resolution: None

Workaround: Use explicit casts.

If Hue and Impala are installed on the same host, and if you configure Hue Beeswax in CDH 4.1 to execute Impala queries, Beeswax cannot list Hive tables and shows an error on Beeswax startup.

Hue requires Beeswaxd to be running in order to list the Hive tables. Because of a port conflict bug in Hue in CDH 4.1, when Hue and Impala are installed on the same host, an error page is displayed when you start the Beeswax application and when you open the Tables page in Beeswax.

Severity: High

Anticipated Resolution: Fixed in an upcoming CDH4 release

Workarounds: Choose exactly one of the following workarounds:

  • Install Hue and Impala on different hosts.
  • Upgrade to CDH 4.1.2 and add the following property in the beeswax section of the /etc/hue/hue.ini configuration file:
    beeswax_meta_server_only=9004

  • If you are using CDH 4.1.1 and you want to install Hue and Impala on the same host, change the code in this file:
    /usr/share/hue/apps/beeswax/src/beeswax/management/commands/beeswax_server.py

    Replace line 66:

    str(beeswax.conf.BEESWAX_SERVER_PORT.get()),

    With this line:

    '8004',

    Beeswaxd will then use port 8004.

      Note:

    If you used Cloudera Manager to install Impala, refer to the Cloudera Manager release notes for information about using an equivalent workaround by specifying the beeswax_meta_server_only=9004 configuration value in the options field for Hue. In Cloudera Manager 4, these fields are labelled Safety Valve; in Cloudera Manager 5, they are called Advanced Configuration Snippet.

Impala should tolerate bad locale settings

If the LC_* environment variables specify an unsupported locale, Impala does not start.

Bug: IMPALA-532

Severity: Low

Workaround: Add LC_ALL="C" to the environment settings for both the Impala daemon and the Statestore daemon. See Modifying Impala Startup Options for details about modifying these environment settings.

Resolution: Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution.

Log Level 3 Not Recommended for Impala

The extensive logging produced by log level 3 can cause serious performance overhead and capacity issues.

Severity: Low

Workaround: Reduce the log level to its default value of 1, that is, GLOG_v=1. See Setting Logging Levels for details about the effects of setting different logging levels.