This is the documentation for Cloudera Impala 1.2.4.
Documentation for other versions is available at Cloudera Documentation.

Incompatible Changes in Impala

Impala 1.2.4 contains the following incompatible changes. These are things such as file format changes, removed features, or changes to implementation, default configuration, dependencies, or prerequisites that could cause issues during or after an Impala upgrade.

Even added SQL statements or clauses can produce incompatibilities, if you have databases, tables, or columns whose names conflict with the new keywords. See Appendix C - Impala Reserved Words for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.


Incompatible Changes Introduced in Cloudera Impala 1.2.4

There are no incompatible changes introduced in Impala 1.2.4.

Previously, after creating a table in Hive, you had to issue the INVALIDATE METADATA statement with no table name, a potentially expensive operation on clusters with many databases, tables, and partitions. Starting in Impala 1.2.4, you can issue the statement INVALIDATE METADATA table_name for a table newly created through Hive. Loading the metadata for only this one table is faster and involves less network overhead. Therefore, you might revisit your setup DDL scripts to add the table name to INVALIDATE METADATA statements, in cases where you create and populate the tables through Hive before querying them through Impala.
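
As a rough sketch of that script change, assume a table with the placeholder name new_hive_table that was just created and populated through Hive. The table name is hypothetical and used only for illustration:

```sql
-- Before Impala 1.2.4: reload metadata for the entire catalog,
-- which can be expensive on a cluster with many tables and partitions.
INVALIDATE METADATA;

-- Impala 1.2.4 and higher: reload metadata for only the new table.
-- new_hive_table is a placeholder name.
INVALIDATE METADATA new_hive_table;
```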

Incompatible Changes Introduced in Cloudera Impala 1.2.3

Because the feature set of Impala 1.2.3 is identical to Impala 1.2.2, there are no new incompatible changes. See Incompatible Changes Introduced in Cloudera Impala 1.2.2 if you are upgrading from Impala 1.2.1 or 1.1.x.

Incompatible Changes Introduced in Cloudera Impala 1.2.2

The following changes to SQL syntax and semantics in Impala 1.2.2 could require updates to your SQL code, or schema objects such as tables or views:

  • With the addition of the CROSS JOIN keyword, you might need to rewrite any queries that refer to a table named CROSS or use the name CROSS as a table alias:

    -- Formerly, 'cross' in this query was an alias for t1
    -- and it was a normal join query.
    -- In 1.2.2 and higher, CROSS JOIN is a keyword, so 'cross'
    -- is not interpreted as a table alias, and the query
    -- uses the special CROSS JOIN processing rather than a
    -- regular join.
    select * from t1 cross join t2...
    
    -- If CROSS is used in another context, such as a table or column name,
    -- use backticks to escape it.
    create table `cross` (x int);
    select * from `cross`;

  • Formerly, a DROP DATABASE statement in Impala did not remove the top-level HDFS directory for that database. The DROP DATABASE statement has been enhanced to remove that directory. (You still need to drop all the tables inside the database first; this change applies only to the top-level directory for the entire database.)

  • The keyword PARQUET is introduced as a synonym for PARQUETFILE in the CREATE TABLE and ALTER TABLE statements, because that is the common name for the file format (as opposed to SequenceFile and RCFile, where the "File" suffix is part of the name). Documentation examples have been changed to prefer the new, shorter keyword. The PARQUETFILE keyword is still available for backward compatibility with older Impala versions.

  • New overloads are available for several operators and built-in functions, allowing you to insert their result values into smaller numeric columns such as INT, SMALLINT, TINYINT, and FLOAT without using a CAST() call. If you remove the CAST() calls from INSERT statements, those statements might not work with earlier versions of Impala.
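
The DROP DATABASE, PARQUET, and CAST() items above can be sketched as follows. All database, table, and column names are hypothetical: old_db.t1 stands for the last remaining table in a database, small_nums is assumed to have a single SMALLINT column, and src is assumed to have an INT column named i.

```sql
-- Tables must still be dropped before the database that contains them.
-- In 1.2.2 and higher, DROP DATABASE also removes the top-level HDFS directory.
DROP TABLE old_db.t1;
DROP DATABASE old_db;

-- PARQUET and PARQUETFILE name the same file format in 1.2.2 and higher.
-- Scripts that must also run on earlier releases should keep PARQUETFILE.
CREATE TABLE parquet_new (x INT) STORED AS PARQUET;
CREATE TABLE parquet_old (x INT) STORED AS PARQUETFILE;

-- Keeping the explicit CAST() lets the same INSERT statement run on
-- earlier releases that lack the new overloads.
INSERT INTO small_nums SELECT CAST(i + 1 AS SMALLINT) FROM src;
```

Retaining the longer PARQUETFILE keyword and the CAST() calls is the conservative choice for scripts shared between clusters running different Impala versions.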

Because many users are likely to upgrade straight from Impala 1.x to Impala 1.2.2, also read Incompatible Changes Introduced in Cloudera Impala 1.2.1 for things to note about upgrading to Impala 1.2.x in general.

In a Cloudera Manager environment, the new catalog service is not recognized or managed by Cloudera Manager versions prior to 4.8. Cloudera Manager 4.8 and higher require the catalog service to be present for Impala. Therefore, if you upgrade to Cloudera Manager 4.8 or higher, you must also upgrade Impala to 1.2.1 or higher. Likewise, if you upgrade Impala to 1.2.1 or higher, you must also upgrade Cloudera Manager to 4.8 or higher.

Incompatible Changes Introduced in Cloudera Impala 1.2.1

The following changes to SQL syntax and semantics in Impala 1.2.1 could require updates to your SQL code, or schema objects such as tables or views:

  • In Impala 1.2.1 and higher, all NULL values come at the end of the result set for ORDER BY ... ASC queries, and at the beginning of the result set for ORDER BY ... DESC queries. In effect, NULL is considered greater than all other values for sorting purposes. The original Impala behavior always put NULL values at the end, even for ORDER BY ... DESC queries. The new behavior in Impala 1.2.1 makes Impala more compatible with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting behavior for NULL by adding the clause NULLS FIRST or NULLS LAST at the end of the ORDER BY clause.

    See NULL for more information.
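
The following queries sketch how the NULLS FIRST and NULLS LAST clauses interact with the new default ordering. The table t1 and column x are placeholder names, and the LIMIT clauses are included on the assumption that this release still expects ORDER BY queries to carry one.

```sql
-- Impala 1.2.1 and higher: NULL values come first by default for DESC.
SELECT x FROM t1 ORDER BY x DESC LIMIT 100;

-- Restore the earlier placement, with NULL values at the end.
SELECT x FROM t1 ORDER BY x DESC NULLS LAST LIMIT 100;

-- Put NULL values first in an ascending sort.
SELECT x FROM t1 ORDER BY x ASC NULLS FIRST LIMIT 100;
```

Queries that relied on the old behavior for descending sorts are the ones most likely to need an explicit NULLS LAST clause.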

Impala 1.2.1 is intended for use with CDH 4.5 and Cloudera Manager 4.8. If you used the beta version Impala 1.2.0 that came with the beta of CDH 5, Impala 1.2.1 includes all the features of Impala 1.2.0 except for resource management, which relies on the YARN framework from CDH 5.

The new catalogd service might require changes to any user-written scripts that stop, start, or restart Impala services, install or upgrade Impala packages, or issue REFRESH or INVALIDATE METADATA statements:

  • See Installing Impala, Upgrading Impala, and Starting Impala for usage information for the catalogd daemon.

  • The REFRESH and INVALIDATE METADATA statements are no longer needed when the CREATE TABLE, INSERT, or other table-changing or data-changing operation is performed through Impala. These statements are still needed if such operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the statements only need to be issued on one Impala node rather than on all nodes. See REFRESH Statement and INVALIDATE METADATA Statement for the latest usage information for those statements.

  • See The Impala Catalog Service for background information on the catalogd service.
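
A minimal sketch of how a script might change under the catalog service follows. The table names t_impala, t_source, and t_hive_loaded are hypothetical, and t_source is assumed to have an INT column named x.

```sql
-- Changes made through Impala: no REFRESH or INVALIDATE METADATA is needed
-- afterward, even when the next query runs on a different Impala node.
CREATE TABLE t_impala (x INT);
INSERT INTO t_impala SELECT x FROM t_source;
SELECT COUNT(*) FROM t_impala;

-- Data files added to an existing table through Hive or directly in HDFS:
-- issue REFRESH once, on any one Impala node, rather than on every node.
REFRESH t_hive_loaded;
```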

In a Cloudera Manager environment, the new catalog service is not recognized or managed by Cloudera Manager versions prior to 4.8. Cloudera Manager 4.8 and higher require the catalog service to be present for Impala. Therefore, if you upgrade to Cloudera Manager 4.8 or higher, you must also upgrade Impala to 1.2.1 or higher. Likewise, if you upgrade Impala to 1.2.1 or higher, you must also upgrade Cloudera Manager to 4.8 or higher.

Incompatible Changes Introduced in Cloudera Impala 1.2.0 (Beta)

There are no incompatible changes to SQL syntax in Impala 1.2.0 (beta).

Because Impala 1.2.0 is bundled with the CDH 5 beta download and depends on specific levels of Apache Hadoop components supplied with CDH 5, you can only install it in combination with the CDH 5 beta.

The new catalogd service might require changes to any user-written scripts that stop, start, or restart Impala services, install or upgrade Impala packages, or issue REFRESH or INVALIDATE METADATA statements:

  • See Installing Impala, Upgrading Impala, and Starting Impala for usage information for the catalogd daemon.

  • The REFRESH and INVALIDATE METADATA statements are no longer needed when the CREATE TABLE, INSERT, or other table-changing or data-changing operation is performed through Impala. These statements are still needed if such operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the statements only need to be issued on one Impala node rather than on all nodes. See REFRESH Statement and INVALIDATE METADATA Statement for the latest usage information for those statements.

  • See The Impala Catalog Service for background information on the catalogd service.

The new resource management feature interacts with both YARN and Llama services, which are available in CDH 5. These services are set up for you automatically in a Cloudera Manager (CM) environment. For information about setting up the YARN and Llama services, see the instructions for YARN and Llama in the CDH 5 Installation Guide. See Using YARN Resource Management with Impala (CDH 5 Only) for usage information for Impala resource management.

Incompatible Changes Introduced in Cloudera Impala 1.1.1

There are no incompatible changes in Impala 1.1.1.

Previously, it was not possible to create Parquet data through Impala and reuse that table within Hive. Now that Parquet support is available for Hive 0.10, reusing existing Impala Parquet data files in Hive requires updating the table metadata. Use the following command if you are already running Impala 1.1.1:

ALTER TABLE table_name SET FILEFORMAT PARQUETFILE;

If you are running a version of Impala older than 1.1.1, do the metadata update through Hive:

ALTER TABLE table_name SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
ALTER TABLE table_name SET FILEFORMAT
  INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
  OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";

Impala 1.1.1 and higher can reuse Parquet data files created by Hive, without any action required.

As usual, make sure to upgrade the impala-lzo-cdh4 package to the latest level at the same time as you upgrade the Impala server.

Incompatible Change Introduced in Cloudera Impala 1.1

  • The REFRESH statement now requires a table name; in Impala 1.0, the table name was optional. This syntax change is part of the internal rework to make REFRESH a true Impala SQL statement so that it can be called through the JDBC and ODBC APIs. REFRESH now reloads the metadata immediately, rather than marking it for update the next time any affected table is accessed. The previous behavior, where omitting the table name caused a refresh of the entire Impala metadata catalog, is available through the new INVALIDATE METADATA statement. INVALIDATE METADATA can be specified with a table name to affect a single table, or without a table name to affect the entire metadata catalog; the relevant metadata is reloaded the next time it is requested during the processing for a SQL statement. See REFRESH Statement and INVALIDATE METADATA Statement for the latest details about these statements.
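
The syntax change can be sketched as follows, using the placeholder table name t1:

```sql
-- Impala 1.0: the table name was optional, and omitting it
-- refreshed the entire metadata catalog.
REFRESH;

-- Impala 1.1 and higher: REFRESH requires a table name and
-- reloads that table's metadata immediately.
REFRESH t1;

-- Impala 1.1 and higher: the whole-catalog behavior moves to
-- INVALIDATE METADATA, which reloads metadata lazily on next access.
INVALIDATE METADATA;
INVALIDATE METADATA t1;
```

Scripts written for Impala 1.0 that issue a bare REFRESH are the ones that need updating, either by adding a table name or by switching to INVALIDATE METADATA.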

Incompatible Changes Introduced in Cloudera Impala 1.0

  • If you use LZO-compressed text files, when you upgrade Impala to version 1.0, also update the impala-lzo-cdh4 package to the latest level. See Using LZO-Compressed Text Files for details.
  • Cloudera Manager 4.5.2 and higher support only Impala 1.0 and higher, and vice versa. If you upgrade to Impala 1.0 or higher in a deployment managed by Cloudera Manager, you must also upgrade Cloudera Manager to version 4.5.2 or higher. If you upgrade from an earlier version of Cloudera Manager and were using Impala, you must also upgrade Impala to version 1.0 or higher. The beta versions of Impala are no longer supported as of the release of Impala 1.0.

Incompatible Change Introduced in Version 0.7 of the Cloudera Impala Beta Release

  • The defaults for the -nn and -nn_port flags have changed and are now read from core-site.xml. Impala prints the values of -nn and -nn_port to the log when it starts. The ability to set -nn and -nn_port on the command line is deprecated in 0.7 and may be removed in Impala 0.8.

Incompatible Change Introduced in Version 0.6 of the Cloudera Impala Beta Release

  • Cloudera Manager 4.5 supports only version 0.6 of the Cloudera Impala Beta Release. It does not support the earlier beta versions. If you upgrade your Cloudera Manager installation, you must also upgrade Impala to beta version 0.6. If you upgrade Impala to beta version 0.6, you must upgrade Cloudera Manager to 4.5.

Incompatible Change Introduced in Version 0.4 of the Cloudera Impala Beta Release

  • Cloudera Manager 4.1.3 supports only version 0.4 of the Cloudera Impala Beta Release. It does not support the earlier beta versions. If you upgrade your Cloudera Manager installation, you must also upgrade Impala to beta version 0.4. If you upgrade Impala to beta version 0.4, you must upgrade Cloudera Manager to 4.1.3.

Incompatible Change Introduced in Version 0.3 of the Cloudera Impala Beta Release

  • Cloudera Manager 4.1.2 supports only version 0.3 of the Cloudera Impala Beta Release. It does not support the earlier beta versions. If you upgrade your Cloudera Manager installation, you must also upgrade Impala to beta version 0.3. If you upgrade Impala to beta version 0.3, you must upgrade Cloudera Manager to 4.1.2.