SAS and Impala: How to Force Schema Resolution by Name in SQL Pass-Through

When querying Impala tables stored as Parquet files from SAS, an insidious issue can arise: columns appear in the wrong order, or contain values that belong to another column.

This is usually caused by the way Impala resolves columns in Parquet files. If the physical file structure differs from the table definition (e.g., a different column order), Impala by default matches columns by their position (index) rather than by their name.

To fix this directly in Impala, set the query option PARQUET_FALLBACK_SCHEMA_RESOLUTION=name. But how can this specific configuration be applied when using SQL Pass-Through in SAS?

The Problem: Schema Mismatch

When you extract such a table through SAS without this option, you may find that the values do not land in the correct columns. This happens because Impala falls back to the column order.

A common attempt is to put this option directly into the connection string (connect to impala (&impala.)) or to run the SET command in the wrong place (for example, when creating an intermediate view). Both often have no effect.

The Solution: Sequential Execution

The key to solving this problem is to understand that the Pass-Through session must receive the configuration statement before executing the data selection query, but within the same connection block.

Use the EXECUTE statement of SQL Pass-Through, written as execute (...) by impala, to send the SET command to the database just before running the SELECT.
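As a minimal sketch of that EXECUTE call on its own, the block below sends the option and then writes the pass-through return code to the log. It assumes, as in the rest of this article, that the macro variable &impala. already holds your site-specific connection options. SQLXRC and SQLXMSG are the macro variables PROC SQL fills in after a pass-through statement. Treat it as an illustration to adapt, not a definitive implementation.

```sas
proc sql;
   /* &impala. is assumed to contain the connection options for your site */
   connect to impala (&impala.);

   /* Send the query option to the open Impala session */
   execute (SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name) by impala;

   /* Write the DBMS return code and message of the last pass-through statement to the log */
   %put NOTE: SQLXRC=&sqlxrc;
   %put NOTE: SQLXMSG=%superq(sqlxmsg);

   disconnect from impala;
quit;
```

Checking the log at this point helps confirm that Impala accepted the SET command before any data is pulled.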

The Step-by-Step Approach

1. Open the connection to Impala.
2. Execute the configuration option to force resolution by name.
3. Retrieve the data via the established connection.

Corrected Code Example

proc sql;
   /* 1. Open the connection */
   connect to impala (&impala.);

   /* 2. Set the schema resolution option */
   execute (SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name) by impala;

   /* 3. Run the query that retrieves the data */
   create table SASDATASET as
   select * from connection to impala
      (SELECT * FROM db_name.tablename);

   /* Close the connection cleanly */
   disconnect from impala;
quit;

Why does this work? By sending the SET statement separately, you modify the environment of the active Impala session. When the next query (SELECT *) is sent via connection to impala, it runs with this setting and maps the Parquet file columns by name, which preserves the integrity of the data retrieved in SAS.
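One way to see that the setting belongs to the session is to open two connections side by side and set the option in only one of them. The sketch below is an illustration under assumptions: the aliases bypos and byname and the WORK table names are hypothetical, &impala. again stands for your connection options, and the LIMIT only keeps the samples small for visual inspection.

```sas
proc sql;
   /* Two separate Impala sessions, told apart by their aliases (hypothetical names) */
   connect to impala as bypos  (&impala.);
   connect to impala as byname (&impala.);

   /* Only the BYNAME session receives the option */
   execute (SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name) by byname;

   /* Sample read through the session left at its default setting */
   create table work.sample_bypos as
   select * from connection to bypos
      (SELECT * FROM db_name.tablename LIMIT 20);

   /* Sample read through the session that resolves columns by name */
   create table work.sample_byname as
   select * from connection to byname
      (SELECT * FROM db_name.tablename LIMIT 20);

   disconnect from bypos;
   disconnect from byname;
quit;

/* Print both samples to inspect them side by side */
proc print data=work.sample_bypos;
run;

proc print data=work.sample_byname;
run;
```

If the table is affected by the mismatch, the first sample should show values under the wrong column names while the second should not. Without an ORDER BY, the two queries are not guaranteed to return the same rows, so this is a visual check rather than a strict comparison.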

Important Disclaimer

The code and examples provided on WeAreCAS.eu are for educational purposes. Do not copy and paste them blindly into your production environments. The best approach is to understand the logic before applying it. We strongly recommend testing these scripts in a test environment (Sandbox/Dev). WeAreCAS accepts no responsibility for any impact or data loss on your systems.
