Database

Connecting SAS Viya to Cloudera Impala via ODBC

Simon 26/05/2022 4 views

Integrating data from various sources is a crucial step in modern analytical environments. Cloudera Impala tables can be used as a data source to load information into SAS© Viya's Cloud Analytic Services (CAS).

Unlike the Hadoop connector which uses JAR files, the SAS© Data Connector to Impala (included in the SAS©/ACCESS® to Impala interface on SAS© Viya) operates via an ODBC driver.

This article details the technical steps required to install, configure, and validate the connection between SAS© Viya and Cloudera Impala using the UnixODBC driver manager.

Prerequisites and Architecture

To establish this connection, the CAS server controller (UNIX/Linux environment) must have a valid ODBC configuration using the Impala ODBC driver coupled with the UnixODBC manager.

Step 1: Downloading the Components

Two main components are required:

  1. UnixODBC Driver Manager: Open-source software available on its official website. Check the Cloudera ODBC driver installation guide to verify the compatible version.

  2. Cloudera Impala ODBC Driver: To be downloaded from the Cloudera website.

Note: If your environment already has an ODBC manager, you can use it and simply add the Impala configuration to the existing files.

Step 2: Installing the UnixODBC Manager

On the CAS server controller, log in as the root user to extract and compile the software. In this example, the installation is done in the /opt directory.

Terminal (Bash)
user@sas:~$ $ gunzip unixODBC-2.3.4.tar.gz
user@sas:~$ $ tar -xvf unixODBC-2.3.4.tar

Next, configure and compile UnixODBC:

Terminal (Bash)
user@sas:~$ $ cd /opt/unixODBC-2.3.4
user@sas:~$ $ ./configure --prefix=/opt/unixODBC-2.3.4 --disable-gui --disable-drivers
user@sas:~$ $ make

Step 3: Installing the Impala ODBC Driver

Install the RPM driver via a YUM command. By default, the installation is done in /opt/cloudera/impalaodbc.

Terminal (Bash)
user@sas:~$ $ cd /opt
user@sas:~$ $ yum --nogpgcheck localinstall ClouderaImpalaODBC-2.5.35.1006-1.el7.x86

Step 4: Configuring the Link between Impala and UnixODBC

Once both components are installed, it is necessary to reconfigure the UnixODBC driver to include Impala ODBC in its search path.

Terminal (Bash)
user@sas:~$ $ cd /opt/unixODBC-2.3.4
user@sas:~$ $ export LD_LIBRARY_PATH=/opt/unixODBC-2.3.4/lib
user@sas:~$ $ ./configure --prefix=/opt/cloudera/impalaodbc --with-unixodbc=/opt/unixODBC-2.3.4
user@sas:~$ $ make
user@sas:~$ $ make install

Step 5: Updating the odbc.ini and odbcinst.ini files

The Cloudera driver provides sample .ini files located in /opt/cloudera/impalaodbc/Setup/. You can modify these files directly or copy them to another location.

The critical element is updating the odbc.ini file with your Hadoop cluster information (Hostname and Port of the Impala daemon).

Example configuration for a cluster where the daemon runs on the standard port 21050:


/opt/cloudera/impalaodbc/Setup/ RO
# Values for HOST, PORT, KrbFQDN, and KrbServiceName
HOST=sascdh01.race.sas.com
PORT=21050
Database=default
~
~

Step 6: Configuring SAS© Viya Environment Variables

There are two methods to declare the ODBC paths to SAS© Viya, depending on whether you perform this configuration before or after Viya's installation.

Case A: Before installation (vars.yml File) Add the following lines in the CAS_SETTINGS section of the vars.yml file:

vars.yml
ODBCINI=/opt/cloudera/impalaodbc/Setup/odbc.ini ODBCINST=/opt/cloudera/impalaodbc/Setup/odbcinst.ini CLOUDERAIMPALAODBC=/opt/cloudera/impalaodbc/Setup/odbc.ini LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/unixODBC-2.3.4:/opt/cloudera/impalaodbc/lib/64

Case B: After installation (cas.settings File) If SAS© Viya is already installed, manually edit the ~/sas©/viya/home/SASFoundation/cas.settings file on the CAS controller:

~/sas/viya/home/SASFoundation/cas.settings RO
export ODBCINI=/opt/cloudera/impalaodbc/Setup/odbc.ini ;
export ODBCINST=/opt/cloudera/impalaodbc/Setup/odbcinst.ini;
export CLOUDERAIMPALAODBC=/opt/cloudera/impalaodbc/Setup/odbc.ini;
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/unixODBC-2.3.4:/opt/cloudera/impalaodbc/lib/64
export ODBCSYSINI=/opt/cloudera/impalaodbc/Setup/
~
~
Note: The ODBCSYSINI variable points to the directory containing odbcinst.ini.

Case B: After installation (cas.settings File) If SAS© Viya is already installed, manually edit the ~/sas©/viya/home/SASFoundation/cas.settings file on the CAS controller:

~/sas/viya/home/SASFoundation/cas.settings RO
export ODBCINI=/opt/cloudera/impalaodbc/Setup/odbc.ini ;
export ODBCINST=/opt/cloudera/impalaodbc/Setup/odbcinst.ini;
export CLOUDERAIMPALAODBC=/opt/cloudera/impalaodbc/Setup/odbc.ini;
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/unixODBC-2.3.4:/opt/cloudera/impalaodbc/lib/64
export ODBCSYSINI=/opt/cloudera/impalaodbc/Setup/
~
~
Note: The ODBCSYSINI variable points to the directory containing odbcinst.ini.

Validating the Connection

To validate the configuration, run a SAS© program from SAS© Studio. The goal is to create an "impala" type CASLIB and load a table.

1cas mySession sessopts=(messagelevel=all);
2 
3caslib implib datasource=(srctype="impala",
4 username="hadoop",
5 server="sascdh01.race.sas.com",
6 database="default");
7 
8PROC CASUTIL;
9 load casdata="s_heart" incaslib="implib" outcaslib="implib" casout="s_heart" replace;
10 list tables incaslib="implib";
11QUIT;
12 
13cas mySession terminate;

Verification in the Log: The log should indicate the use of the data connector: NOTE: Performing serial LoadTable action using SAS© Data Connector to Impala.

If this message appears and the table is loaded successfully, your connection between SAS© Viya and Cloudera Impala is operational.