ETL BASE_DE_DONNEES_EXTERNE

Amazon Redshift Data Connector

The Amazon Redshift data connector integrates data hosted in Amazon Redshift with the distributed analytics environment of SAS® Viya. It relies on the CASLIB statement to define the connection and on PROC CASUTIL to manipulate data. Specific options are available to manage authentication (including SSO with Microsoft Entra ID and AWS IAM), bulk loading and unloading (via S3 for better performance), and data type conversion between Redshift and CAS. Redshift CHAR and VARCHAR lengths are expressed in bytes, while the CAS server supports VARCHAR; integer numeric types are converted to DOUBLE (with 15 significant digits of precision) for operations that CAS does not support directly.
Data Analysis

Type: BASE_DE_DONNEES_EXTERNE


The examples interact with an external Amazon Redshift database. The Redshift table names used in the examples are placeholders and must exist in your Redshift environment. Where data is created inside SAS, DATA steps are provided.

1 Code Block
CASLIB
Explanation:
This example creates a CAS data library (caslib) named `redshift_lib_basic` that connects to an Amazon Redshift database. It uses the `srctype` option to specify the source type, and `server`, `username`, `password`, and `database` for the connection information. Replace the option values with your own Redshift connection information.
/* Establish a simple connection to an Amazon Redshift database. */
caslib redshift_lib_basic desc='Basic Redshift caslib'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace with your Redshift server */
               username='redshift_user',
               password='your_redshift_password',
               database='your_redshift_database');

/* Verify that the caslib is defined */
PROC CAS; TABLE.caslibinfo caslib='redshift_lib_basic'; RUN; QUIT;
2 Code Block
PROC CASUTIL
Explanation:
This example shows how to load an existing Amazon Redshift table (`my_redshift_table`) into the CAS server using `PROC CASUTIL`. It also illustrates the ability to override connection options (such as username and password) directly in the `dataSourceOptions` of the `LOAD` statement. The table is loaded into the `casuser` caslib under the name `my_cas_table_from_redshift`. `ephem=TRUE` indicates that the CAS table is ephemeral.
/* Create a base caslib if it does not already exist */
caslib redshift_lib_load desc='Redshift caslib for loading'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace with your Redshift server */
               username='redshift_user_load',
               password='your_redshift_password_load',
               database='your_redshift_database_load');

/* Load an existing Redshift table 'my_redshift_table' into CAS */
/* Credentials can be overridden at the procedure level */
PROC CASUTIL;
   list files incaslib="redshift_lib_load"; /* List the tables available in the Redshift caslib */

   load casdata="my_redshift_table" incaslib="redshift_lib_load"
        outcaslib="casuser" casout="my_cas_table_from_redshift"
        dataSourceOptions=(
           username='user_override',    /* Overrides the user name */
           password='password_override' /* Overrides the password */
        ) ephem=TRUE; /* ephem=TRUE for an ephemeral (temporary) table */

   list files incaslib="casuser"; /* Verify that the new CAS table is present */
   contents casdata="my_cas_table_from_redshift" incaslib="casuser";
QUIT;
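Once the loaded copy is no longer needed, it can be removed from the CAS server with the DROPTABLE statement of PROC CASUTIL. A minimal sketch, reusing the table and caslib names from the example above (`quiet` suppresses the error if the table does not exist):

```sas
/* Drop the CAS table created by the LOAD above; the Redshift source table is untouched */
PROC CASUTIL;
   droptable casdata="my_cas_table_from_redshift" incaslib="casuser" quiet;
QUIT;
```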
3 Code Block
DATA STEP / PROC CASUTIL Data
Explanation:
This example illustrates bulk loading data from a CAS table (`casuser.sample_data`) to a new table in Amazon Redshift (`sample_redshift_table`).
1. First, a `sample_data` table is created in CAS with example data.
2. A `redshift_lib_bulk` caslib is defined to connect to Redshift, specifying the options `bulkLoad=TRUE`, `bucket` (for the intermediate S3 bucket), `region`, `deleteDataFiles=TRUE` to clean S3 after transfer, and `blFormat='PARQUET'` for the file format used in the transfer.
3. `PROC CASUTIL SAVE` is then used to write the CAS table to Redshift via the bulk loading mechanism, which uses S3 as an intermediate step.
/* 1. Create an in-memory SAS (CAS) table to load */
DATA casuser.sample_data;
   LENGTH id 8 name $20 score 8;
   INPUT id name $ score;
   DATALINES;
1 John 85
2 Jane 92
3 Mike 78
4 Emily 95
;
RUN;

/* 2. Establish a connection to Redshift with bulk-load options */
caslib redshift_lib_bulk desc='Redshift caslib for bulk load'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace */
               username='redshift_user_bulk',
               password='your_redshift_password_bulk',
               database='your_redshift_database_bulk',
               bucket='your-s3-bucket-name', /* Replace with your S3 bucket */
               region='us-east-1', /* Replace with your AWS region */
               bulkLoad=TRUE, /* Enable bulk loading */
               deleteDataFiles=TRUE, /* Delete the temporary S3 files after loading */
               blFormat='PARQUET' /* Use the Parquet format for the S3 transfer */
               );

/* 3. Load the CAS table into Amazon Redshift using bulk load */
PROC CASUTIL;
   save casdata="sample_data" incaslib="casuser"
        outcaslib="redshift_lib_bulk" casout="sample_redshift_table"
        replace; /* Replace the Redshift table if it already exists */
QUIT;
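The introduction also mentions bulk *unloading* via S3, i.e. the reverse direction. Assuming a `bulkUnload` option is available in your deployment (check your connector version), a caslib can be configured so that a subsequent LOAD pulls data out of Redshift through S3 rather than row by row. A hedged sketch reusing the placeholder names above:

```sas
/* Sketch: bulk unload (Redshift -> S3 -> CAS); bulkUnload availability may vary by deployment */
caslib redshift_lib_unload desc='Redshift caslib for bulk unload'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace */
               username='redshift_user_bulk',
               password='your_redshift_password_bulk',
               database='your_redshift_database_bulk',
               bucket='your-s3-bucket-name', /* Replace with your S3 bucket */
               region='us-east-1',
               bulkUnload=TRUE, /* Assumed option: route reads through S3 */
               deleteDataFiles=TRUE);

/* Reading the table back now goes through the S3 bulk path */
PROC CASUTIL;
   load casdata="sample_redshift_table" incaslib="redshift_lib_unload"
        outcaslib="casuser" casout="sample_data_roundtrip" replace;
QUIT;
```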
4 Code Block
CASLIB
Explanation:
This example configures a caslib to connect to Amazon Redshift using Single Sign-On (SSO) with AWS IAM authentication. It uses `authMethod='IAM'` and includes the `clusterId`, `authenticationScope`, and `awsIAMassumeRoleARN` options, which are required for this type of connection. The option values must be replaced with those from your AWS and Microsoft Entra ID (Azure Active Directory) configuration.
/* Establish an SSO connection to an Amazon Redshift database */
caslib redshift_lib_sso desc='Redshift caslib with SSO'
   dataSource=(srctype='redshift',
               server='myserver.redshift.amazonaws.com', /* Replace */
               username='myuserid', /* Replace with your user ID */
               database='myuserdb', /* Replace with your Redshift database */
               region='us-east-1', /* Replace with your AWS region */
               authMethod='IAM', /* Use IAM authentication for SSO */
               clusterId='dbrs1', /* Replace with your Redshift cluster ID */
               authenticationScope='api://a09827234-93eo-092fjas1-92038498273/awsassumerole', /* Replace */
               awsIAMassumeRoleARN='arn:aws:iam::129347192347098273:role/myrolename' /* Replace */
               );

/* Verify that the caslib is defined */
PROC CAS; TABLE.caslibinfo caslib='redshift_lib_sso'; RUN; QUIT;
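When a session is finished with these connections, each caslib can be removed with the DROP option of the CASLIB statement. A minimal cleanup sketch using the caslib names defined in the examples (dropping a caslib removes only the CAS-side definition; the Redshift data is untouched):

```sas
/* Drop the caslibs defined in the examples above */
caslib redshift_lib_basic drop;
caslib redshift_lib_load drop;
caslib redshift_lib_bulk drop;
caslib redshift_lib_sso drop;
```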
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved