
Amazon Redshift Data Connector

The Amazon Redshift data connector integrates data hosted on Amazon Redshift with the distributed analytics environment of SAS® Viya. It relies on the CASLIB statement to define the connection and on PROC CASUTIL to manipulate the data. Specific options are available to manage authentication (including SSO with Microsoft Entra ID and AWS IAM), bulk loading and unloading (staged through Amazon S3 for better performance), and data type conversion between Redshift and CAS. Redshift CHAR and VARCHAR lengths are interpreted in bytes, the CAS server supports VARCHAR, and Redshift integer types not directly supported by CAS are converted to DOUBLE, with a precision of 15 digits.
Data Analysis

Type: BASE_DE_DONNEES_EXTERNE


The examples interact with an external Amazon Redshift database. The Redshift table names used in the examples are placeholders; replace them with tables that exist in your own Redshift environment. Where internal SAS data must be created, DATA steps are provided.

1 Code Block
CASLIB
Explanation:
This example creates a CAS data library (caslib) named `redshift_lib_basic` that connects to an Amazon Redshift database. The `srctype` option specifies the source type, while `server`, `username`, `password`, and `database` supply the connection information. Replace the option values with your own Redshift connection details.
/* Establish a basic connection to an Amazon Redshift database. */
caslib redshift_lib_basic desc='Basic Redshift Caslib'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace with your Redshift server */
               username='redshift_user',
               password='your_redshift_password',
               database='your_redshift_database');

/* Verify that the caslib is defined correctly */
PROC CAS; TABLE.caslibinfo caslib='redshift_lib_basic'; RUN; QUIT;
2 Code Block
PROC CASUTIL
Explanation:
This example shows how to load an existing Amazon Redshift table (`my_redshift_table`) into the CAS server using `PROC CASUTIL`. It also illustrates overriding connection options (such as the username and password) directly in the `dataSourceOptions` of the `LOAD` statement. The table is loaded into the `casuser` caslib under the name `my_cas_table_from_redshift`, and `ephem=TRUE` indicates that the CAS table is ephemeral (temporary).
/* Create a base caslib if it does not already exist */
caslib redshift_lib_load desc='Redshift Caslib for Loading'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace with your Redshift server */
               username='redshift_user_load',
               password='your_redshift_password_load',
               database='your_redshift_database_load');

/* Load an existing Redshift table 'my_redshift_table' into CAS */
/* Credentials can be overridden at the procedure level */
PROC CASUTIL;
   list files incaslib="redshift_lib_load"; /* List the tables available in the Redshift caslib */

   load casdata="my_redshift_table" incaslib="redshift_lib_load"
        outcaslib="casuser" casout="my_cas_table_from_redshift"
        dataSourceOptions=(
           username='user_override',    /* Overrides the username */
           password='password_override' /* Overrides the password */
        ) ephem=TRUE;                   /* ephem=TRUE for a temporary table */

   list files incaslib="casuser"; /* Confirm the new CAS table is present */
   contents casdata="my_cas_table_from_redshift" incaslib="casuser";
QUIT;
3 Code Block
DATA STEP / PROC CASUTIL Data
Explanation:
This example illustrates bulk loading data from a CAS table (`casuser.sample_data`) into a new Amazon Redshift table (`sample_redshift_table`).
1. A `sample_data` table is first created in CAS with example data.
2. A `redshift_lib_bulk` caslib is defined to connect to Redshift, specifying `bulkLoad=TRUE`, `bucket` (the intermediate S3 bucket), `region`, `deleteDataFiles=TRUE` to clean up S3 after the transfer, and `blFormat='PARQUET'` for the file format used during the transfer.
3. `PROC CASUTIL SAVE` then writes the CAS table to Redshift through the bulk load mechanism, which stages the data in S3.
/* 1. Create an in-memory (CAS) table to load */
DATA casuser.sample_data;
   LENGTH id 8 name $20 score 8;
   INPUT id name $ score;
   DATALINES;
1 John 85
2 Jane 92
3 Mike 78
4 Emily 95
;
RUN;

/* 2. Establish a Redshift connection with bulk load options */
caslib redshift_lib_bulk desc='Redshift Caslib for Bulk Load'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace */
               username='redshift_user_bulk',
               password='your_redshift_password_bulk',
               database='your_redshift_database_bulk',
               bucket='your-s3-bucket-name', /* Replace with your S3 bucket */
               region='us-east-1',           /* Replace with your AWS region */
               bulkLoad=TRUE,                /* Enable bulk loading */
               deleteDataFiles=TRUE,         /* Delete the temporary S3 files after the load */
               blFormat='PARQUET'            /* Use the Parquet format for the S3 transfer */
              );

/* 3. Load the CAS table into Amazon Redshift using bulk load */
PROC CASUTIL;
   save casdata="sample_data" incaslib="casuser"
        outcaslib="redshift_lib_bulk" casout="sample_redshift_table"
        replace; /* replace if the Redshift table already exists */
QUIT;
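The introduction also mentions bulk unloading, the reverse direction: reading Redshift data into CAS through an S3 staging step instead of row-by-row fetches. The sketch below is an assumption-based illustration, not a verified recipe: it presumes a `bulkUnload=TRUE` caslib option symmetric to `bulkLoad=TRUE` and reuses the same `bucket`, `region`, and `deleteDataFiles` staging options from the bulk load example. Check your connector's documentation for the exact option names supported by your SAS Viya release.

```sas
/* Caslib with bulk unload enabled: Redshift unloads to S3, CAS reads the staged files */
/* bulkUnload=TRUE is assumed here by symmetry with bulkLoad=TRUE */
caslib redshift_lib_unload desc='Redshift Caslib for Bulk Unload'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace */
               username='redshift_user_bulk',
               password='your_redshift_password_bulk',
               database='your_redshift_database_bulk',
               bucket='your-s3-bucket-name', /* Same S3 staging bucket as for bulk load */
               region='us-east-1',           /* Replace with your AWS region */
               bulkUnload=TRUE,              /* Assumed option: unload via S3 */
               deleteDataFiles=TRUE);        /* Clean up the staged S3 files afterwards */

/* Read a Redshift table into CAS through the bulk unload path */
PROC CASUTIL;
   load casdata="sample_redshift_table" incaslib="redshift_lib_unload"
        outcaslib="casuser" casout="sample_unloaded";
QUIT;
```

For large tables this path avoids fetching rows over the JDBC/ODBC connection one batch at a time, at the cost of requiring S3 write access from the Redshift cluster.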
4 Code Block
CASLIB
Explanation:
This example configures a caslib that connects to Amazon Redshift using Single Sign-On (SSO) with AWS IAM authentication. It uses `authMethod='IAM'` and includes the `clusterId`, `authenticationScope`, and `awsIAMassumeRoleARN` options required for this type of connection. Replace the option values with those from your AWS and Microsoft Entra ID (Azure Active Directory) configuration.
/* Establish an SSO connection to an Amazon Redshift database */
caslib redshift_lib_sso desc='Redshift Caslib with SSO'
   dataSource=(srctype='redshift',
               server='myserver.redshift.amazonaws.com', /* Replace */
               username='myuserid', /* Replace with your user ID */
               database='myuserdb', /* Replace with your Redshift database */
               region='us-east-1',  /* Replace with your AWS region */
               authMethod='IAM',    /* Use IAM authentication for SSO */
               clusterId='dbrs1',   /* Replace with your Redshift cluster ID */
               authenticationScope='api://a09827234-93eo-092fjas1-92038498273/awsassumerole', /* Replace */
               awsIAMassumeRoleARN='arn:aws:iam::129347192347098273:role/myrolename' /* Replace */
              );

/* Verify that the caslib is defined correctly */
PROC CAS; TABLE.caslibinfo caslib='redshift_lib_sso'; RUN; QUIT;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved