ETL CAS

Amazon Redshift Data Connector

Difficulty Level
Beginner
The Amazon Redshift data connector integrates data hosted on Amazon Redshift with the distributed analytics environment of SAS® Viya. It relies on the CASLIB statement to define the connection and on PROC CASUTIL to manipulate the data. Dedicated options are available to manage authentication (including SSO with Microsoft Entra ID and AWS IAM), bulk loading and unloading (through Amazon S3 for better performance), and data type conversion between Redshift and CAS. Redshift CHAR and VARCHAR lengths are expressed in bytes, while the CAS server supports VARCHAR; Redshift integer numeric types are converted to DOUBLE for operations not directly supported by CAS, which limits them to about 15 significant digits of precision.
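As an illustrative aside (plain SAS code, independent of any Redshift connection), the 15-digit limit comes from DOUBLE storage: integer values wider than about 15 significant digits can no longer be represented exactly once converted. A minimal sketch:

```sas
/* Illustrative only: shows the precision limit of DOUBLE storage.   */
/* 2**53 + 1 = 9007199254740993 cannot be represented exactly as a   */
/* double, so the stored value rounds to 9007199254740992.           */
data _null_;
   x = 9007199254740993;
   put x= best32.;
run;
```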
Data Analysis

Type: EXTERNAL_DATABASE


The examples interact with an external Amazon Redshift database. The Redshift table names used in the examples are placeholders and must exist in your Redshift environment. Where internal SAS data is needed, DATA steps are provided to create it.

Code Block 1: CASLIB
Explanation:
This example creates a CAS data library (caslib) named `redshift_lib_basic` that connects to an Amazon Redshift database. The `srctype` option identifies the data source type, while `server`, `username`, `password`, and `database` supply the connection information. Replace the option values with your own Redshift connection details.
/* Establishes a simple connection to an Amazon Redshift database. */
caslib redshift_lib_basic desc='Basic Redshift Caslib'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace with your Redshift server */
               username='redshift_user',
               password='your_redshift_password',
               database='your_redshift_database');

/* Verify that the caslib is properly defined */
PROC CAS;
   table.caslibinfo caslib='redshift_lib_basic';
RUN;
QUIT;
Code Block 2: PROC CASUTIL
Explanation:
This example shows how to load an existing Amazon Redshift table (`my_redshift_table`) into the CAS server using `PROC CASUTIL`. It also illustrates how connection options (such as the username and password) can be overridden directly within the `dataSourceOptions` of the `LOAD` statement. The table is loaded into the `casuser` caslib under the name `my_cas_table_from_redshift`, and `ephem=TRUE` marks the resulting CAS table as ephemeral (temporary).
/* Create a basic caslib if it doesn't exist */
caslib redshift_lib_load desc='Redshift Caslib for Loading'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace with your Redshift server */
               username='redshift_user_load',
               password='your_redshift_password_load',
               database='your_redshift_database_load');

/* Load an existing 'my_redshift_table' from Redshift into CAS */
/* Credentials can be overridden at the procedure level */
PROC CASUTIL;
   list files incaslib="redshift_lib_load";   /* Lists available tables in the Redshift caslib */

   load casdata="my_redshift_table" incaslib="redshift_lib_load"
        outcaslib="casuser" casout="my_cas_table_from_redshift"
        dataSourceOptions=(
           username='user_override',       /* Override the user */
           password='password_override'    /* Override the password */
        ) ephem=TRUE;                      /* ephem=TRUE for a temporary table */

   list files incaslib="casuser";   /* Verifies the presence of the new CAS table */
   contents casdata="my_cas_table_from_redshift" incaslib="casuser";
QUIT;
Code Block 3: DATA Step / PROC CASUTIL
Explanation:
This example illustrates bulk loading data from a CAS table (`casuser.sample_data`) into a new table in Amazon Redshift (`sample_redshift_table`).
1. A `sample_data` table is first created in CAS with example data.
2. A `redshift_lib_bulk` caslib is defined to connect to Redshift, specifying `bulkLoad=TRUE`, `bucket` (the intermediate S3 bucket), `region`, `deleteDataFiles=TRUE` (to clean up S3 after the transfer), and `blFormat='PARQUET'` (the file format used for the transfer).
3. `PROC CASUTIL` with a `SAVE` statement then writes the CAS table to Redshift through the bulk load mechanism, which stages the data in S3.
/* 1. Create an in-memory CAS table for loading */
DATA casuser.sample_data;
   LENGTH id 8 name $20 score 8;
   INPUT id name $ score;
   DATALINES;
1 John 85
2 Jane 92
3 Mike 78
4 Emily 95
;
RUN;

/* 2. Establish a connection to Redshift with bulk loading options */
caslib redshift_lib_bulk desc='Redshift Caslib for Bulk Load'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace */
               username='redshift_user_bulk',
               password='your_redshift_password_bulk',
               database='your_redshift_database_bulk',
               bucket='your-s3-bucket-name',  /* Replace with your S3 bucket */
               region='us-east-1',            /* Replace with your AWS region */
               bulkLoad=TRUE,                 /* Enable bulk loading */
               deleteDataFiles=TRUE,          /* Deletes temporary files on S3 after loading */
               blFormat='PARQUET'             /* Uses Parquet format for S3 transfer */
              );

/* 3. Load the CAS table into Amazon Redshift using bulk load */
PROC CASUTIL;
   save casdata="sample_data" incaslib="casuser"
        outcaslib="redshift_lib_bulk" casout="sample_redshift_table"
        replace;   /* replace if the Redshift table already exists */
QUIT;
Code Block 4: CASLIB (SSO)
Explanation:
This example configures a caslib that connects to Amazon Redshift using single sign-on (SSO) with AWS IAM authentication. It uses `authMethod='IAM'` together with the `clusterId`, `authenticationScope`, and `awsIAMassumeRoleARN` options required for this type of connection. Replace the option values with those from your AWS and Microsoft Entra ID (formerly Azure Active Directory) configuration.
/* Establishes an SSO connection to an Amazon Redshift database */
caslib redshift_lib_sso desc='Redshift Caslib with SSO'
   dataSource=(srctype='redshift',
               server='myserver.redshift.amazonaws.com', /* Replace */
               username='myuserid',   /* Replace with your user ID */
               database='myuserdb',   /* Replace with your Redshift database */
               region='us-east-1',    /* Replace with your AWS region */
               authMethod='IAM',      /* Uses IAM authentication for SSO */
               clusterId='dbrs1',     /* Replace with your Redshift cluster ID */
               authenticationScope='api://a09827234-93eo-092fjas1-92038498273/awsassumerole', /* Replace */
               awsIAMassumeRoleARN='arn:aws:iam::129347192347098273:role/myrolename' /* Replace */
              );

/* Verify that the caslib is properly defined */
PROC CAS;
   table.caslibinfo caslib='redshift_lib_sso';
RUN;
QUIT;
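As a housekeeping note, the caslib definitions created in these examples are session-scoped and can be removed when no longer needed. A minimal sketch, assuming the caslib names used above (dropping a caslib removes only the session definition, not any data in Redshift):

```sas
/* List the caslibs defined in the current CAS session */
caslib _all_ list;

/* Drop the example caslibs; this does not delete any Redshift data */
caslib redshift_lib_basic drop;
caslib redshift_lib_load drop;
caslib redshift_lib_bulk drop;
caslib redshift_lib_sso drop;
```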
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved


Related Documentation

No specific documentation is available for this category.