ETL CAS

Amazon Redshift Data Connector

Difficulty Level
Beginner
The Amazon Redshift data connector integrates data hosted on Amazon Redshift with the distributed analytics environment of SAS® Viya. It relies on the CASLIB statement to define the connection and on PROC CASUTIL to manipulate the data. Dedicated options are available to manage authentication (including SSO with Microsoft Entra ID and AWS IAM), bulk loading and unloading (through Amazon S3 for better performance), and data type conversion between Redshift and CAS. Redshift CHAR and VARCHAR lengths are expressed in bytes, while the CAS server supports VARCHAR; Redshift integer numeric types are converted to DOUBLE for operations not directly supported by CAS, which limits them to about 15 significant digits of precision.
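As an illustrative aside (plain SAS code, independent of any Redshift connection), the 15-digit limit comes from DOUBLE storage: integer values wider than about 15 significant digits can no longer be represented exactly once converted. A minimal sketch:

```sas
/* Illustrative only: shows the precision limit of DOUBLE storage.   */
/* 2**53 + 1 = 9007199254740993 cannot be represented exactly as a   */
/* double, so the stored value rounds to 9007199254740992.           */
data _null_;
   x = 9007199254740993;
   put x= best32.;
run;
```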
Data Analysis

Type: EXTERNAL_DATABASE


The examples interact with an external Amazon Redshift database. The Redshift table names used in the examples are placeholders and must exist in your Redshift environment. Where internal SAS data is needed, DATA steps are provided to create it.

Code Block 1: CASLIB
Explanation:
This example creates a CAS data library (caslib) named `redshift_lib_basic` that connects to an Amazon Redshift database. The `srctype` option identifies the data source type, while `server`, `username`, `password`, and `database` supply the connection information. Replace the option values with your own Redshift connection details.
/* Establishes a simple connection to an Amazon Redshift database. */
caslib redshift_lib_basic desc='Basic Redshift Caslib'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace with your Redshift server */
               username='redshift_user',
               password='your_redshift_password',
               database='your_redshift_database');

/* Verify that the caslib is properly defined */
PROC CAS;
   table.caslibinfo caslib='redshift_lib_basic';
RUN;
QUIT;
Code Block 2: PROC CASUTIL
Explanation:
This example shows how to load an existing Amazon Redshift table (`my_redshift_table`) into the CAS server using `PROC CASUTIL`. It also illustrates how connection options (such as the username and password) can be overridden directly within the `dataSourceOptions` of the `LOAD` statement. The table is loaded into the `casuser` caslib under the name `my_cas_table_from_redshift`, and `ephem=TRUE` marks the resulting CAS table as ephemeral (temporary).
/* Create a basic caslib if it doesn't exist */
caslib redshift_lib_load desc='Redshift Caslib for Loading'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace with your Redshift server */
               username='redshift_user_load',
               password='your_redshift_password_load',
               database='your_redshift_database_load');

/* Load an existing 'my_redshift_table' from Redshift into CAS */
/* Credentials can be overridden at the procedure level */
PROC CASUTIL;
   list files incaslib="redshift_lib_load";   /* Lists available tables in the Redshift caslib */

   load casdata="my_redshift_table" incaslib="redshift_lib_load"
        outcaslib="casuser" casout="my_cas_table_from_redshift"
        dataSourceOptions=(
           username='user_override',       /* Override the user */
           password='password_override'    /* Override the password */
        ) ephem=TRUE;                      /* ephem=TRUE for a temporary table */

   list files incaslib="casuser";   /* Verifies the presence of the new CAS table */
   contents casdata="my_cas_table_from_redshift" incaslib="casuser";
QUIT;
Code Block 3: DATA Step / PROC CASUTIL
Explanation:
This example illustrates bulk loading data from a CAS table (`casuser.sample_data`) into a new table in Amazon Redshift (`sample_redshift_table`).
1. A `sample_data` table is first created in CAS with example data.
2. A `redshift_lib_bulk` caslib is defined to connect to Redshift, specifying `bulkLoad=TRUE`, `bucket` (the intermediate S3 bucket), `region`, `deleteDataFiles=TRUE` (to clean up S3 after the transfer), and `blFormat='PARQUET'` (the file format used for the transfer).
3. `PROC CASUTIL` with a `SAVE` statement then writes the CAS table to Redshift through the bulk load mechanism, which stages the data in S3.
/* 1. Create an in-memory CAS table for loading */
DATA casuser.sample_data;
   LENGTH id 8 name $20 score 8;
   INPUT id name $ score;
   DATALINES;
1 John 85
2 Jane 92
3 Mike 78
4 Emily 95
;
RUN;

/* 2. Establish a connection to Redshift with bulk loading options */
caslib redshift_lib_bulk desc='Redshift Caslib for Bulk Load'
   dataSource=(srctype='redshift',
               server='redshift-server.example.com', /* Replace */
               username='redshift_user_bulk',
               password='your_redshift_password_bulk',
               database='your_redshift_database_bulk',
               bucket='your-s3-bucket-name',  /* Replace with your S3 bucket */
               region='us-east-1',            /* Replace with your AWS region */
               bulkLoad=TRUE,                 /* Enable bulk loading */
               deleteDataFiles=TRUE,          /* Deletes temporary files on S3 after loading */
               blFormat='PARQUET'             /* Uses Parquet format for S3 transfer */
              );

/* 3. Load the CAS table into Amazon Redshift using bulk load */
PROC CASUTIL;
   save casdata="sample_data" incaslib="casuser"
        outcaslib="redshift_lib_bulk" casout="sample_redshift_table"
        replace;   /* replace if the Redshift table already exists */
QUIT;
Code Block 4: CASLIB (SSO)
Explanation:
This example configures a caslib that connects to Amazon Redshift using single sign-on (SSO) with AWS IAM authentication. It uses `authMethod='IAM'` together with the `clusterId`, `authenticationScope`, and `awsIAMassumeRoleARN` options required for this type of connection. Replace the option values with those from your AWS and Microsoft Entra ID (formerly Azure Active Directory) configuration.
/* Establishes an SSO connection to an Amazon Redshift database */
caslib redshift_lib_sso desc='Redshift Caslib with SSO'
   dataSource=(srctype='redshift',
               server='myserver.redshift.amazonaws.com', /* Replace */
               username='myuserid',   /* Replace with your user ID */
               database='myuserdb',   /* Replace with your Redshift database */
               region='us-east-1',    /* Replace with your AWS region */
               authMethod='IAM',      /* Uses IAM authentication for SSO */
               clusterId='dbrs1',     /* Replace with your Redshift cluster ID */
               authenticationScope='api://a09827234-93eo-092fjas1-92038498273/awsassumerole', /* Replace */
               awsIAMassumeRoleARN='arn:aws:iam::129347192347098273:role/myrolename' /* Replace */
              );

/* Verify that the caslib is properly defined */
PROC CAS;
   table.caslibinfo caslib='redshift_lib_sso';
RUN;
QUIT;
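As a housekeeping note, the caslib definitions created in these examples are session-scoped and can be removed when no longer needed. A minimal sketch, assuming the caslib names used above (dropping a caslib removes only the session definition, not any data in Redshift):

```sas
/* List the caslibs defined in the current CAS session */
caslib _all_ list;

/* Drop the example caslibs; this does not delete any Redshift data */
caslib redshift_lib_basic drop;
caslib redshift_lib_load drop;
caslib redshift_lib_bulk drop;
caslib redshift_lib_sso drop;
```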
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved


Related Documentation

No specific documentation is available for this category.