Published on :
ETL INTERNAL_CREATION

Bulkload - SAS/ACCESS Greenplum Example

This code is also available in: Deutsch Español Français
Awaiting validation
The script begins with a cleanup phase using `PROC DELETE` to remove any existing test tables (`testblkld1`, `testblkld2`) in the `mydblib` library. Next, a working dataset (`work.testblkld`) is created in memory with data specified directly in the script (`cards`). Finally, the `work.testblkld` data is bulk loaded into the Greenplum database using two distinct approaches: the first via a `CREATE TABLE AS SELECT` statement in `PROC SQL` with `BULKLOAD` options enabled directly on the target table, and the second via a `DATA` step which also uses `BULKLOAD` options during the declaration of the target table `mydblib.testblkld2`.
Data Analysis

Type : INTERNAL_CREATION


The source dataset `work.testblkld` is created directly within the SAS script via a `DATA` step and the `cards;` statement to define inline data. The target tables `mydblib.testblkld1` and `mydblib.testblkld2` are external tables in a Greenplum database, but the source data is internal to the script.

1 Code Block
PROC DELETE
Explanation :
These `PROC DELETE` statements are used to delete tables `testblkld1` and `testblkld2` from the `mydblib` library if they exist. This ensures a clean environment for script execution and avoids errors from creating existing tables. `mydblib` is a SAS/ACCESS library connected to an external database.
Copied!
1 
2PROC DELETE
3DATA=mydblib.testblkld1;
4 
5RUN;
6PROC DELETE
7DATA=mydblib.testblkld2;
8 
9RUN;
10 
2 Code Block
DATA STEP Data
Explanation :
This `DATA` step creates a temporary dataset named `work.testblkld`. It contains four variables: `name` (character), `age` (numeric), `sex` (character), and `bdate` (numeric, formatted as a SAS date from an `mmddyy.` format). The data is provided inline via the `cards;` statement, meaning the dataset is created directly from the values specified in the script.
Copied!
1DATA work.testblkld;
2 INPUT name $ age sex $ bdate mmddyy.;
3 CARDS;
4amy 3 f 030185
5bill 12 m 121277
6charlie 35 m 010253
7david 19 m 101469
8elinor 42 f 080845
9pearl 78 f 051222
10vera 96 f 101200
11frank 24 m 092663
12georgia 1 f 040687
13henry 46 m 053042
14joann 27 f 020461
15buddy 66 m 101432
16;
17RUN;
3 Code Block
PROC SQL
Explanation :
This block uses `PROC SQL` to create a new table `testblkld1` in the `mydblib` library (which represents the Greenplum connection). The `BULKLOAD=YES` option enables bulk loading. The `BL_PORT`, `BL_HOST`, and `BL_PROTOCOL="gpfdist"` options are specific parameters for configuring Greenplum's GPFDIST protocol, allowing optimized data transfer. `bl_format='CSV'` specifies the format of the data sent. The table is created by selecting all columns from the `work.testblkld` dataset.
Copied!
1PROC SQL;
2create TABLE mydblib.testblkld1
3 (BULKLOAD=YES
4 BL_PORT=&port
5 BL_HOST=&host
6 BL_PROTOCOL="gpfdist"
7 bl_format='CSV')
8 as select * from work.testblkld;
9QUIT;
4 Code Block
DATA STEP
Explanation :
This block uses a `DATA` step to create the `testblkld2` table in the `mydblib` library. As in the `PROC SQL` example, the `BULKLOAD=YES`, `BL_PORT`, `BL_HOST`, and `BL_PROTOCOL="gpfdist"` options are specified directly in the `DATA` statement for the target table, thus enabling bulk loading to Greenplum. The `set work.testblkld;` statement indicates that the data to be loaded comes from the previously created temporary dataset `work.testblkld`.
Copied!
1DATA mydblib.testblkld2 (
2 BULKLOAD=YES
3 BL_PORT=&port
4 BL_HOST=&host
5 BL_PROTOCOL="gpfdist"
6 );
7 SET work.testblkld;
8RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : SAS SAMPLE LIBRARY, NAME: bulkload.sas, TITLE: Sample Programs, PRODUCT: SAS/ACCESS to Greenplum, SYSTEM: z/OS, UNIX, WINDOWS, REF: SAS/ACCESS 9 for Relational Databases: Reference