
Exploring the BULKLOAD Option with SAS/ACCESS to Hadoop

Attention: This code requires administrator privileges.
The script illustrates the use of `PROC APPEND` with the `SAS/ACCESS to Hadoop` interface. It runs three distinct tests. The first uses `BULKLOAD=YES` for a fast load of the `sashelp.cars` table. The second performs the same operation without `BULKLOAD` for comparison. The third shows how to load data while specifying an underlying storage format in Hadoop, here `Parquet`. Each test is followed by a cleanup step via `PROC SQL` that drops the created table, making the script re-executable.
Data Analysis



The data source is the `sashelp.cars` table, a sample table shipped with SAS. The destination is an external Hadoop database, connected via a `LIBNAME` statement. The script does not read any external data; it only writes to Hadoop.

Code Block 1: LIBNAME
Explanation:
Defines a connection to a Hadoop server via the SAS/ACCESS to Hadoop interface. It also enables tracing options (`sastrace`) to record detailed information about the interaction with the database in the SAS log.
LIBNAME mycdh hadoop server="quickstart.cloudera" user=cloudera password=cloudera;
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
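As an optional follow-up (not part of the original script), listing the library's members is a quick way to confirm that the connection works before running the load tests. This is a minimal sketch that reuses the `mycdh` library defined above.

/* Hypothetical check: list the tables visible through the mycdh library */
/* to confirm that the connection to the Hadoop server is working.       */
PROC DATASETS lib=mycdh;
QUIT;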
Code Block 2: PROC APPEND with BULKLOAD
Explanation:
This block loads data from the `sashelp.cars` table into a new `cars` table on the Hadoop server. The `bulkload=yes` option activates bulk-loading mode, which is optimized for large-volume data transfers. The table is then dropped with `PROC SQL` to clean up the environment.
PROC APPEND BASE=mycdh.cars (bulkload=yes)
            DATA=sashelp.cars;
RUN;

PROC SQL;
    DROP TABLE mycdh.cars;
QUIT;
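As a hypothetical verification step (not in the original script), a row count can be run between the `PROC APPEND` and the `DROP TABLE` to confirm that all of `sashelp.cars` actually landed in Hadoop.

/* Hypothetical check, to be placed before the cleanup DROP TABLE:       */
/* count the rows of the freshly loaded Hadoop table.                    */
PROC SQL;
    SELECT count(*) AS nobs FROM mycdh.cars;
QUIT;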
Code Block 3: PROC APPEND without BULKLOAD
Explanation:
This block performs the same load as the previous one but without the `bulkload=yes` option. This allows comparison of the performance difference between a standard load (potentially row-by-row) and a bulk load. The table is then dropped.
PROC APPEND BASE=mycdh.cars
            DATA=sashelp.cars;
RUN;

PROC SQL;
    DROP TABLE mycdh.cars;
QUIT;
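To make the comparison between the two loads concrete, one approach (a sketch, not part of the original script) is to enable the `FULLSTIMER` system option before running both tests, so the SAS log reports real and CPU time for each `PROC APPEND` step.

/* Sketch: enable detailed resource statistics in the SAS log so the     */
/* timings of the bulk and non-bulk PROC APPEND runs can be compared.    */
options fullstimer;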
Code Block 4: PROC APPEND with Parquet Storage
Explanation:
This block loads the data again, but uses the `dbcreate_table_opts` option to pass a specific instruction to Hadoop when the table is created. Here it requests that the table be stored in the Parquet file format, a high-performance columnar storage format. The table is finally dropped.
PROC APPEND BASE=mycdh.cars (dbcreate_table_opts='stored as parquetfile')
            DATA=sashelp.cars;
RUN;

PROC SQL;
    DROP TABLE mycdh.cars;
QUIT;
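As a variant not shown in the original script, `dbcreate_table_opts` can also be set at the `LIBNAME` level, so that every table created through the library defaults to Parquet storage. The sketch below assumes the same connection details as Code Block 1.

/* Hypothetical variant: declare the storage format once on the LIBNAME  */
/* statement; every table created through mycdh is then stored as        */
/* Parquet without repeating the data set option on each load.           */
LIBNAME mycdh hadoop server="quickstart.cloudera" user=cloudera password=cloudera
        dbcreate_table_opts='stored as parquetfile';

PROC APPEND BASE=mycdh.cars DATA=sashelp.cars;
RUN;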
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.