Published on :

PROC RANK In-Database with SAS/ACCESS for Hadoop

This code is also available in: Deutsch Español Français
Awaiting validation
Attention : This code requires administrator privileges.
The script begins by establishing a LIBNAME connection to a Hadoop server. It then copies a SASHELP table to Hadoop. The PROC RANK procedure is executed twice: once leveraging in-database processing (default behavior), and once explicitly disabling this feature for Hadoop to show local processing. Finally, the created tables are cleaned up.
Data Analysis

Type : MIXTE


Initial data is taken from the SASHELP library (`sashelp.class`), then copied to an external system (Hadoop) via the 'mycdh' LIBNAME. A temporary table (`work.class_rank`) is also created internally by SAS.

1 Code Block
LIBNAME / DATA STEP Data
Explanation :
This block establishes a connection to a Hadoop server by creating the 'mycdh' LIBNAME with the specified connection information. Then, a DATA step is used to copy the 'sashelp.class' table to this new Hadoop library, making the 'mycdh.class' table available in the distributed file system.
Copied!
1LIBNAME mycdh hadoop server='quickstart.cloudera' user=cloudera password=cloudera schema='default';
2 
3DATA mycdh.class;
4 SET sashelp.class;
5RUN;
2 Code Block
PROC RANK Data
Explanation :
Enables SAS tracing options ('sastrace') to display potential SQL code generation. The 'PROC RANK' procedure is then executed on the 'mycdh.class' table (residing in Hadoop). By default, SAS will attempt to execute this procedure directly in-database for performance optimization, with ranking performed by descending 'weight'.
Copied!
1options sastrace=',,,d' sastraceloc=saslog nostsuffix;
2 
3PROC RANK DATA=mycdh.class out=work.class_rank;
4 BY descending weight;
5RUN;
3 Code Block
PROC RANK Data
Explanation :
Modifies SAS options to explicitly disable in-database SQL generation for Hadoop ('EXCLUDEDB='HADOOP''). The same 'PROC RANK' procedure is executed, but this time, SAS will retrieve data from Hadoop locally to perform the ranking processing on the SAS server.
Copied!
1options sqlgeneration=(DBMS EXCLUDEDB='HADOOP');
2 
3PROC RANK DATA=mycdh.class out=work.class_rank;
4 BY descending weight;
5RUN;
4 Code Block
PROC PRINT
Explanation :
Displays the content of the 'work.class_rank' table generated by the previous RANK procedures. This allows verification of the ranking operation results.
Copied!
1PROC PRINT DATA=work.class_rank;
2RUN;
5 Code Block
PROC SQL
Explanation :
Uses 'PROC SQL' to drop the temporary tables 'work.class_rank' and 'mycdh.class' created during script execution, thus ensuring a clean environment.
Copied!
1PROC SQL;
2 drop TABLE work.class_rank;
3 drop TABLE mycdh.class;
4QUIT;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Exercise 06 - SAS3880 / An Insider's Guide to SAS/ACCESS to Hadoop / In-Database PROC Example