Published on :
ETL SASHELP

Example of GROUPBY aggregation with CASL and FedSQL

This code is also available in: Deutsch Español Français
Awaiting validation
The script initializes a CAS environment by configuring a 'CASWORK' libname for the 'casuser' caslib and redirecting single-level tables to this caslib. It then loads the 'sashelp.baseball' dataset into a CAS table named 'casuser.baseball'. A cleanup step removes a previous aggregated table ('casuser.baseball_groupby') if it exists. Two aggregation methods are then illustrated: the first uses 'PROC FEDSQL' to calculate the sum of hits ('nhits') grouped by division ('div') and team ('team') and stores the result in 'baseball_groupby'. The second approach uses the CAS action 'simple.groupBy' via 'PROC CAS' to perform a similar aggregation on the 'baseball' table by summing 'nhits' and grouping by 'DIV' and 'TEAM', with the result stored in 'casl_baseball_groupby'.
Data Analysis

Type : SASHELP


The source dataset is 'sashelp.baseball', a SAS built-in demonstration table. This table is then copied into the CAS engine under 'casuser.baseball' to allow CAS operations.

1 Code Block
CAS Configuration
Explanation :
This block configures access to the Cloud Analytic Services (CAS) engine. It defines a 'CASWORK' libname that points to the 'casuser' caslib, allowing CAS tables to be referenced as SAS tables. The 'options USER = CASWORK;' option redirects the creation of single-level SAS tables to this CAS caslib, ensuring they are created in CAS. Finally, 'caslib _all_ assign;' ensures that all caslibs are enabled and ready for use.
Copied!
1/* CAS Enabled */
2/* CASL GROUPBY Action Example */
3 
4/* LIBNAME using the CAS engine */
5LIBNAME CASWORK cas caslib=casuser;
6 
7/* Changing the default location of all one level named tables */
8/* from SASWORK to CASWORK */
9options USER = CASWORK;
10 
11%put &_sessref_;
12 
13caslib _all_ assign;
2 Code Block
DATA STEP Data
Explanation :
This DATA STEP creates a new CAS table named 'casuser.baseball' by copying the content of the built-in SAS dataset 'sashelp.baseball'. This makes the data available in the CAS environment for subsequent processing.
Copied!
1 
2DATA casuser.baseball;
3SET sashelp.baseball;
4RUN;
5 
3 Code Block
PROC DELETE
Explanation :
This block uses 'PROC DELETE' to remove the 'casuser.baseball_groupby' table if it exists. This is a good cleanup practice before recreating a table with the same name, thus avoiding errors or outdated data.
Copied!
1PROC DELETE DATA=casuser.baseball_groupby;
2RUN;
4 Code Block
PROC FEDSQL Data
Explanation :
This block uses 'PROC FEDSQL' to perform data aggregation directly in CAS. The 'options caslib="casuser"' option sets the default caslib for FEDSQL operations. The SQL query calculates the sum of hits ('nhits') and groups the results by 'div' (division) and 'team' (team), creating a new CAS table named 'baseball_groupby'.
Copied!
1options caslib="casuser";
2PROC FEDSQL sessref=casauto;
3 create TABLE baseball_groupby
4 as
5 (select sum(nhits) as nhits_sum
6 from baseball
7 group BY div, team) ;
8QUIT;
5 Code Block
PROC CAS (Action simple.groupBy) Data
Explanation :
This block uses 'PROC CAS' to interact with the CAS engine and execute a specific CAS action: 'simple.groupBy'. This action performs an aggregation. It takes 'DIV' and 'TEAM' as input variables for grouping ('inputs'), 'nhits' as the weighting/summing variable ('weight'), and 'SUM' as the aggregation function ('aggregator'). The source table is 'baseball' and the result is saved in a new CAS table named 'casl_baseball_groupby', replacing any existing table with the same name.
Copied!
1PROC CAS;
2 SESSION casauto;
3 SIMPLE.groupBy RESULT=r STATUS=s /
4 inputs={"DIV" "TEAM"},
5 weight="nhits",
6 aggregator="SUM",
7 TABLE={name="baseball"},
8 casout={name="casl_baseball_groupby",
9 replace=true};
10RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © 2021, SAS Institute Inc., Cary, NC, USA. All Rights Reserved. SPDX-License-Identifier: Apache-2.0