
DATA Step Partitioning and Sorting in CAS

This script sets up CAS access and loads the SASHELP.BASEBALL table into CAS memory. It demonstrates the data set options 'partition=' and 'orderby=', which control how rows are physically grouped and sorted across the CAS worker nodes and can speed up later BY-group processing. A second DATA step illustrates this with a running counter per team, reset at the start of each team with FIRST.TEAM and carried across rows with RETAIN.
Data Analysis

Type : SASHELP


Source data comes from the standard SASHELP library (BASEBALL table).

1 Code Block
CONFIGURATION
Explanation :
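CAS session setup. Creates the 'CASWORK' libref pointing to the 'casuser' caslib, makes it the default library for one-level table names (USER=), assigns librefs for all visible caslibs, and writes the active session reference (&_sessref_) to the log. The block assumes a CAS session is already active.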
LIBNAME CASWORK cas caslib=casuser;
options USER = CASWORK;
caslib _all_ assign;
%put &_sessref_;
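The configuration block relies on a CAS session that already exists, which is often the case in SAS Studio on Viya. Where it does not, a session can be started first. The following is a minimal sketch, assuming a hypothetical session name `mysess` and an illustrative timeout value, neither of which comes from the original code:

```sas
/* Start a CAS session explicitly. "mysess" is a hypothetical name and */
/* the 1800-second timeout is an illustrative assumption.              */
cas mysess sessopts=(caslib=casuser timeout=1800);

/* Write the caslibs visible to the session to the log. */
caslib _all_ list;

/* End the session once the work is finished. */
/* cas mysess terminate; */
```

Running this before the LIBNAME statement makes the dependency on an active session explicit instead of leaving it to the environment's autoexec.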
2 Code Block
DATA STEP Data
Explanation :
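Creates a partitioned and sorted CAS table. The step adds row_order, the original row number (_n_), to each row. The 'partition=' option groups rows that share the same values of div and row_order into the same partition, and 'orderby=' sorts the rows within each partition by the listed variables, so later steps that read the table by group do not have to reorganize it.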
DATA caswork.baseball(partition=(div row_order) orderby=(div row_order));
    SET sashelp.baseball;
    row_order = _n_;
RUN;
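Because row_order is the row number, every (div, row_order) pair is unique, so partitioning on both variables puts each row in its own partition. A variant that may sit closer to the stated goal partitions on div alone and uses row_order only as the sort key. This is a sketch under that assumption, not the original author's code, and the table name `baseball_bydiv` is hypothetical:

```sas
/* Hypothetical variant: one partition per division, rows kept in */
/* their original order inside each partition.                    */
data caswork.baseball_bydiv(partition=(div) orderby=(row_order));
    set sashelp.baseball;
    row_order = _n_;   /* original row number from SASHELP.BASEBALL */
run;

/* Inspect the resulting in-memory table. */
proc casutil;
    contents casdata="baseball_bydiv" incaslib="casuser";
quit;
```

Whether this layout helps depends on the BY variables used downstream; it fits steps whose first BY variable is div.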
3 Code Block
DATA STEP Data
Explanation :
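Group processing with BY DIV TEAM. Computes a running counter within each team: count is reset to 0 when the temporary variable FIRST.TEAM is true (the first row of a team) and is then incremented on every row. RETAIN keeps the value of count from one row to the next.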
DATA caswork.baseball2;
    SET caswork.baseball;
    retain count;
    BY DIV TEAM;
    IF first.team THEN count=0;
    count+1;
RUN;
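To check the counter, one option is to keep only the last row of each team, which then carries the team's final count. The sketch below also records `_threadid_`, the CAS thread that processed the row. The table name `team_counts` and the variable `thread` are hypothetical, and the explicit RETAIN is dropped because the sum statement (`count + 1`) already retains count:

```sas
/* Hypothetical summary table: one row per team with its final count. */
data caswork.team_counts(keep=div team count thread);
    set caswork.baseball;
    by div team;
    if first.team then count = 0;
    count + 1;                      /* sum statement: count is retained implicitly */
    thread = _threadid_;            /* CAS thread that processed this row */
    if last.team then output;       /* write only the last row of each team */
run;

proc print data=caswork.team_counts;
run;
```

Rows read back from a CAS table are not returned in a guaranteed order, so the printed listing may differ between runs even though the counts are the same.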
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © 2021, SAS Institute Inc., Cary, NC, USA. All Rights Reserved. SPDX-License-Identifier: Apache-2.0