Published on :
ETL SASHELP

CASL and FedSQL Join Examples

This code is also available in: Deutsch Español Français
Awaiting validation
The functional analysis of this SAS© script is broken down into several main steps. Initially, it configures a CAS session and assigns all available 'caslibs'. Then, it prepares two CAS tables (baseball_location and baseball_stats) in the 'casuser' library from the sashelp.baseball table, separating location information from statistics. After defining 'casuser' as the active 'caslib', the script proceeds to illustrate three types of joins. The first uses PROC FEDSQL for a standard 'LEFT JOIN'. The next two are examples of more advanced CASL joins, leveraging the 'dlJoin' actions from the 'deepLearn' set and 'searchJoin' from the 'searchAnalytics' set. Each method creates a new resulting CAS table from the join.
Data Analysis

Type : SASHELP


Source data comes from the standard 'sashelp.baseball' table. Two intermediate CAS tables, 'casuser.baseball_location' and 'casuser.baseball_stats', are created from this source to facilitate join demonstrations. No external data is directly read outside of 'sashelp'.

1 Code Block
CAS setup
Explanation :
This block initializes a CAS session and assigns all available caslibs, allowing access and manipulation of data in CAS memory. The commented line `cas casauto terminate;` suggests an option to terminate an existing CAS session.
Copied!
1/* cas casauto terminate;
2*/
3cas;
4caslib _all_ assign;
5 
2 Code Block
DATA STEP Data
Explanation :
Creation of a new CAS table named 'baseball_location' in the 'casuser' caslib. This table is derived from 'sashelp.baseball', retaining only columns related to player identification and their position/team.
Copied!
1DATA casuser.baseball_location;
2 SET sashelp.baseball;
3 keep name team div division league position;
4RUN;
3 Code Block
DATA STEP Data
Explanation :
Creation of a second CAS table named 'baseball_stats' in the 'casuser' caslib. This table is also derived from 'sashelp.baseball', but all location and team columns are dropped to isolate player statistics.
Copied!
1DATA casuser.baseball_stats;
2 SET sashelp.baseball;
3 drop team div division league position ;
4RUN;
4 Code Block
OPTIONS Statement
Explanation :
Sets 'casuser' as the default active caslib for subsequent CAS operations, thereby simplifying the syntax for referencing tables in this library.
Copied!
1/* set the active CASLIB */
2options caslib=casuser;
3 
5 Code Block
PROC FEDSQL Data
Explanation :
Executes a FedSQL query to perform a LEFT JOIN between the 'baseball_location' and 'baseball_stats' tables on the 'name' column. The result is stored in a new CAS table named 'fedsql'.
Copied!
1/* FedSQL Left join example */
2PROC FEDSQL sessref=casauto;
3 create TABLE fedsql as
4 select distinct a.div, a.division, a.league, a.position, a.team, b.*
5 from baseball_location as a left join
6 baseball_stats as b
7 on a.name=b.name;
8QUIT;
6 Code Block
PROC CAS
Explanation :
Loads the 'searchAnalytics' and 'deepLearn' CASL action sets. These action sets contain advanced functionalities, including specific join actions that will be used in the following blocks.
Copied!
1/* CASL Join Examples */
2PROC CAS;
3LOADACTIONSET 'searchAnalytics';
4LOADACTIONSET 'deepLearn';
5QUIT;
7 Code Block
PROC CAS (deepLearn.dlJoin) Data
Explanation :
Demonstrates the use of the 'dlJoin' action from the 'deepLearn' set to perform a left join. It joins 'baseball_location' with 'baseball_stats' using the 'name' column as an identifier, and creates an output table named 'dlJoin'.
Copied!
1/* Deep Learning Action Set dljon action: Examples */
2/* joinType="APPEND" | "FULL" | "INNER" | "LEFT" | "RIGHT" */
3PROC CAS;
4 DEEPLEARN.dlJoin /
5 joinType="LEFT"
6 annotatedTable={name="baseball_location"}
7 casOut={name="dlJoin", replace=TRUE}
8 id="name"
9 TABLE={name="baseball_stats"};
10 RUN;
11QUIT;
8 Code Block
PROC CAS (searchAnalytics.searchJoin) Data
Explanation :
Illustrates the 'searchJoin' action from the 'searchAnalytics' set for a left join. It joins 'baseball_location' and 'baseball_stats' on the 'name' column, but with a more detailed column specification, including renaming to distinguish the 'name' columns of the left and right tables. The result is saved in the 'searchJoin' table.
Copied!
1/* searchAnalytics Action Set searchJoin action: Examples */
2/* joinType="APPEND" | "FULL" | "INNER" | "LEFT" | "RIGHT" */
3PROC CAS;
4 searchAnalytics.searchJoin /
5 joinType="LEFT"
6 casOut={name="searchJoin", replace=TRUE}
7 leftTable={columns={{isKey=TRUE, name="name"},
8 {name="name",
9 reName="name_left"
10 }
11 }
12 TABLE={name="baseball_location"}
13 }
14 rightTable={columns={{isKey=TRUE, name="name"},
15 {name="name",
16 reName="name_right"
17 }
18 }
19 TABLE={name="baseball_stats"}
20 };
21RUN;
22QUIT;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © 2021, SAS Institute Inc., Cary, NC, USA. All Rights Reserved. SPDX-License-Identifier: Apache-2.0