ETL, Data Transformation SAS VIYA CAS

PROC SQL Reimagined: How to Run Complex Joins in SAS Viya with fedSql.execDirect

This code is also available in: Deutsch Español
Difficulty Level
Beginner
Published on :
Michael

Expert Advice

Michael
Responsable de l'infrastructure Viya.

In standard SQL, replacing a table usually involves a DROP TABLE IF EXISTS statement followed by a CREATE TABLE. In SAS Viya's FedSQL, you can streamline this using the table options syntax directly within the creation statement. As shown in your code, CREATE TABLE MyTable {options replace=true} AS SELECT... handles the overwrite logic atomically. This is cleaner and prevents errors where a drop might fail if the table doesn't exist yet.

The `fedSql.execDirect` action is a powerful tool in SAS© Viya for executing SQL queries directly on data loaded into Cloud Analytic Services (CAS). This example details the process of loading raw data, joining it using standard SQL clauses (`JOIN`, `USING`), and then applying filtering (`WHERE`), aggregation (`GROUP BY`, `HAVING`), and sorting (`ORDER BY`) to analyze the data. The ability to create new tables in CAS memory from complex queries is highlighted, as well as the interaction with other CAS table actions like `table.fetch`.
Data Analysis

Type : INTERNAL_CREATION


The examples use internally generated data (datalines) for the 'crime_data' and 'census_data' tables to make each code block self-contained and executable.

1 Code Block
DATA STEP / FEDSQL.EXECDIRECT Data
Explanation :
This code creates two temporary CAS tables, `crime_data` and `census_data`, from inline data. It then uses `fedsql.execdirect` to perform a join (`JOIN`) on the `community_area` column to combine information from both tables into a new table called `CrimeCensusBasique`. Finally, `table.fetch` is used to display the first five rows of the resulting table.
Copied!
1* Connexion à CAS et chargement des données de démonstration;
2options cas convenient;
3caslib _all_ assign;
4 
5* Création de données de démonstration pour CRIME;
6DATA mycas.crime_data (promote=yes);
7 INFILE DATALINES dsd;
8 INPUT primary_type $ date :yymmdd10. community_area arrest $;
9 FORMAT date yymmdd10.;
10 DATALINES;
11"THEFT",2015-03-15,10,"FALSE"
12"BATTERY",2015-06-20,10,"TRUE"
13"THEFT",2015-04-01,15,"FALSE"
14"ASSAULT",2016-01-10,20,"TRUE"
15"THEFT",2015-08-01,10,"FALSE"
16"THEFT",2015-09-05,15,"TRUE"
17"THEFT",2015-10-10,20,"FALSE"
18"BATTERY",2016-02-01,10,"TRUE"
19"THEFT",2015-11-20,15,"FALSE"
20"ASSAULT",2015-12-25,20,"TRUE"
21"THEFT",2016-01-05,10,"FALSE"
22"THEFT",2016-01-15,10,"FALSE"
23"THEFT",2016-01-20,10,"FALSE"
24"THEFT",2016-01-25,10,"FALSE"
25"THEFT",2016-01-30,10,"FALSE"
26"THEFT",2016-01-01,15,"FALSE"
27"THEFT",2016-01-02,15,"FALSE"
28"THEFT",2016-01-03,15,"FALSE"
29"THEFT",2016-01-04,15,"FALSE"
30"THEFT",2016-01-05,15,"FALSE"
31;
32RUN;
33 
34* Création de données de démonstration pour CENSUS;
35DATA mycas.census_data (promote=yes);
36 INFILE DATALINES dsd;
37 INPUT community_area hardship_index per_capita_income;
38 DATALINES;
3910,50,30000
4015,70,20000
4120,30,45000
4225,90,15000
43;
44RUN;
45 
46PROC CAS;
47 fedsql.execdirect /
48 query="CREATE TABLE CrimeCensusBasique {options replace=true} AS
49 SELECT T1.primary_type, T1.date, T1.arrest,
50 T2.hardship_index, T2.per_capita_income, T1.community_area
51 FROM crime_data AS T1 JOIN census_data AS T2
52 ON T1.community_area = T2.community_area";
53RUN;
54 
55PROC CAS;
56 TABLE.fetch /
57 TABLE="CrimeCensusBasique"
58 to=5;
59RUN;
2 Code Block
FEDSQL.EXECDIRECT
Explanation :
This example extends the basic case by filtering `CrimeCensusBasique` data for 'THEFT' crimes over a specific period. It then aggregates the results by crime type, community area, hardship index, and per capita income, counting the number of crimes. Only groups with at least 2 crimes are included, and results are sorted by the number of crimes in descending order.
Copied!
1* Les données de démonstration et la table CrimeCensusBasique sont créées dans l'exemple 1;
2 
3PROC CAS;
4 fedsql.execdirect /
5 query="SELECT primary_type, COUNT(*) AS count_crimes,
6 community_area, hardship_index, per_capita_income
7 FROM CrimeCensusBasique
8 WHERE primary_type = 'THEFT' AND date BETWEEN '2015-01-01' AND '2016-12-31'
9 GROUP BY primary_type, community_area, hardship_index, per_capita_income
10 HAVING COUNT(*) >= 2
11 ORDER BY count_crimes DESC";
12RUN;
3 Code Block
FEDSQL.EXECDIRECT
Explanation :
This code demonstrates a more advanced FedSQL query. It joins the `crime_data` and `census_data` tables, applying a filter condition on `community_area` based on a subquery: only community areas with a `hardship_index` greater than 60 are included. Additionally, a new column `IncomeCategory` is created using a `CASE` statement to classify areas based on their per capita income. `table.fetch` is used to visualize the result.
Copied!
1* Les données de démonstration sont créées dans l'exemple 1;
2 
3PROC CAS;
4 fedsql.execdirect /
5 query="CREATE TABLE CrimeCensusAvance {options replace=true} AS
6 SELECT T1.primary_type, T1.date, T1.arrest,
7 T2.hardship_index, T2.per_capita_income, T1.community_area,
8 CASE WHEN T2.per_capita_income > 35000 THEN 'High Income Area'
9 ELSE 'Low Income Area'
10 END AS IncomeCategory
11 FROM crime_data AS T1 JOIN census_data AS T2
12 ON T1.community_area = T2.community_area
13 WHERE T1.community_area IN (SELECT community_area FROM census_data WHERE hardship_index > 60)";
14RUN;
15 
16PROC CAS;
17 TABLE.fetch /
18 TABLE="CrimeCensusAvance"
19 to=5;
20RUN;
4 Code Block
FEDSQL.EXECDIRECT / TABLE.FETCH
Explanation :
This example highlights CAS in-memory processing capabilities by directly creating a summary table (`SummaryCAS`) via `FEDSQL.EXECDIRECT`. The query calculates the total number of crimes and the average income per community area. Then, the `TABLE.FETCH` action is used to retrieve and display the first 10 rows of this new table, with custom sorting, demonstrating seamless integration between FedSQL and CAS table actions for result analysis.
Copied!
1* Les données de démonstration sont créées dans l'exemple 1;
2 
3PROC CAS;
4 fedsql.execdirect /
5 query="CREATE TABLE SummaryCAS {options replace=true} AS
6 SELECT primary_type, community_area, COUNT(*) AS TotalCrimes,
7 AVG(per_capita_income) AS AvgIncome
8 FROM crime_data AS T1 JOIN census_data AS T2
9 ON T1.community_area = T2.community_area
10 GROUP BY primary_type, community_area
11 HAVING TotalCrimes >= 2
12 ORDER BY TotalCrimes DESC, AvgIncome ASC";
13RUN;
14 
15* Afficher les 10 premières lignes de la table SummaryCAS, triées;
16PROC CAS;
17 TABLE.fetch /
18 TABLE="SummaryCAS"
19 to=10
20 sortby={{name="TotalCrimes", order="descending"}, {name="AvgIncome", order="ascending"}};
21RUN;
Pro Tip
If you are joining a very small "lookup" table to a massive "fact" table, FedSQL automatically optimizes the process by broadcasting the small table to all worker nodes, minimizing network traffic.
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Related Documentation

Aucune documentation spécifique pour cette catégorie.