PROC SQL Reimagined: How to Run Complex Joins in SAS Viya with fedSql.execDirect

Difficulty Level

Beginner

Published on : 15/10/2020

Expert Advice

Michael
Responsable de l'infrastructure Viya.

In standard SQL, replacing a table usually involves a DROP TABLE IF EXISTS statement followed by a CREATE TABLE. In SAS Viya's FedSQL, you can streamline this using the table options syntax directly within the creation statement. As shown in your code, CREATE TABLE MyTable {options replace=true} AS SELECT... handles the overwrite logic atomically. This is cleaner and prevents errors where a drop might fail if the table doesn't exist yet.

The `fedSql.execDirect` action is a powerful tool in SAS^© Viya^™ for executing SQL queries directly on data loaded into Cloud Analytic Services (CAS). This example details the process of loading raw data, joining it using standard SQL clauses (`JOIN`, `USING`), and then applying filtering (`WHERE`), aggregation (`GROUP BY`, `HAVING`), and sorting (`ORDER BY`) to analyze the data. The ability to create new tables in CAS memory from complex queries is highlighted, as well as the interaction with other CAS table actions like `table.fetch`.

Data Analysis

Type : INTERNAL_CREATION

The examples use internally generated data (datalines) for the 'crime_data' and 'census_data' tables to make each code block self-contained and executable.

1 Code Block

DATA STEP / FEDSQL.EXECDIRECT Data

Explanation :
This code creates two temporary CAS tables, `crime_data` and `census_data`, from inline data. It then uses `fedsql.execdirect` to perform a join (`JOIN`) on the `community_area` column to combine information from both tables into a new table called `CrimeCensusBasique`. Finally, `table.fetch` is used to display the first five rows of the resulting table.

Copied!

1	* Connexion à CAS et chargement des données de démonstration;
2	options cas convenient;
3	caslib _all_ assign;
4
5	* Création de données de démonstration pour CRIME;
6	DATA mycas.crime_data (promote=yes);
7	INFILE DATALINES dsd;
8	INPUT primary_type $ date :yymmdd10. community_area arrest $;
9	FORMAT date yymmdd10.;
10	DATALINES;
11	"THEFT",2015-03-15,10,"FALSE"
12	"BATTERY",2015-06-20,10,"TRUE"
13	"THEFT",2015-04-01,15,"FALSE"
14	"ASSAULT",2016-01-10,20,"TRUE"
15	"THEFT",2015-08-01,10,"FALSE"
16	"THEFT",2015-09-05,15,"TRUE"
17	"THEFT",2015-10-10,20,"FALSE"
18	"BATTERY",2016-02-01,10,"TRUE"
19	"THEFT",2015-11-20,15,"FALSE"
20	"ASSAULT",2015-12-25,20,"TRUE"
21	"THEFT",2016-01-05,10,"FALSE"
22	"THEFT",2016-01-15,10,"FALSE"
23	"THEFT",2016-01-20,10,"FALSE"
24	"THEFT",2016-01-25,10,"FALSE"
25	"THEFT",2016-01-30,10,"FALSE"
26	"THEFT",2016-01-01,15,"FALSE"
27	"THEFT",2016-01-02,15,"FALSE"
28	"THEFT",2016-01-03,15,"FALSE"
29	"THEFT",2016-01-04,15,"FALSE"
30	"THEFT",2016-01-05,15,"FALSE"
31	;
32	RUN;
33
34	* Création de données de démonstration pour CENSUS;
35	DATA mycas.census_data (promote=yes);
36	INFILE DATALINES dsd;
37	INPUT community_area hardship_index per_capita_income;
38	DATALINES;
39	10,50,30000
40	15,70,20000
41	20,30,45000
42	25,90,15000
43	;
44	RUN;
45
46	PROC CAS;
47	fedsql.execdirect /
48	query="CREATE TABLE CrimeCensusBasique {options replace=true} AS
49	SELECT T1.primary_type, T1.date, T1.arrest,
50	T2.hardship_index, T2.per_capita_income, T1.community_area
51	FROM crime_data AS T1 JOIN census_data AS T2
52	ON T1.community_area = T2.community_area";
53	RUN;
54
55	PROC CAS;
56	TABLE.fetch /
57	TABLE="CrimeCensusBasique"
58	to=5;
59	RUN;

2 Code Block

FEDSQL.EXECDIRECT

Explanation :
This example extends the basic case by filtering `CrimeCensusBasique` data for 'THEFT' crimes over a specific period. It then aggregates the results by crime type, community area, hardship index, and per capita income, counting the number of crimes. Only groups with at least 2 crimes are included, and results are sorted by the number of crimes in descending order.

Copied!

1	* Les données de démonstration et la table CrimeCensusBasique sont créées dans l'exemple 1;
2
3	PROC CAS;
4	fedsql.execdirect /
5	query="SELECT primary_type, COUNT(*) AS count_crimes,
6	community_area, hardship_index, per_capita_income
7	FROM CrimeCensusBasique
8	WHERE primary_type = 'THEFT' AND date BETWEEN '2015-01-01' AND '2016-12-31'
9	GROUP BY primary_type, community_area, hardship_index, per_capita_income
10	HAVING COUNT(*) >= 2
11	ORDER BY count_crimes DESC";
12	RUN;

3 Code Block

FEDSQL.EXECDIRECT

Explanation :
This code demonstrates a more advanced FedSQL query. It joins the `crime_data` and `census_data` tables, applying a filter condition on `community_area` based on a subquery: only community areas with a `hardship_index` greater than 60 are included. Additionally, a new column `IncomeCategory` is created using a `CASE` statement to classify areas based on their per capita income. `table.fetch` is used to visualize the result.

Copied!

1	* Les données de démonstration sont créées dans l'exemple 1;
2
3	PROC CAS;
4	fedsql.execdirect /
5	query="CREATE TABLE CrimeCensusAvance {options replace=true} AS
6	SELECT T1.primary_type, T1.date, T1.arrest,
7	T2.hardship_index, T2.per_capita_income, T1.community_area,
8	CASE WHEN T2.per_capita_income > 35000 THEN 'High Income Area'
9	ELSE 'Low Income Area'
10	END AS IncomeCategory
11	FROM crime_data AS T1 JOIN census_data AS T2
12	ON T1.community_area = T2.community_area
13	WHERE T1.community_area IN (SELECT community_area FROM census_data WHERE hardship_index > 60)";
14	RUN;
15
16	PROC CAS;
17	TABLE.fetch /
18	TABLE="CrimeCensusAvance"
19	to=5;
20	RUN;

4 Code Block

FEDSQL.EXECDIRECT / TABLE.FETCH

Explanation :
This example highlights CAS in-memory processing capabilities by directly creating a summary table (`SummaryCAS`) via `FEDSQL.EXECDIRECT`. The query calculates the total number of crimes and the average income per community area. Then, the `TABLE.FETCH` action is used to retrieve and display the first 10 rows of this new table, with custom sorting, demonstrating seamless integration between FedSQL and CAS table actions for result analysis.

Copied!

1	* Les données de démonstration sont créées dans l'exemple 1;
2
3	PROC CAS;
4	fedsql.execdirect /
5	query="CREATE TABLE SummaryCAS {options replace=true} AS
6	SELECT primary_type, community_area, COUNT(*) AS TotalCrimes,
7	AVG(per_capita_income) AS AvgIncome
8	FROM crime_data AS T1 JOIN census_data AS T2
9	ON T1.community_area = T2.community_area
10	GROUP BY primary_type, community_area
11	HAVING TotalCrimes >= 2
12	ORDER BY TotalCrimes DESC, AvgIncome ASC";
13	RUN;
14
15	* Afficher les 10 premières lignes de la table SummaryCAS, triées;
16	PROC CAS;
17	TABLE.fetch /
18	TABLE="SummaryCAS"
19	to=10
20	sortby={{name="TotalCrimes", order="descending"}, {name="AvgIncome", order="ascending"}};
21	RUN;

Pro Tip

If you are joining a very small "lookup" table to a massive "fact" table, FedSQL automatically optimizes the process by broadcasting the small table to all worker nodes, minimizing network traffic.

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Difficulty Level

Published on : 15/10/2020

Expert Advice

Data Analysis

1 Code Block

2 Code Block

3 Code Block

4 Code Block

Pro Tip

Related Documentation