Published on :
Data Analysis CREATION_INTERNE

Obtain FedSQL Query Execution Details

This code is also available in: Deutsch Español Français
Awaiting validation
The fedsql.execdirect action allows submitting SQL queries directly to the CAS server. By specifying the 'showStages=true' parameter, it is possible to obtain a detailed report of the query's execution plan. This report breaks down execution into several stages, displaying the methods used (e.g., sorting, aggregation, sequential scan), the SQL query generated for each stage (stage query), the number of data access threads per worker, input tables with their properties (number of rows, columns, partitioning), and execution times for each stage and for the global query. This information is crucial for query optimization and understanding FedSQL's behavior on the distributed CAS architecture.
Data Analysis

Type : CREATION_INTERNE


The examples use an internally created 'crimeCensus' table with a DATA Step and dummy data to simulate the behavior described in the documentation.

1 Code Block
DATA STEP / FEDSQL.EXECDIRECT Data
Explanation :
This minimal example creates a temporary 'crimeCensus' table in CASUSER with dummy data. Then, it executes a simple FedSQL query that counts the number of occurrences for each 'primary_type' and sorts the results in descending order of the count. The 'showStages' parameter is not used here to illustrate the basic use of the action.
Copied!
1DATA casuser.crimeCensus;
2 INFILE DATALINES dlm=';';
3 INPUT primary_type $ date :yymmdd10. community_area hardship_index per_capita_income;
4 FORMAT date yymmdd10.;
5 DATALINES;
6THEFT;2015-01-15;10;80;25000
7THEFT;2015-02-20;10;80;25000
8THEFT;2015-03-01;10;80;25000
9THEFT;2015-04-10;10;80;25000
10THEFT;2015-05-05;10;80;25000
11THEFT;2016-01-20;10;80;25000
12THEFT;2016-02-25;10;80;25000
13THEFT;2016-03-05;10;80;25000
14THEFT;2016-04-15;10;80;25000
15THEFT;2016-05-10;10;80;25000
16ASSAULT;2015-01-01;20;70;30000
17ASSAULT;2015-01-02;20;70;30000
18ASSAULT;2015-01-03;20;70;30000
19ASSAULT;2015-01-04;20;70;30000
20ASSAULT;2015-01-05;20;70;30000
21ROBBERY;2016-06-01;30;90;20000
22ROBBERY;2016-06-02;30;90;20000
23ROBBERY;2016-06-03;30;90;20000
24ROBBERY;2016-06-04;30;90;20000
25ROBBERY;2016-06-05;30;90;20000
26;
27RUN;
28 
29PROC CAS;
30 fedsql.execdirect
31 query="select primary_type, count(*) as count from crimeCensus group by primary_type order by count desc";
32QUIT;
33 
2 Code Block
FEDSQL.EXECDIRECT
Explanation :
This example reproduces the use case provided in the documentation. It submits a more complex FedSQL query, filtering by crime type ('THEFT') and a date range, grouping by several variables and having a 'HAVING' condition. The 'showStages=true' parameter is enabled to obtain the execution plan details, showing the different stages of query processing on the CAS server.
Copied!
1PROC CAS;
2 fedsql.execdirect
3 showStages=true
4 query="select primary_type, count(*) as count, community_area,
5 hardship_index, per_capita_income from crimeCensus
6 where primary_type='THEFT'
7 and date between date '2015-01-01' and date'2016-01-01'
8 group by primary_type, community_area, hardship_index, per_capita_income
9 having count(*) >= 5
10 order by count desc";
11QUIT;
3 Code Block
FEDSQL.EXECDIRECT
Explanation :
This advanced example introduces a join between the 'crimeCensus' table (created in Example 1) and a new 'population_data' table. The query calculates the total number of incidents and the incident rate per 1000 inhabitants for specific crime types over a longer period. Using 'showStages=true' allows analyzing the execution plan for this multi-table query, including how the join and complex calculations are handled by FedSQL on CAS.
Copied!
1DATA casuser.population_data;
2 INFILE DATALINES dlm=';';
3 INPUT community_area $ population;
4 DATALINES;
510;50000
620;75000
730;60000
8;
9RUN;
10 
11PROC CAS;
12 fedsql.execdirect
13 showStages=true
14 query="select c.primary_type, count(c.*) as total_incidents,
15 c.community_area, p.population,
16 (count(c.*) * 1000 / p.population) as incidents_per_1000
17 from crimeCensus as c
18 join population_data as p
19 on c.community_area = p.community_area
20 where c.primary_type in ('THEFT', 'ROBBERY')
21 and c.date between date '2015-01-01' and date '2016-12-31'
22 group by c.primary_type, c.community_area, p.population
23 having count(c.*) >= 2
24 order by incidents_per_1000 desc";
25QUIT;
4 Code Block
FEDSQL.EXECDIRECT / TABLE.REPLICATE
Explanation :
This example focuses on Viya/CAS integration by replicating the 'crimeCensus' table (for demonstration purposes, the table is first recreated if necessary) in CAS server memory using the 'table.replicate' action. Replication can improve performance for small, frequently accessed tables. Then, a FedSQL query is executed on the replicated table with 'showStages=true' to observe how the execution plan leverages the replicated table (e.g., 'Replicated to one worker' in the plan output), which can show more efficient data access time.
Copied!
1PROC CAS;
2 /* Créer la table crimeCensus si elle n'existe pas déjà (dupliqué pour l'autonomie) */
3 DATA casuser.crimeCensus;
4 INFILE DATALINES dlm=';';
5 INPUT primary_type $ date :yymmdd10. community_area hardship_index per_capita_income;
6 FORMAT date yymmdd10.;
7 DATALINES;
8THEFT;2015-01-15;10;80;25000
9THEFT;2015-02-20;10;80;25000
10THEFT;2015-03-01;10;80;25000
11THEFT;2015-04-10;10;80;25000
12THEFT;2015-05-05;10;80;25000
13THEFT;2016-01-20;10;80;25000
14THEFT;2016-02-25;10;80;25000
15THEFT;2016-03-05;10;80;25000
16THEFT;2016-04-15;10;80;25000
17THEFT;2016-05-10;10;80;25000
18ASSAULT;2015-01-01;20;70;30000
19ASSAULT;2015-01-02;20;70;30000
20ASSAULT;2015-01-03;20;70;30000
21ASSAULT;2015-01-04;20;70;30000
22ASSAULT;2015-01-05;20;70;30000
23ROBBERY;2016-06-01;30;90;20000
24ROBBERY;2016-06-02;30;90;20000
25ROBBERY;2016-06-03;30;90;20000
26ROBBERY;2016-06-04;30;90;20000
27ROBBERY;2016-06-05;30;90;20000
28;
29 RUN;
30 
31 /* Répliquer la table crimeCensus dans CAS pour une meilleure performance */
32 TABLE.replicate /
33 name='crimeCensus',
34 caslib='casuser',
35 target='crimeCensus_replicated',
36 numreplicas=1; /* Répliquer sur un seul worker pour une petite table */
37 
38 /* Exécuter une requête FedSQL sur la table répliquée avec showStages */
39 fedsql.execdirect
40 showStages=true
41 query="select primary_type, count(*) as count from crimeCensus_replicated group by primary_type order by count desc";
42QUIT;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved


Banner
Expert Advice
Expert
Michael
Responsable de l'infrastructure Viya.
« Pay close attention to the replicate action in Example 4. For small lookup tables, replicating the data to every worker node allows the FedSQL engine to perform "local joins." This eliminates the need for expensive data shuffles across the network, often reducing query time from seconds to milliseconds. »