Obtain FedSQL Query Execution Details

The fedsql.execdirect action allows submitting SQL queries directly to the CAS server. By specifying the 'showStages=true' parameter, it is possible to obtain a detailed report of the query's execution plan. This report breaks down execution into several stages, displaying the methods used (e.g., sorting, aggregation, sequential scan), the SQL query generated for each stage (stage query), the number of data access threads per worker, input tables with their properties (number of rows, columns, partitioning), and execution times for each stage and for the global query. This information is crucial for query optimization and understanding FedSQL's behavior on the distributed CAS architecture.

Data Analysis

Type : CREATION_INTERNE

The examples use an internally created 'crimeCensus' table with a DATA Step and dummy data to simulate the behavior described in the documentation.

1 Code Block

DATA STEP / FEDSQL.EXECDIRECT Data

Explanation :
This minimal example creates a temporary 'crimeCensus' table in CASUSER with dummy data. Then, it executes a simple FedSQL query that counts the number of occurrences for each 'primary_type' and sorts the results in descending order of the count. The 'showStages' parameter is not used here to illustrate the basic use of the action.

Copied!

1	DATA casuser.crimeCensus;
2	INFILE DATALINES dlm=';';
3	INPUT primary_type $ date :yymmdd10. community_area hardship_index per_capita_income;
4	FORMAT date yymmdd10.;
5	DATALINES;
6	THEFT;2015-01-15;10;80;25000
7	THEFT;2015-02-20;10;80;25000
8	THEFT;2015-03-01;10;80;25000
9	THEFT;2015-04-10;10;80;25000
10	THEFT;2015-05-05;10;80;25000
11	THEFT;2016-01-20;10;80;25000
12	THEFT;2016-02-25;10;80;25000
13	THEFT;2016-03-05;10;80;25000
14	THEFT;2016-04-15;10;80;25000
15	THEFT;2016-05-10;10;80;25000
16	ASSAULT;2015-01-01;20;70;30000
17	ASSAULT;2015-01-02;20;70;30000
18	ASSAULT;2015-01-03;20;70;30000
19	ASSAULT;2015-01-04;20;70;30000
20	ASSAULT;2015-01-05;20;70;30000
21	ROBBERY;2016-06-01;30;90;20000
22	ROBBERY;2016-06-02;30;90;20000
23	ROBBERY;2016-06-03;30;90;20000
24	ROBBERY;2016-06-04;30;90;20000
25	ROBBERY;2016-06-05;30;90;20000
26	;
27	RUN;
28
29	PROC CAS;
30	fedsql.execdirect
31	query="select primary_type, count(*) as count from crimeCensus group by primary_type order by count desc";
32	QUIT;
33

2 Code Block

FEDSQL.EXECDIRECT

Explanation :
This example reproduces the use case provided in the documentation. It submits a more complex FedSQL query, filtering by crime type ('THEFT') and a date range, grouping by several variables and having a 'HAVING' condition. The 'showStages=true' parameter is enabled to obtain the execution plan details, showing the different stages of query processing on the CAS server.

Copied!

1	PROC CAS;
2	fedsql.execdirect
3	showStages=true
4	query="select primary_type, count(*) as count, community_area,
5	hardship_index, per_capita_income from crimeCensus
6	where primary_type='THEFT'
7	and date between date '2015-01-01' and date'2016-01-01'
8	group by primary_type, community_area, hardship_index, per_capita_income
9	having count(*) >= 5
10	order by count desc";
11	QUIT;

3 Code Block

FEDSQL.EXECDIRECT

Explanation :
This advanced example introduces a join between the 'crimeCensus' table (created in Example 1) and a new 'population_data' table. The query calculates the total number of incidents and the incident rate per 1000 inhabitants for specific crime types over a longer period. Using 'showStages=true' allows analyzing the execution plan for this multi-table query, including how the join and complex calculations are handled by FedSQL on CAS.

Copied!

1	DATA casuser.population_data;
2	INFILE DATALINES dlm=';';
3	INPUT community_area $ population;
4	DATALINES;
5	10;50000
6	20;75000
7	30;60000
8	;
9	RUN;
10
11	PROC CAS;
12	fedsql.execdirect
13	showStages=true
14	query="select c.primary_type, count(c.*) as total_incidents,
15	c.community_area, p.population,
16	(count(c.) 1000 / p.population) as incidents_per_1000
17	from crimeCensus as c
18	join population_data as p
19	on c.community_area = p.community_area
20	where c.primary_type in ('THEFT', 'ROBBERY')
21	and c.date between date '2015-01-01' and date '2016-12-31'
22	group by c.primary_type, c.community_area, p.population
23	having count(c.*) >= 2
24	order by incidents_per_1000 desc";
25	QUIT;

4 Code Block

FEDSQL.EXECDIRECT / TABLE.REPLICATE

Explanation :
This example focuses on Viya/CAS integration by replicating the 'crimeCensus' table (for demonstration purposes, the table is first recreated if necessary) in CAS server memory using the 'table.replicate' action. Replication can improve performance for small, frequently accessed tables. Then, a FedSQL query is executed on the replicated table with 'showStages=true' to observe how the execution plan leverages the replicated table (e.g., 'Replicated to one worker' in the plan output), which can show more efficient data access time.

Copied!

1	PROC CAS;
2	/* Créer la table crimeCensus si elle n'existe pas déjà (dupliqué pour l'autonomie) */
3	DATA casuser.crimeCensus;
4	INFILE DATALINES dlm=';';
5	INPUT primary_type $ date :yymmdd10. community_area hardship_index per_capita_income;
6	FORMAT date yymmdd10.;
7	DATALINES;
8	THEFT;2015-01-15;10;80;25000
9	THEFT;2015-02-20;10;80;25000
10	THEFT;2015-03-01;10;80;25000
11	THEFT;2015-04-10;10;80;25000
12	THEFT;2015-05-05;10;80;25000
13	THEFT;2016-01-20;10;80;25000
14	THEFT;2016-02-25;10;80;25000
15	THEFT;2016-03-05;10;80;25000
16	THEFT;2016-04-15;10;80;25000
17	THEFT;2016-05-10;10;80;25000
18	ASSAULT;2015-01-01;20;70;30000
19	ASSAULT;2015-01-02;20;70;30000
20	ASSAULT;2015-01-03;20;70;30000
21	ASSAULT;2015-01-04;20;70;30000
22	ASSAULT;2015-01-05;20;70;30000
23	ROBBERY;2016-06-01;30;90;20000
24	ROBBERY;2016-06-02;30;90;20000
25	ROBBERY;2016-06-03;30;90;20000
26	ROBBERY;2016-06-04;30;90;20000
27	ROBBERY;2016-06-05;30;90;20000
28	;
29	RUN;
30
31	/* Répliquer la table crimeCensus dans CAS pour une meilleure performance */
32	TABLE.replicate /
33	name='crimeCensus',
34	caslib='casuser',
35	target='crimeCensus_replicated',
36	numreplicas=1; /* Répliquer sur un seul worker pour une petite table */
37
38	/* Exécuter une requête FedSQL sur la table répliquée avec showStages */
39	fedsql.execdirect
40	showStages=true
41	query="select primary_type, count(*) as count from crimeCensus_replicated group by primary_type order by count desc";
42	QUIT;

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Retour à la liste

Expert Advice

Michael

Responsable de l'infrastructure Viya.

« Pay close attention to the replicate action in Example 4. For small lookup tables, replicating the data to every worker node allows the FedSQL engine to perform "local joins." This eliminates the need for expensive data shuffles across the network, often reducing query time from seconds to milliseconds. »