
Mastering CAS Logic: How to Correctly Group Data in Descending Order Without Leaving CAS

Difficulty Level: Beginner
Published on :
Stéphanie

Expert Advice

Stéphanie
Machine Learning and AI Specialist.

One of the most useful, and most often overlooked, features of the CAS DATA step is how it handles BY-group ordering in memory. In traditional SAS 9, a BY statement generally requires the data to be sorted beforehand, typically with PROC SORT. CAS can perform implicit ordering for ascending BY groups because it already manages the data in distributed partitions.

However, because CAS is designed for massive parallelism, the CAS DATA step does not natively process BY groups in descending order. If you need a descending sequence, pre-sort the CAS table using PROC SORT with the DESCENDING option. The DATA step can then read the ordered table without falling back to the local SAS session, which keeps large-table processing in CAS.

The DESCENDING option of the BY statement is not directly supported when a DATA step executes in CAS. If it is specified, execution automatically switches to a local SAS session. The code still runs without error, but it loses the performance benefits of CAS in-memory distributed processing. To group in descending order while staying in CAS, pre-sort the data (PROC SORT with DESCENDING) before running the CAS DATA step. For simple ascending order, a CAS DATA step needs no prior sort, because CAS orders BY groups implicitly across threads.
Data Analysis

Examples use SASHELP data (sashelp.class) or internally generated data (datalines).

1 Code Block
PROC SORT / DATA STEP Data
Explanation :
This example demonstrates the classic use of the BY statement with the DESCENDING option in a local SAS session. It is imperative to pre-sort the data with PROC SORT, specifying DESCENDING for the BY variable (here 'Age'). The DATA Step can then use 'BY DESCENDING Age' to process the groups in the specified order.
/* Create a demonstration table */
DATA classData;
  INPUT Name $ Age Height Weight;
  DATALINES;
John 14 69 118
Mary 13 65 112
Robert 12 64 128
Alice 14 62 102
Thomas 12 57 85
;
RUN;

/* Sort by Age in descending order in the local SAS session */
PROC SORT DATA=classData OUT=classAgeDescLocal;
  BY DESCENDING Age;
RUN;

/* DATA step that processes the groups by Age in descending order */
DATA classAgeOrderLocal;
  SET classAgeDescLocal;
  BY DESCENDING Age;
  /* Processing logic goes here */
  PUT 'Processing Age = ' Age;
RUN;

PROC PRINT DATA=classAgeOrderLocal;
  TITLE 'Example 1: Descending Order in Local SAS';
RUN;
2 Code Block
DATA STEP CAS Data
Explanation :
This example illustrates the use of the BY statement in a DATA Step running entirely in CAS. For ascending order sorting, CAS does not require pre-sorting the data. Implicit ordering is handled by CAS itself, optimized for the distributed environment. The SAS log will confirm execution in 'Cloud Analytic Services'.
/* Connect to CAS and load the data */
/* Assumes a CAS session is already active in this SAS session */
LIBNAME mycas CAS;

DATA mycas.classAge_cas;
  SET SASHELP.CLASS;
RUN;

/* DATA step in CAS with implicit grouping by Age (ascending order) */
DATA mycas.classAgeOrder_cas;
  SET mycas.classAge_cas;
  BY Age;
  /* Processing logic: compute the mean Weight for each Age */
  IF FIRST.Age THEN DO;
    count_age = 0;
    sum_weight = 0;
  END;
  count_age + 1;
  sum_weight + Weight;
  IF LAST.Age THEN DO;
    mean_weight_age = sum_weight / count_age;
    OUTPUT;
  END;
  KEEP Age mean_weight_age;
RUN;

PROC PRINT DATA=mycas.classAgeOrder_cas;
  TITLE 'Example 2: Implicit Grouping by Age in CAS (ascending order)';
RUN;
3 Code Block
DATA STEP CAS (forced local) Data
Explanation :
This example highlights the behavior of the DATA Step when the DESCENDING option is used with BY on a CAS table. Although the DATA Step targets a CAS table, since the DESCENDING option is not natively supported in the CAS DATA Step, execution is automatically transferred to a local SAS session. The SAS log will not mention 'Running DATA step in Cloud Analytic Services' for this block.
/* Create a demonstration table in CAS */
/* Assumes a CAS session is already active in this SAS session */
LIBNAME mycas CAS;

DATA mycas.students_cas;
  INPUT Name $ Score;
  DATALINES;
Alice 85
Bob 92
Charlie 78
Alice 90
Bob 88
;
RUN;

/* Attempt a DATA step on a CAS table with BY DESCENDING Score */
/* This forces execution in the local SAS session */
DATA mycas.students_ordered_forced_local;
  SET mycas.students_cas;
  BY DESCENDING Score;
  /* Processing logic */
  PUT 'Processing score ' Score ' for ' Name;
RUN;

PROC PRINT DATA=mycas.students_ordered_forced_local;
  TITLE 'Example 3: Forced Local SAS Execution (BY DESCENDING in a CAS DATA Step)';
RUN;
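To see why a step like the one in Example 3 left CAS, it can help to ask for more detail in the log. The sketch below assumes the mycas libref and the mycas.students_cas table from Example 3 already exist. MSGLEVEL=I is a standard SAS system option that requests additional informational notes; the assumption here is that those notes include the reason the DATA step did not run in CAS. The output table name is hypothetical.

```sas
/* Request additional informational notes in the log */
OPTIONS MSGLEVEL=I;

DATA mycas.students_desc_check;   /* hypothetical output table name */
  SET mycas.students_cas;
  BY DESCENDING Score;            /* expected to keep the step out of CAS */
RUN;

/* Restore the default message level */
OPTIONS MSGLEVEL=N;
```

Compare the notes for this step with those of Example 2, where the log should report that the DATA step ran in Cloud Analytic Services.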
4 Code Block
PROC SORT / DATA STEP CAS Data
Explanation :
For complex sorting scenarios, especially with descending orders or multiple BY variables, it is recommended to pre-sort the table in CAS using PROC SORT. Then, the CAS DATA Step can process the groups using the BY statement without the DESCENDING option, as the order has already been established. This allows benefiting from CAS's parallel execution while respecting the desired ordering. Here, sorting is performed by 'Group' then 'Age' in descending order.
/* Create a more complex demonstration table */
DATA classDataExtended;
  INPUT Group $ Name $ Age Height Weight;
  DATALINES;
A John 14 69 118
A Mary 13 65 112
B Robert 12 64 128
A Alice 14 62 102
B Thomas 12 57 85
A Peter 13 68 120
B Susan 14 66 115
;
RUN;

/* Load the table into CAS (assumes an active CAS session) */
LIBNAME mycas CAS;
DATA mycas.classDataExtended_cas;
  SET classDataExtended;
RUN;

/* Pre-sort the CAS table for multi-variable grouping with descending order */
/* Note that PROC SORT *can* run in CAS if the source is a CAS table */
PROC SORT DATA=mycas.classDataExtended_cas OUT=mycas.sortedClassExtended_cas;
  BY Group DESCENDING Age;
RUN;

/* DATA step in CAS with grouping by the sorted variables */
DATA mycas.finalGroupedData_cas;
  SET mycas.sortedClassExtended_cas;
  BY Group Age;
  /* Record the Age found on the first row of each Group. The value is set */
  /* only on that row, and whether it is the oldest or the youngest member */
  /* depends on the row order the DATA step sees within the group. */
  IF FIRST.Group THEN first_age_in_group = Age;
  IF FIRST.Age THEN PUT 'New group: ' Group ' - Age ' Age;
  /* Other processing can be added here */
RUN;

PROC PRINT DATA=mycas.finalGroupedData_cas;
  TITLE 'Example 4: Multi-variable Sort and Grouping in CAS (Age descending)';
RUN;
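For the common case where descending order is only needed to pick the top value in each group, an ascending BY statement with LAST. can give the same result as FIRST. on descending data, with no DESCENDING keyword at all. The following is a minimal sketch, assuming the mycas libref and the mycas.students_cas table from Example 3; the output table name is hypothetical.

```sas
/* Keep the highest Score for each Name using ascending order only */
DATA mycas.top_score_per_student;   /* hypothetical output table name */
  SET mycas.students_cas;
  BY Name Score;    /* ascending on both variables, no DESCENDING keyword */
  IF LAST.Name;     /* the last row of each Name group holds its highest Score */
RUN;

PROC PRINT DATA=mycas.top_score_per_student;
  TITLE 'Highest Score per Student (ascending BY with LAST.)';
RUN;
```

With the Example 3 data, the rows kept should be Alice 90, Bob 92, and Charlie 78. The order in which PROC PRINT lists them is not guaranteed for a CAS table. Check the log to confirm that the step ran in Cloud Analytic Services.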
Pro Tip
When working in SAS Viya, it is vital to understand that the CAS engine handles data distribution differently than a traditional SAS session. While the DATA step in CAS is highly efficient for parallel processing, it has specific limitations regarding the BY statement.

To maintain high performance and avoid unintended local processing, follow these expert rules:

Avoid DESCENDING in CAS DATA Steps: The DESCENDING option in a BY statement is not currently supported for native execution in CAS. Using it will trigger "Step Regression," where SAS silently moves all data from the distributed CAS workers to the local Compute Server to execute the step. This can cause severe performance degradation with large datasets.

Leverage CAS Implicit Sorting: For standard ascending orders, you do not need to run PROC SORT before a CAS DATA step. The CAS engine automatically groups and orders data across threads when it encounters a BY statement, saving you an entire processing step.

The Pre-Sort Workaround: If your logic strictly requires a descending order (for example, finding the highest score or the most recent date using FIRST.variable), run PROC SORT with DESCENDING first. When the input to PROC SORT is a CAS table, the sort is intended to run in memory across the CAS nodes; check the log to confirm where it actually ran in your environment. Once the table is sorted, use a simple BY statement (without the DESCENDING keyword) in the DATA step that follows.

Monitor the Log: Always check your SAS log for notes indicating where the DATA step ran. If you see "The DATA step will run in the SAS client," your CAS code has regressed to local mode, usually due to an unsupported option like DESCENDING.
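
Reading the log after every run is easy to forget, so one option is to make a regression fail loudly instead. The sketch below uses the SESSREF= option of the DATA statement to request execution in a named CAS session. The working assumption, which you should confirm against the documentation for your release, is that a step that cannot run in CAS then stops with an error rather than quietly running locally. The session name mysess and the table names are hypothetical.

```sas
/* Hypothetical session name; skip this statement if the session already exists */
CAS mysess;
LIBNAME mycas CAS SESSREF=mysess;

/* Load a small table into CAS */
DATA mycas.class_cas;
  SET SASHELP.CLASS;
RUN;

/* SESSREF= on the DATA statement requests execution in the named CAS session */
DATA mycas.class_by_age / SESSREF=mysess;
  SET mycas.class_cas;
  BY Age;
  IF FIRST.Age;   /* keep one row per Age value */
RUN;
```

If an unsupported option such as DESCENDING were added to the BY statement in the second step, the expectation under that assumption is an error in the log instead of a local run.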
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved.

