ETL SAS VIYA

SAS 9 vs. CAS: Why Your BY Statement Is Breaking Your Sort Order

This code is also available in: Deutsch Español
Difficulty Level
Beginner
Published on :
Michael

Expert Advice

Michael
Responsable de l'infrastructure Viya.

Migrating SAS code to Viya? Discover why the BY statement behaves differently in Cloud Analytic Services (CAS) due to distributed partitioning, and learn how to force stable row ordering

When combining SAS© tables, the order of rows in a CAS table is not necessarily the order in which they are processed. Unlike SAS© 9, which guarantees input row order, CAS tables do not offer this guarantee because they are distributed across multiple grid nodes. When partitioning occurs, BY groups are formed on a single node from rows originating from multiple workers.
The order of rows within a partition is not stable. To achieve stable sorting, it is recommended to use more BY variables. For example, with 'BY ID;', rows are grouped by the formatted value of ID, then sorted by the raw value of ID. The initial row order is not considered. With 'BY ID DATE;', rows are grouped by the formatted value of ID, then sorted by the raw values of ID and DATE.
The BY statement combines rows from two datasets that have the same values for the BY variable. This functionality is crucial for complex data manipulation operations and for ensuring consistent results in distributed environments like CAS.
Data Analysis

Type : INTERNAL_CREATION


Examples use generated data (datalines) or SASHELP.

1 Code Block
DATA STEP Data
Explanation :
This example demonstrates the concatenation of two tables (animal1 and plant1) using the SET statement without the BY statement. Rows from 'plant1' are simply appended to the end of 'animal1'. This is a basic operation for combining data.
Copied!
1DATA animal1;
2 INPUT Common $ Animal $ 3-8 Number 10-11;
3 DATALINES;
4a Ant 1
5b Bird 2
6c Cat 3
7d Dog 4
8e Eagle 5
9f Frog 6
10g Goose 7
11h Hawk 8
12i Impala 9
13;
14 
15DATA plant1;
16 INPUT Common $ Plant $ 3-10 Number 12-13;
17 DATALINES;
18a Grape 1
19c Hazelnut 2
20e Indigo 3
21g Jicama 4
22i Kale 5
23;
24 
25/* Concaténer animal1 et plant1 */
26DATA append_basique;
27 SET animal1 plant1;
28RUN;
29 
30PROC PRINT DATA=append_basique;
31 title '1. Concaténation de tables Animal et Plant (Base SAS)';
32RUN;
2 Code Block
DATA STEP Data
Explanation :
This example uses the BY statement with a single variable ('Common') to interleave records from the 'animal1' and 'plant1' tables. Rows are combined based on matching values of the 'Common' variable.
Copied!
1DATA animal1;
2 INPUT Common $ Animal $ 3-8 Number 10-11;
3 DATALINES;
4a Ant 1
5b Bird 2
6c Cat 3
7d Dog 4
8e Eagle 5
9f Frog 6
10g Goose 7
11h Hawk 8
12i Impala 9
13;
14 
15DATA plant1;
16 INPUT Common $ Plant $ 3-10 Number 12-13;
17 DATALINES;
18a Grape 1
19c Hazelnut 2
20e Indigo 3
21g Jicama 4
22i Kale 5
23;
24 
25/* Entrelacer animal1 et plant1 par la variable COMMON */
26DATA interleave_by_common;
27 SET animal1 plant1;
28 BY common;
29RUN;
30 
31PROC PRINT DATA=interleave_by_common;
32 title '2. Entrelacement par la variable COMMON (Base SAS)';
33RUN;
3 Code Block
DATA STEP Data
Explanation :
This example illustrates more advanced interleaving using the BY statement with two variables ('Common' and 'Number'). This allows for more granular control over the order of combining rows between the 'animal1' and 'plant1' tables.
Copied!
1DATA animal1;
2 INPUT Common $ Animal $ 3-8 Number 10-11;
3 DATALINES;
4a Ant 1
5b Bird 2
6c Cat 3
7d Dog 4
8e Eagle 5
9f Frog 6
10g Goose 7
11h Hawk 8
12i Impala 9
13;
14 
15DATA plant1;
16 INPUT Common $ Plant $ 3-10 Number 12-13;
17 DATALINES;
18a Grape 1
19c Hazelnut 2
20e Indigo 3
21g Jicama 4
22i Kale 5
23;
24 
25/* Entrelacer animal1 et plant1 par COMMON et Number */
26DATA interleave_by_common_number;
27 SET animal1 plant1;
28 BY common Number;
29RUN;
30 
31PROC PRINT DATA=interleave_by_common_number;
32 title '3. Entrelacement par COMMON et Number (Base SAS)';
33RUN;
4 Code Block
DATA STEP / PROC CASUTIL Data
Explanation :
This example demonstrates how to perform table interleaving in the CAS environment. It starts by creating base SAS tables, loads them as CAS tables using PROC CASUTIL, then uses a DATA step with the BY statement to interleave the tables directly in distributed memory. The processing order in CAS may differ from Base SAS due to parallelism and partitioning. The `_hostname_` variable is added to observe on which node the data was processed, illustrating the distributed nature of CAS.
Copied!
1LIBNAME mycas cas;
2 
3DATA animal_cas;
4 INPUT Common $ Animal $ 3-8 Number 10-11;
5 DATALINES;
6a Ant 1
7b Bird 2
8c Cat 3
9d Dog 4
10e Eagle 5
11f Frog 6
12g Goose 7
13h Hawk 8
14i Impala 9
15;
16 
17DATA plant_cas;
18 INPUT Common $ Plant $ 3-10 Number 12-13;
19 DATALINES;
20a Grape 1
21c Hazelnut 2
22e Indigo 3
23g Jicama 4
24i Kale 5
25;
26 
27PROC CASUTIL;
28 load DATA=animal_cas outcaslib='CASUSER' casout='ANIMAL_CAS' replace;
29 load DATA=plant_cas outcaslib='CASUSER' casout='PLANT_CAS' replace;
30RUN;
31 
32/* Entrelacer les tables CAS par la variable COMMON */
33DATA mycas.interleaveByCommonCas;
34 SET mycas.animal_cas mycas.plant_cas;
35 BY common;
36 /* La variable _hostname_ peut être utile pour observer la distribution */
37 hostname=_hostname_;
38RUN;
39 
40PROC PRINT DATA=mycas.interleaveByCommonCas;
41 title '4. Entrelacement par COMMON dans un DATA Step en CAS';
42RUN;
43 
44/* Nettoyage des tables CAS */
45PROC CASUTIL;
46 droptable casdata='ANIMAL_CAS' incaslib='CASUSER';
47 droptable casdata='PLANT_CAS' incaslib='CASUSER';
48 droptable casdata='INTERLEAVEBYCOMMONCAS' incaslib='CASUSER';
49RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Related Documentation

Aucune documentation spécifique pour cette catégorie.