Published on :

CARDINALITY Procedure: Variable with User-Defined Format

This code is also available in: Deutsch Español Français
Awaiting validation
The CARDINALITY procedure analyzes a variable in a CAS dataset to determine its number of unique levels (cardinality). When user-defined formats are applied to a variable, the procedure treats these formats as prior knowledge and uses them by default to order the levels. This example focuses on the engineSize variable from the SASHELP.CARS dataset, to which an engsize format is applied. The output includes the cardinality of the formatted variable and a detailed table of frequencies for each formatted level.
Data Analysis

Type : SASHELP


The examples use the built-in SASHELP.CARS dataset, which is loaded into a CAS session, and internally created data (datalines) for format demonstrations.

1 Code Block
PROC CARDINALITY Data
Explanation :
This basic example shows how to define a user-defined format for the 'engineSize' variable and how the CARDINALITY procedure uses it by default to calculate cardinality. It loads the 'sashelp.cars' dataset into a CAS session, creates an 'engsize' format, applies it to 'engineSize', and then executes PROC CARDINALITY. The results (_VARNAME_, _ORDER_, _MORE_, _CARDINALITY_, _INDEX_, _FREQ_, _CFMT_) are then displayed to confirm that the format has been taken into account and that the levels are ordered according to the format labels.
Copied!
1/* 1. Charger le jeu de données SASHELP.CARS en session CAS */
2PROC CAS;
3 SESSION casauto;
4 caslib _all_ assign;
5 DATA casuser.cars;
6 SET sashelp.cars;
7 RUN;
8QUIT;
9 
10/* 2. Créer un format défini par l'utilisateur pour engineSize */
11PROC FORMAT casfmtlib='casuser.myfmtlib';
12 value engsize
13 low - <3 = 'Petit moteur'
14 3 - <6 = 'Moteur moyen'
15 6 - high= 'Gros moteur';
16RUN;
17 
18/* 3. Appliquer le format à la variable engineSize et charger en CAS */
19DATA casuser.cars;
20 FORMAT engineSize engsize.;
21 SET casuser.cars;
22RUN;
23 
24/* 4. Exécuter PROC CARDINALITY avec la variable formatée */
25PROC CARDINALITY DATA=casuser.cars outcard=casuser.card outdetails=casuser.details;
26 var engineSize;
27RUN;
28 
29/* 5. Afficher le résumé de cardinalité */
30title 'Cardinalité de engineSize avec format défini par l''utilisateur (Basique)';
31PROC PRINT DATA=casuser.card;
32 var _VARNAME_ _ORDER_ _MORE_ _CARDINALITY_;
33RUN;
34 
35/* 6. Afficher les détails des niveaux formatés */
36title 'Détails des niveaux formatés pour engineSize (Basique)';
37PROC PRINT DATA=casuser.details;
38 var _VARNAME_ _INDEX_ _FREQ_ _CFMT_;
39RUN;
2 Code Block
PROC CARDINALITY Data
Explanation :
This intermediate example uses a user-defined format that includes missing value handling. It applies the format, intentionally introduces missing values into the 'engineSize' variable, and then executes PROC CARDINALITY with the `ORDER=FORMATTED` option to ensure that level ordering is explicitly based on the format labels, including the level for missing values.
Copied!
1/* 1. Charger le jeu de données SASHELP.CARS en session CAS */
2PROC CAS;
3 SESSION casauto;
4 caslib _all_ assign;
5 DATA casuser.cars;
6 SET sashelp.cars;
7 RUN;
8QUIT;
9 
10/* 2. Créer un format défini par l'utilisateur pour engineSize avec une valeur manquante */
11PROC FORMAT casfmtlib='casuser.myfmtlib';
12 value engsize_plus
13 low - <3 = 'Type A'
14 3 - <6 = 'Type B'
15 6 - high= 'Type C'
16 . = 'Non spécifié'; /* Gestion des valeurs manquantes */
17RUN;
18 
19/* 3. Appliquer le format et introduire des valeurs manquantes */
20DATA casuser.cars_missing;
21 SET casuser.cars;
22 IF mod(_n_, 10) = 0 THEN call missing(engineSize); /* Introduire des manquants */
23 FORMAT engineSize engsize_plus.;
24RUN;
25 
26/* 4. Exécuter PROC CARDINALITY avec ORDER=FORMATTED */
27PROC CARDINALITY DATA=casuser.cars_missing outcard=casuser.card_ord outdetails=casuser.details_ord;
28 var engineSize;
29 order formatted;
30RUN;
31 
32/* 5. Afficher le résumé de cardinalité */
33title 'Cardinalité de engineSize avec format défini par l''utilisateur et manquants (Ordonné par format)';
34PROC PRINT DATA=casuser.card_ord;
35 var _VARNAME_ _ORDER_ _MORE_ _CARDINALITY_;
36RUN;
37 
38/* 6. Afficher les détails des niveaux formatés */
39title 'Détails des niveaux formatés pour engineSize avec manquants (Ordonné par format)';
40PROC PRINT DATA=casuser.details_ord;
41 var _VARNAME_ _INDEX_ _FREQ_ _CFMT_;
42RUN;
3 Code Block
PROC CARDINALITY Data
Explanation :
This advanced example combines the use of user-defined formats with data filtering and the analysis of multiple variables. It creates formats for 'driveTrain' and 'fueltype', applies a complex `WHERE` filter to select a subset of vehicles, and then executes PROC CARDINALITY on these variables. The `ORDER=FREQ` option is used to order levels by decreasing frequency, and `MAXLEVELS` is adjusted to obtain more detailed level information.
Copied!
1/* 1. Charger le jeu de données SASHELP.CARS en session CAS */
2PROC CAS;
3 SESSION casauto;
4 caslib _all_ assign;
5 DATA casuser.cars;
6 SET sashelp.cars;
7 RUN;
8QUIT;
9 
10/* 2. Créer un format défini par l'utilisateur complexe */
11PROC FORMAT casfmtlib='casuser.myfmtlib';
12 value $fueltype
13 'Premium' = 'Carburant Premium'
14 'Regular' = 'Carburant Régulier'
15 other = 'Autre Carburant';
16 
17 value $drivetrain
18 'Front' = 'Traction avant'
19 'Rear' = 'Propulsion'
20 'All' = 'Quatre roues motrices'
21 other = 'Inconnu';
22RUN;
23 
24/* 3. Appliquer les formats et filtrer les données */
25DATA casuser.cars_filtered;
26 SET casuser.cars;
27 FORMAT driveTrain $drivetrain. fueltype $fueltype.;
28 where msrp < 30000 and make in ('Audi', 'BMW', 'Mercedes-Benz'); /* Filtrage avancé */
29RUN;
30 
31/* 4. Exécuter PROC CARDINALITY sur plusieurs variables formatées */
32PROC CARDINALITY DATA=casuser.cars_filtered outcard=casuser.card_adv outdetails=casuser.details_adv maxlevels=10;
33 var driveTrain fueltype;
34 order freq; /* Ordonner par fréquence */
35RUN;
36 
37/* 5. Afficher les résultats */
38title 'Cardinalité des variables formatées (Cas Avancé avec filtrage)';
39PROC PRINT DATA=casuser.card_adv;
40 var _VARNAME_ _ORDER_ _MORE_ _CARDINALITY_;
41RUN;
42 
43title 'Détails des niveaux formatés (Cas Avancé avec filtrage)';
44PROC PRINT DATA=casuser.details_adv;
45 var _VARNAME_ _INDEX_ _FREQ_ _CFMT_;
46RUN;
4 Code Block
PROC CARDINALITY Data
Explanation :
This Viya/CAS scenario is optimized for large tables. It simulates a large data table directly in a CAS session, applies user-defined formats to numeric and character variables, and then executes PROC CARDINALITY. The `NTHREADS` option is specifically used to leverage CAS's parallel processing capabilities, which is crucial for performance on large datasets. The `ESTIMATE METHOD=EXACT` option ensures accurate cardinality calculation, even with large data. After analysis, temporary tables are dropped.
Copied!
1/* 1. Créer une grande table CAS pour la démonstration */
2PROC CAS;
3 SESSION casauto;
4 caslib _all_ assign;
5 
6 DATA casuser.large_data;
7 DO i = 1 to 100000; /* 100 000 observations */
8 engineSize_raw = ranuni(12345) * 10; /* Générer des tailles moteur continues */
9 IF engineSize_raw < 3 THEN engineSize = 1; /* Simuler des groupes pour le format */
10 ELSE IF engineSize_raw < 6 THEN engineSize = 2;
11 ELSE engineSize = 3;
12 
13 fueltype_raw = rand('UNIFORM') * 3;
14 IF fueltype_raw < 1 THEN fueltype_char = 'Premium';
15 ELSE IF fueltype_raw < 2 THEN fueltype_char = 'Regular';
16 ELSE fueltype_char = 'Diesel';
17 
18 OUTPUT;
19 END;
20 RUN;
21QUIT;
22 
23/* 2. Créer les formats définis par l'utilisateur */
24PROC FORMAT casfmtlib='casuser.myfmtlib';
25 value large_engsize
26 1 = 'Petite Moteur (CAS)'
27 2 = 'Moteur Moyen (CAS)'
28 3 = 'Gros Moteur (CAS)';
29 value $large_fueltype
30 'Premium' = 'Carburant Premium (CAS)'
31 'Regular' = 'Carburant Régulier (CAS)'
32 'Diesel' = 'Carburant Diesel (CAS)';
33RUN;
34 
35/* 3. Appliquer les formats à la grande table CAS */
36DATA casuser.large_data;
37 SET casuser.large_data;
38 FORMAT engineSize large_engsize. fueltype_char $large_fueltype.;
39RUN;
40 
41/* 4. Exécuter PROC CARDINALITY en parallèle sur la grande table */
42PROC CARDINALITY DATA=casuser.large_data outcard=casuser.card_large outdetails=casuser.details_large;
43 var engineSize fueltype_char;
44 nthreads=4; /* Utilisation de threads pour une exécution parallèle */
45 estimate method=exact; /* Assurer un calcul exact de cardinalité */
46RUN;
47 
48/* 5. Afficher les résultats pour la table volumineuse */
49title 'Cardinalité des variables formatées sur une table volumineuse (CAS/Parallèle)';
50PROC PRINT DATA=casuser.card_large;
51 var _VARNAME_ _ORDER_ _MORE_ _CARDINALITY_;
52RUN;
53 
54title 'Détails des niveaux formatés sur une table volumineuse (CAS/Parallèle)';
55PROC PRINT DATA=casuser.details_large;
56 var _VARNAME_ _INDEX_ _FREQ_ _CFMT_;
57RUN;
58 
59/* 6. Supprimer la table temporaire */
60PROC CAS;
61 SESSION casauto;
62 TABLE.dropTable(name='large_data', caslib='casuser');
63 TABLE.dropTable(name='card_large', caslib='casuser');
64 TABLE.dropTable(name='details_large', caslib='casuser');
65QUIT;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.