The Ultimate Guide to Model Selection: How to Find the Perfect K-Components in SAS

Difficulty Level: Beginner
Published on :
Michael

Expert Advice

Michael
Viya Infrastructure Manager.

When you use the EQUATE=SCALE option in PROC HPFMM, you assume a common variance across all components. This prevents a component from collapsing onto a single observation, where its variance shrinks toward zero and the likelihood grows without bound (the 'singularity' problem in mixture models). If a model with unequal variances fails to converge, forcing equal scales, as in the second HPFMM step of the script, is often a robust way to reach a stable and interpretable solution.

The script begins by creating a 'galaxies' dataset containing the velocities of a set of galaxies. It then applies the HPFMM procedure in three steps: 1) A search for the optimal number of components (from 3 to 7) with unequal variances, based on the AIC criterion. 2) The same search with equal variances forced across components. 3) A fit of a final 5-component model with the common variance constrained to a fixed value.
Data Analysis

Type: CREATION_INTERNE (internally created data)


The data is created directly in the script via a DATA step and a DATALINES statement. The 'velocity' variable is read and transformed into a new variable 'v' for analysis.

1 Code Block
DATA STEP Data
Explanation:
This DATA step reads the galaxy velocity data supplied inline after the DATALINES statement. The double trailing at sign (@@) holds the current input line across iterations of the DATA step, so several observations can be read from the same data line. A new variable 'v' is computed by dividing 'velocity' by 1000 to rescale the values.
title "HPFMM Analysis of Galaxies Data";
DATA galaxies;
 /* @@ holds the line so that several velocities are read per data line */
 INPUT velocity @@;
 v = velocity / 1000;
 DATALINES;
9172 9350 9483 9558 9775 10227 10406 16084 16170 18419
18552 18600 18927 19052 19070 19330 19343 19349 19440 19473
19529 19541 19547 19663 19846 19856 19863 19914 19918 19973
19989 20166 20175 20179 20196 20215 20221 20415 20629 20795
20821 20846 20875 20986 21137 21492 21701 21814 21921 21960
22185 22209 22242 22249 22314 22374 22495 22746 22747 22888
22914 23206 23241 23263 23484 23538 23542 23666 23706 23711
24129 24285 24289 24366 24717 24990 25633 26960 26995 32065
32789 34279
;
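Before fitting any mixture, it can be worth confirming that the data were read as intended. A minimal sketch, assuming the 'galaxies' dataset created above: the DATALINES block lists 82 values, so the check should report 82 observations, with 'v' ranging from 9.172 to 34.279.

```sas
/* Sanity check: expect one observation per value in the DATALINES block */
proc means data=galaxies n min max;
   var v;
run;
```

A smaller count would suggest that the @@ line-hold specifier is missing from the INPUT statement, since only the first value of each data line would then be read.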
2 Code Block
PROC HPFMM
Explanation:
First analysis with HPFMM, which determines the optimal number of components (between 3 and 7, set with the KMIN= and KMAX= options) based on the Akaike Information Criterion (AIC). By default, the variances of the normal components are estimated separately (unequal). ODS graphics are enabled, and three output tables (iteration history, optimization information, component information) are suppressed.
title2 "Three to Seven Components, Unequal Variances";
ods graphics on;
PROC HPFMM DATA=galaxies criterion=AIC;
 model v = / kmin=3 kmax=7;
 ods exclude IterHistory OptInfo ComponentInfo;
RUN;
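AIC adds a penalty of twice the number of free parameters to -2 times the log likelihood, so it helps to know how quickly that number grows with k. The following is a sketch that is not part of the original script, and its dataset and variable names are illustrative. It counts the parameters of a k-component univariate normal mixture: k means, k-1 free mixing probabilities, and either k separate variances or one shared variance. The count reported by the procedure may differ if constraints become active during estimation.

```sas
/* Illustrative only: free-parameter counts for a k-component normal mixture */
data param_count;
   do k = 3 to 7;
      p_unequal = 3*k - 1;  /* k means + k variances + (k-1) mixing probabilities */
      p_equal   = 2*k;      /* k means + 1 common variance + (k-1) mixing probabilities */
      aic_penalty_diff = 2*(p_unequal - p_equal);  /* extra AIC penalty paid for unequal variances */
      output;
   end;
run;

proc print data=param_count noobs;
run;
```

For k=5, the unequal-variance model has 14 free parameters against 10 for the equal-variance model, so it must lower -2 log L by more than 8 to achieve a smaller AIC.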
3 Code Block
PROC HPFMM
Explanation:
Second analysis with HPFMM, similar to the first, but with the component variances constrained to be equal (EQUATE=SCALE option). The relative gradient convergence criterion is turned off (GCONV=0).
title2 "Three to Seven Components, Equal Variances";
PROC HPFMM DATA=galaxies criterion=AIC gconv=0;
 model v = / kmin=3 kmax=7 equate=scale;
RUN;
4 Code Block
PROC HPFMM
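Explanation:
Third and final analysis, which fits a specific 5-component model (K=5) with equal variances (EQUATE=SCALE). The RESTRICT statement fixes this common variance at 0.9025: the coefficient 0 on the intercept (int 0) leaves the mean out of the restriction, and the coefficient 1 on the scale parameter (scale 1) sets the scale equal to the right-hand side. Finally, ODS graphics are turned off.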
title2 "Five Components, Equal Variances = 0.9025";
PROC HPFMM DATA=galaxies;
 model v = / K=5 equate=scale;
 restrict int 0 (scale 1) = 0.9025;
RUN;
ods graphics off;
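To examine the final fit programmatically instead of reading it off the listing, the results can be captured in datasets with ODS OUTPUT. This is a sketch that is not part of the original script. It assumes the ODS table names FitStatistics and ParameterEstimates, which should be checked against the PROC HPFMM documentation for your release. The output dataset names fit_k5 and est_k5 are hypothetical.

```sas
/* Assumed ODS table names; fit_k5 and est_k5 are illustrative dataset names */
ods output FitStatistics=fit_k5 ParameterEstimates=est_k5;
proc hpfmm data=galaxies;
   model v = / k=5 equate=scale;
   restrict int 0 (scale 1) = 0.9025;
run;

/* Fit criteria (including AIC) for the constrained 5-component model */
proc print data=fit_k5 noobs;
run;
```

The same ODS OUTPUT statement can be placed before the earlier PROC HPFMM steps to store their fit statistics for a side-by-side comparison.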
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : SAS SAMPLE LIBRARY

