
Analysis of Mixture Distributions with PROC HPFMM on Galaxy Data

The script first creates a 'galaxies' dataset containing the velocities of 82 galaxies. It then applies the HPFMM procedure in three steps: 1) a search for the optimal number of components (from 3 to 7) with unequal variances, based on AIC; 2) the same search, but with the variances forced to be equal across components; 3) a fit of a final 5-component model whose common variance is constrained to a fixed value.
Data Analysis

Type: CREATION_INTERNE (data created within the script)


The data are created directly in the script with a DATA step and a DATALINES statement. The 'velocity' variable is read and rescaled (divided by 1000) into a new variable 'v', which is the variable analyzed.

1 Code Block
DATA STEP Data
Explanation:
This DATA step reads the galaxy velocity data supplied after the DATALINES statement. The double trailing @ (@@) line-hold specifier keeps the current data line held across iterations of the DATA step, so several observations can be read from the same line. A new variable 'v' is computed by dividing 'velocity' by 1000 to rescale the values.
title "HPFMM Analysis of Galaxies Data";
DATA galaxies;
 INPUT velocity @@;      /* @@ holds the line: read several values per record */
 v = velocity / 1000;    /* rescale velocities by a factor of 1/1000 */
 DATALINES;
9172 9350 9483 9558 9775 10227 10406 16084 16170 18419
18552 18600 18927 19052 19070 19330 19343 19349 19440 19473
19529 19541 19547 19663 19846 19856 19863 19914 19918 19973
19989 20166 20175 20179 20196 20215 20221 20415 20629 20795
20821 20846 20875 20986 21137 21492 21701 21814 21921 21960
22185 22209 22242 22249 22314 22374 22495 22746 22747 22888
22914 23206 23241 23263 23484 23538 23542 23666 23706 23711
24129 24285 24289 24366 24717 24990 25633 26960 26995 32065
32789 34279
;
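To see what the @@ specifier changes, here is a minimal sketch on a hypothetical toy data set named demo (not part of the original sample): five values spread over two data lines. With @@, INPUT keeps reading from the held line, so each value becomes its own observation.

```sas
/* Hypothetical toy data set, for illustration only.                    */
/* @@ holds each data line across DATA step iterations, so every value  */
/* on a line is read as a separate observation.                         */
data demo;
   input x @@;
   datalines;
1 2 3
4 5
;
run;

/* Should list five observations: x = 1, 2, 3, 4, 5 */
proc print data=demo;
run;
```

Without @@ (input x;), each iteration would release the line after reading one value, giving only two observations (x = 1 and x = 4); with the 82 galaxy velocities packed ten per line, most of the data would be lost.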
2 Code Block
PROC HPFMM
Explanation:
The first HPFMM analysis determines the optimal number of components (between 3 and 7, set with the KMIN= and KMAX= options) using the Akaike information criterion (CRITERION=AIC). Because no EQUATE= option is given, the variance of each normal component is estimated separately (unequal variances). ODS Graphics is enabled, and the iteration history, optimization information, and component information tables are suppressed with ODS EXCLUDE.
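title2 "Three to Seven Components, Unequal Variances";
ods graphics on;
/* Fit mixtures with 3 to 7 components; keep the one with the smallest AIC */
PROC HPFMM DATA=galaxies criterion=AIC;
 model v = / kmin=3 kmax=7;                       /* intercept only: one mean per component */
 ods exclude IterHistory OptInfo ComponentInfo;   /* suppress these output tables */
RUN;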
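As a reading aid, the selection performed by criterion=AIC can be written out. This is a sketch assuming the standard definition of AIC and an intercept-only normal mixture as specified by model v = /; the parameter tally d_k is our own count, not a value printed by the procedure.

$$
f(v) = \sum_{j=1}^{k} \pi_j\,\phi\!\left(v;\ \mu_j,\ \sigma_j^2\right), \qquad
\mathrm{AIC}(k) = -2\log \hat{L}_k + 2\,d_k, \qquad
d_k = \underbrace{k}_{\mu_j} + \underbrace{k}_{\sigma_j^2} + \underbrace{(k-1)}_{\pi_j} = 3k - 1
$$

$$
\hat{k} = \arg\min_{3 \le k \le 7} \mathrm{AIC}(k)
$$

With unequal variances, d_k grows from 8 parameters at k = 3 to 20 at k = 7, so each additional component must lower -2 log L by more than 6 to reduce the AIC.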
3 Code Block
PROC HPFMM
Explanation:
The second HPFMM analysis is similar to the first, but constrains the component variances to be equal (EQUATE=SCALE option). The relative gradient convergence criterion is disabled (GCONV=0).
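title2 "Three to Seven Components, Equal Variances";
/* Same search, with the relative gradient convergence criterion turned off */
PROC HPFMM DATA=galaxies criterion=AIC gconv=0;
 model v = / kmin=3 kmax=7 equate=scale;   /* one common variance for all components */
RUN;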
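Under EQUATE=SCALE the mixture changes only in its variance term: all components share a single σ². A sketch of the resulting density and parameter count, again by our own tally rather than from procedure output:

$$
f(v) = \sum_{j=1}^{k} \pi_j\,\phi\!\left(v;\ \mu_j,\ \sigma^2\right), \qquad
d_k = \underbrace{k}_{\mu_j} + \underbrace{1}_{\sigma^2} + \underbrace{(k-1)}_{\pi_j} = 2k
$$

Each additional component now costs 2 parameters instead of 3, i.e. an AIC penalty of 4 rather than 6, so the equal-variance search can settle on a different number of components than the unequal-variance search.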
4 Code Block
PROC HPFMM
Explanation:
The third and final analysis fits a specific 5-component model (K=5) with equal variances (EQUATE=SCALE). In the RESTRICT statement, the intercept receives coefficient 0 and the scale parameter coefficient 1, so the constraint reads scale = 0.9025 and fixes the common variance at that value. Finally, ODS Graphics is turned off.
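title2 "Five Components, Equal Variances = 0.9025";
/* Fit exactly 5 components whose common variance is fixed at 0.9025 */
PROC HPFMM DATA=galaxies;
 model v = / K=5 equate=scale;
 restrict int 0 (scale 1) = 0.9025;   /* 0*intercept + 1*scale = 0.9025 */
RUN;
ods graphics off;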
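To read the fixed value, assume (as the explanation above states) that the scale parameter of the normal components is the variance, so the restriction fixes σ² rather than σ. The arithmetic then works out as:

$$
\sigma^2 = 0.9025 \;\Rightarrow\; \sigma = \sqrt{0.9025} = 0.95
$$

Because v = velocity / 1000, a standard deviation of 0.95 on the v scale corresponds to 950 in the original velocity units. With the variance fixed, only the 5 component means and 4 free mixing probabilities remain to be estimated (5 + 4 = 9 parameters by a simple count).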
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : SAS SAMPLE LIBRARY