
Analysis of 'Galaxies' data with PROC FMM

The script first creates a SAS© table, 'galaxies', from inline data (DATALINES) holding the velocities of 82 galaxies, and rescales the velocity by dividing it by 1000. It then fits three sets of normal mixture models with PROC FMM: 1) models with 3 to 7 components and unequal variances, 2) the same range of components with equal variances, and 3) a model with exactly 5 components whose common variance is fixed at 0.9025. The aim is to model the velocity distribution, which is known to be multimodal.
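For reference, the models described above can be written as a k-component normal mixture. The notation below (mixing weights \pi_j, means \mu_j, variances \sigma_j^2, parameter count p) is introduced here for illustration and does not come from the script itself.

```latex
% Density of the rescaled velocity v under a k-component normal mixture
f(v) = \sum_{j=1}^{k} \pi_j \, \phi\!\left(v;\, \mu_j,\, \sigma_j^2\right),
\qquad \pi_j > 0, \quad \sum_{j=1}^{k} \pi_j = 1

% Akaike Information Criterion (smaller is better), with p free parameters
\mathrm{AIC} = -2 \log L + 2p

% Free parameters: k means, the variances, and k - 1 mixing weights
p = 3k - 1 \ \text{(unequal variances)}, \qquad
p = 2k \ \text{(equal variances)}, \qquad
p = 2k - 1 \ \text{(common variance fixed)}
```

Under these counts, equating the variances removes k - 1 parameters, which is why the equal-variance models are the simpler ones.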
Data Analysis

Type : internal creation


All data are embedded in the script through a DATALINES statement. The 'galaxies' table contains the variable 'velocity' and its rescaled version 'v' (velocity / 1000).

1 Code Block
DATA STEP Data
Explanation :
This DATA step creates the 'galaxies' table. It reads the velocity values from the embedded data lines (DATALINES). The trailing '@@' line-hold specifier in the INPUT statement lets SAS read several observations from each data line. A new variable 'v' is computed by dividing the velocity by 1000, which rescales the values.
title "FMM Analysis of Galaxies Data";
DATA galaxies;
   /* The trailing @@ holds the line so several values are read from it */
   INPUT velocity @@;
   /* Rescale the velocity by a factor of 1000 */
   v = velocity / 1000;
   DATALINES;
9172 9350 9483 9558 9775 10227 10406 16084 16170 18419
18552 18600 18927 19052 19070 19330 19343 19349 19440 19473
19529 19541 19547 19663 19846 19856 19863 19914 19918 19973
19989 20166 20175 20179 20196 20215 20221 20415 20629 20795
20821 20846 20875 20986 21137 21492 21701 21814 21921 21960
22185 22209 22242 22249 22314 22374 22495 22746 22747 22888
22914 23206 23241 23263 23484 23538 23542 23666 23706 23711
24129 24285 24289 24366 24717 24990 25633 26960 26995 32065
32789 34279
;
RUN;
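Before fitting any mixture, it can help to confirm that the table was read correctly and to look at the shape of the distribution. The following is a minimal sketch that assumes the 'galaxies' table above has been created; the bin width of 1 is an arbitrary choice for illustration, not something taken from the original script.

```sas
/* Summary statistics for the raw and rescaled velocities */
proc means data=galaxies n min max mean std;
   var velocity v;
run;

/* Histogram with a kernel density overlay to look for multiple modes */
proc sgplot data=galaxies;
   histogram v / binwidth=1;
   density v / type=kernel;
run;
```

If the DATA step ran as intended, PROC MEANS should report 82 observations, matching the 82 galaxies described above.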
2 Code Block
PROC FMM
Explanation :
This PROC FMM step fits normal mixture models to 'v' with 3 to 7 components (kmin=3, kmax=7) and uses the Akaike Information Criterion (criterion=AIC) to choose among them. The MODEL statement has no effects on its right-hand side, so each component has only an intercept, that is, its own mean. Component variances are allowed to differ. The 'ods graphics on' statement enables graphical output.
title2 "Three to Seven Components, Unequal Variances";
ods graphics on;
PROC FMM DATA=galaxies criterion=AIC;
   model v = / kmin=3 kmax=7;
RUN;
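The number of components selected can depend on the criterion. As a hedged variation that is not part of the original script, the same search can be rerun with the Bayesian Information Criterion, which penalizes each extra parameter by log(82), about 4.4, instead of 2 and so tends to favor fewer components. The title text is illustrative.

```sas
title2 "Three to Seven Components, Unequal Variances, BIC";
/* Same search over k = 3..7, but the model is chosen by BIC instead of AIC */
proc fmm data=galaxies criterion=BIC;
   model v = / kmin=3 kmax=7;
run;
```

Comparing the component count chosen here with the AIC run gives a rough sense of how sensitive the selection is to the penalty.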
3 Code Block
PROC FMM
Explanation :
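A second PROC FMM analysis repeats the previous one with the 'equate=scale' option added, which constrains all mixture components to share the same variance and so reduces the number of parameters. The 'gconv=0' option sets the relative gradient convergence tolerance to zero, so that this criterion does not stop the optimization.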
title2 "Three to Seven Components, Equal Variances";
PROC FMM DATA=galaxies criterion=AIC gconv=0;
   model v = / kmin=3 kmax=7 equate=scale;
RUN;
4 Code Block
PROC FMM
Explanation :
This block fits a finite mixture model with exactly 5 components (K=5) and equal variances ('equate=scale'). The RESTRICT statement then fixes the common variance (the 'scale' parameter) at 0.9025: the restriction puts a coefficient of 0 on the intercept and a coefficient of 1 on the scale parameter.
title2 "Five Components, Equal Variances = 0.9025";
PROC FMM DATA=galaxies;
   model v = / K=5 equate=scale;
   restrict int 0 (scale 1) = 0.9025;
RUN;
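Once a model is fixed, a natural follow-up is to see which component each galaxy most likely belongs to. The sketch below is an addition to the original script. It assumes that the PROC FMM OUTPUT statement accepts the CLASS and MAXPOST keywords (most likely component and its posterior probability) and that SAS assigns default variable names when none are given; the table name 'galaxy_class' is hypothetical. Check the keywords against the PROC FMM documentation for your SAS release before relying on them.

```sas
title2 "Five Components, Equal Variances = 0.9025, with Classification";
proc fmm data=galaxies;
   model v = / k=5 equate=scale;
   restrict int 0 (scale 1) = 0.9025;
   /* Assumed keywords: CLASS = most likely component,
      MAXPOST = posterior probability of that component */
   output out=galaxy_class class maxpost;
run;

/* Inspect the first few rows of the hypothetical output table */
proc print data=galaxy_class(obs=10);
run;
```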
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : SAS SAMPLE LIBRARY