
Example of PROC FMM usage for binomial distribution mixtures

The script demonstrates the FMM procedure on a mixture of binomial distributions. It first creates the 'yeast' dataset with a DATA step, then runs PROC FMM to fit a 2-component mixture model. A second run writes an output dataset with component predictions and posterior probabilities, which a DATA step then uses to calculate the predicted counts per component. Finally, the script fits the same model with a Bayesian analysis, turning on ODS graphics for visualization and requesting 2 CPUs through the PERFORMANCE statement.
Data Analysis


The 'yeast' dataset is created directly within the script via a DATA step and 'datalines' statement. It contains cell counts and their observed frequency.

1 Code Block
DATA STEP Data
Explanation :
This block creates the 'yeast' dataset. It contains the variables 'count' (number of successes), 'f' (frequency of this count) and 'n' (number of trials, fixed at 5), which will be used in the binomial distribution analysis.
DATA yeast;
   INPUT count f;
   n = 5;
   DATALINES;
0 213
1 128
2 37
3 18
4 3
5 1
;
RUN;
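Before fitting a model, it can help to confirm that the data were read as intended. The following is a minimal sketch, not part of the original script: it uses PROC MEANS with the same FREQ variable so that each row counts as 'f' observations. Since the frequencies in the DATALINES block sum to 400, the reported N should be 400.

```sas
/* Sanity check (illustrative addition, not in the original script): */
/* frequency-weighted summary of the response variable.              */
proc means data=yeast n mean var;
   var count;
   freq f;      /* each row stands for f observations */
run;
```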
2 Code Block
PROC FMM
Explanation :
First execution of the FMM procedure. It fits a finite mixture model with k=2 binomial components to the data. 'count' is the response variable, 'n' is the number of trials, and 'f' is the frequency variable.
PROC FMM DATA=yeast;
   model count/n = / k=2;
   freq f;
RUN;
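The model fitted by this step can be written out explicitly. With no effects on the right-hand side of the MODEL statement, each component has a single success probability, so the two-component binomial mixture takes the form below. The symbols \pi, \mu_1 and \mu_2 are notation chosen here for illustration and do not appear in the original script.

```latex
% Two-component binomial mixture (notation is illustrative):
%   \pi      mixing probability of component 1
%   \mu_j    success probability of component j
\[
  \Pr(Y = y) \;=\; \pi \binom{n}{y} \mu_1^{y} (1-\mu_1)^{n-y}
            \;+\; (1-\pi) \binom{n}{y} \mu_2^{y} (1-\mu_2)^{n-y},
  \qquad y = 0, 1, \dots, n, \quad n = 5 .
\]
```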
3 Code Block
PROC FMM Data
Explanation :
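Second execution of PROC FMM. In addition to fitting the same model, this step writes an output dataset named 'fmmout' through the OUTPUT statement. The 'pred(components)' keyword adds the predicted values for each component, and 'posterior' adds the posterior probabilities of belonging to each component (the 'post_1' and 'post_2' variables used in the next step).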
PROC FMM DATA=yeast;
   model count/n = / k=2;
   freq f;
   OUTPUT out=fmmout pred(components) posterior;
RUN;
4 Code Block
DATA STEP Data
Explanation :
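This DATA step processes the output dataset 'fmmout'. For each row, it multiplies the posterior probability of each component ('post_1', 'post_2') by the row's frequency ('f'). The results, 'PredCount_1' and 'PredCount_2', are the expected numbers of the 'f' observations at that count value that are attributed to component 1 and component 2.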
DATA fmmout;
   SET fmmout;
   PredCount_1 = post_1 * f;
   PredCount_2 = post_2 * f;
RUN;
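The arithmetic in this step follows from the usual definition of a posterior membership probability in a mixture model. As a sketch, with \hat{\pi}_j the estimated mixing probability and \hat{p}_j(y) the fitted binomial probability of count y under component j (notation assumed here, not taken from the script):

```latex
% Posterior probability that an observation with count y belongs to component j
\[
  \text{post}_j(y) \;=\;
  \frac{\hat{\pi}_j \, \hat{p}_j(y)}{\hat{\pi}_1 \hat{p}_1(y) + \hat{\pi}_2 \hat{p}_2(y)},
  \qquad j = 1, 2 .
\]
% Expected number of the f observations at count y attributed to component j
\[
  \text{PredCount}_j \;=\; \text{post}_j(y) \times f ,
  \qquad
  \text{PredCount}_1 + \text{PredCount}_2 \;=\; f .
\]
```

The last identity holds because the two posterior probabilities sum to 1 on every row.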
5 Code Block
PROC PRINT
Explanation :
Displays the content of the enriched 'fmmout' dataset, allowing inspection of posterior probabilities and predicted counts for each observation and component.
PROC PRINT DATA=fmmout;
RUN;
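Beyond printing each row, the predicted counts can be totaled to see how many of the observations are attributed to each component overall. The following is a hedged sketch that is not part of the original script. It assumes 'fmmout' already contains 'PredCount_1' and 'PredCount_2' from the preceding DATA step.

```sas
/* Illustrative addition: column totals of the frequency and the      */
/* per-component predicted counts. Because post_1 + post_2 = 1 on     */
/* every row, the two PredCount totals should add up to the total f.  */
proc means data=fmmout sum;
   var f PredCount_1 PredCount_2;
run;
```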
6 Code Block
PROC FMM
Explanation :
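This block performs a Bayesian analysis of the 2-component mixture model by adding the 'BAYES' statement. The 'SEED=12345' option fixes the random number seed so that the sampling results are reproducible. ODS graphics are turned on before the procedure to visualize the results (such as posterior distributions) and turned off afterward. The 'PERFORMANCE' statement with 'CPUCOUNT=2' requests the use of two CPUs for the calculations.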
ods graphics on;
PROC FMM DATA=yeast seed=12345;
   model count/n = / k=2;
   freq f;
   performance cpucount=2;
   bayes;
RUN;
ods graphics off;
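The 'BAYES' statement above runs with its default settings. As a variation that is not part of the original script, the sketch below sets the number of burn-in samples (NBI=), the number of retained samples (NMC=) and an output dataset for the posterior draws (OUTPOST=). The specific values and the dataset name 'yeastpost' are hypothetical choices for illustration. Check the option names against the PROC FMM documentation for your SAS release before relying on them.

```sas
/* Illustrative variation (assumed settings, not from the original):  */
/* longer chain and saved posterior draws.                            */
ods graphics on;
proc fmm data=yeast seed=12345;
   model count/n = / k=2;
   freq f;
   performance cpucount=2;
   bayes nbi=2000 nmc=20000 outpost=yeastpost;  /* yeastpost is a hypothetical name */
run;
ods graphics off;
```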
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.