Example of PROC FMM usage for binomial distribution mixtures

The script demonstrates the FMM procedure on binomial data. It first creates a 'yeast' dataset with a DATA step, then fits a 2-component binomial mixture model. A second run writes an output dataset with component-wise predictions and posterior probabilities, which a DATA step then uses to calculate the predicted counts per component. Finally, the script performs a Bayesian analysis of the same model, with ODS Graphics enabled for visualization and the PERFORMANCE statement set to use 2 CPUs.

Data Analysis

Type : INTERNAL_CREATION

The 'yeast' dataset is created directly within the script via a DATA step and 'datalines' statement. It contains cell counts and their observed frequency.

1 Code Block

DATA STEP Data

Explanation :
This block creates the 'yeast' dataset. It contains the variables 'count' (number of successes), 'f' (frequency of this count) and 'n' (number of trials, fixed at 5), which will be used in the binomial distribution analysis.

DATA yeast;
   INPUT count f;
   n = 5;
   DATALINES;
0 213
1 128
2 37
3 18
4 3
5 1
;
RUN;
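
Before fitting a model, it can help to confirm that the frequencies were read as intended. The following is a minimal sketch, not part of the original script: it uses PROC MEANS with a FREQ statement so that each row is counted 'f' times.

```sas
/* Sanity check (illustrative addition): summarize 'count',      */
/* treating each row as 'f' identical observations.              */
proc means data=yeast n sum mean;
   var count;
   freq f;
run;
```

With the data above, the frequencies add up to 400 observations and the counts to 273, so the reported mean should be about 0.68.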

2 Code Block

PROC FMM

Explanation :
First execution of the FMM procedure. It fits a finite mixture model with k=2 binomial components to the data. 'count' is the response variable, 'n' is the number of trials, and 'f' is the frequency variable.

PROC FMM DATA=yeast;
   model count/n = / k=2;
   freq f;
RUN;
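
For reference, the model fitted here can be written out explicitly. The notation below is an assumption introduced for illustration (the symbols \pi_j and \mu_j do not appear in the script): \pi_j are the mixing probabilities, \mu_j the success probabilities of the two binomial components, and n = 5.

```latex
% Two-component binomial mixture for a count y out of n = 5 trials
f(y) = \sum_{j=1}^{2} \pi_j \binom{n}{y} \mu_j^{\,y} (1-\mu_j)^{\,n-y},
\qquad \pi_1 + \pi_2 = 1

% Log-likelihood, with each distinct count y_i weighted by its frequency f_i
\ell = \sum_{i} f_i \, \log f(y_i)
```

The empty right-hand side of the MODEL statement means that each component has an intercept only, so each \mu_j is a single constant rather than a function of covariates.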

3 Code Block

PROC FMM Data

Explanation :
Second execution of PROC FMM. In addition to the analysis, this step generates an output dataset named 'fmmout'. This dataset contains predictions for each component and the posterior probabilities of belonging to each component.

PROC FMM DATA=yeast;
   model count/n = / k=2;
   freq f;
   OUTPUT out=fmmout pred(components) posterior;
RUN;
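
The posterior probabilities requested by the 'posterior' keyword follow from Bayes' rule applied to the fitted mixture. As a sketch, using notation assumed here for illustration (\pi_j for the mixing probabilities, p_j for the binomial probability function of component j, \tau_{ij} for the posterior probability):

```latex
% Posterior probability that observation i belongs to component j
\tau_{ij} = \frac{\pi_j \, p_j(y_i)}{\sum_{k=1}^{2} \pi_k \, p_k(y_i)},
\qquad
p_j(y) = \binom{n}{y} \mu_j^{\,y} (1-\mu_j)^{\,n-y}
```

For each observation the two values sum to one, which is why they can be used to split a frequency between the components.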

4 Code Block

DATA STEP Data

Explanation :
This DATA step post-processes the output dataset 'fmmout'. For each observed count, it multiplies the posterior probability of each component ('post_1', 'post_2') by the frequency 'f', giving the number of observations expected to belong to each component ('PredCount_1', 'PredCount_2').

DATA fmmout;
   SET fmmout;
   PredCount_1 = post_1 * f;
   PredCount_2 = post_2 * f;
RUN;
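
The per-row predicted counts can also be totaled to see how many observations the model attributes to each component overall. This is a minimal sketch added for illustration, not part of the original script:

```sas
/* Illustrative addition: total predicted count per component.   */
/* No FREQ statement here, because 'f' is already multiplied     */
/* into PredCount_1 and PredCount_2.                             */
proc means data=fmmout sum;
   var PredCount_1 PredCount_2;
run;
```

Because the two posterior probabilities sum to one for every row, the two totals should add up to the total frequency (400 for this data).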

5 Code Block

PROC PRINT

Explanation :
Displays the content of the enriched 'fmmout' dataset, allowing inspection of posterior probabilities and predicted counts for each observation and component.

PROC PRINT DATA=fmmout;
RUN;

6 Code Block

PROC FMM

Explanation :
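This block performs a Bayesian analysis of the same 2-component mixture model by adding the 'BAYES' statement. The 'SEED=12345' option fixes the random number seed so that the sampling results are reproducible. ODS Graphics is enabled so that the procedure can produce plots of the posterior samples (such as posterior densities). The 'PERFORMANCE' statement with 'CPUCOUNT=2' tells the procedure to assume that two processors are available for multithreaded computation.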

ods graphics on;
PROC FMM DATA=yeast seed=12345;
   model count/n = / k=2;
   freq f;
   performance cpucount=2;
   bayes;
RUN;
ods graphics off;
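
The script uses the 'BAYES' statement with its default settings. As a hedged sketch of how the sampling could be tuned, the variant below sets the burn-in and sample sizes and saves the posterior draws. The values and the dataset name 'yeastpost' are hypothetical choices for illustration, and the option names should be checked against the PROC FMM documentation for the SAS release in use.

```sas
ods graphics on;
proc fmm data=yeast seed=12345;
   model count/n = / k=2;
   freq f;
   /* Illustrative values, not from the original script:         */
   /*   nbi=     number of burn-in samples to discard            */
   /*   nmc=     number of samples kept after burn-in            */
   /*   outpost= dataset receiving the posterior draws           */
   bayes nbi=2000 nmc=20000 outpost=yeastpost;
run;
ods graphics off;
```

Saving the draws makes it possible to compute custom posterior summaries in a later step rather than relying only on the procedure's printed tables.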

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
