
Documentation Example for PROC MODECLUS (modecex5)

The script builds a dataset named 'test' from twelve numeric values with a DATA step and DATALINES, then runs PROC MODECLUS twice with METHOD=6 and a radius of R=2.5. The first run uses the default assignment threshold (THRESHOLD=0.5); the second raises it to 0.55. Each run writes the estimated densities and cluster assignments to the output dataset 'out', which PROC SGPLOT then displays as a scatter plot of density against 'x', grouped by cluster. Comparing the two plots shows how the threshold affects cluster formation.
Data Analysis

Type: internally created data (CREATION_INTERNE)


The data is created directly within the SAS script via a DATA step and datalines, in the form of a dataset named 'test' with a single numerical variable 'x'.

1 Code Block
DATA STEP Data
Explanation :
Creates a SAS dataset named 'test' with one numeric variable 'x'. The values are supplied inline with the DATALINES statement, giving a small one-dimensional dataset for cluster analysis. The trailing @@ on the INPUT statement holds the input line so that all twelve values are read as separate observations; without it, only the first value on the line would be read.
DATA test;
   INPUT x @@;
   DATALINES;
1 2 3 4 5 7.5 9 11.5 13 14.5 15 16
;
2 Code Block
PROC MODECLUS
Explanation :
Runs a cluster analysis on 'test' with PROC MODECLUS and METHOD=6. R=2.5 sets the radius used for both density estimation and the clustering neighborhood. TRACE traces the cluster-assignment process (it applies to METHOD=6 only), and SHORT suppresses the display of statistics for each cluster. Because THRESHOLD= is not specified, the default assignment threshold of 0.5 applies. The output dataset 'out' contains 'x', the estimated density, and the cluster assignment for each observation.
/*-- METHOD=6 with TRACE and THRESHOLD=0.5 (default) --*/
title 'METHOD=6 with TRACE and THRESHOLD=0.5 (default)';

PROC MODECLUS DATA=test method=6 r=2.5 trace short out=out;
   var x;
RUN;
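For intuition about the DENSITY values written to 'out': with a fixed radius, PROC MODECLUS uses a uniform-kernel estimate, which (as I understand the SAS/STAT documentation) is the share of observations lying within radius r of a point, divided by the volume of that neighborhood. The symbols below are introduced here for illustration, and the worked value avoids any point lying exactly at distance r, since the boundary convention is not stated in this example. Treat it as a sketch rather than a statement of the exact computation.

```latex
% n   = total number of observations (12 here)
% n_i = number of observations within radius r of x_i, counting x_i itself
% v   = volume of the neighborhood; in one dimension v = 2r
\hat{f}_i = \frac{n_i}{n\,v}, \qquad v = 2r

% Worked value for x_i = 1 with r = 2.5:
% the observations within 2.5 of x = 1 are 1, 2, 3, so n_i = 3
\hat{f}(1) = \frac{3}{12 \times 5} = 0.05
```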
3 Code Block
PROC SGPLOT
Explanation :
Generates a scatter plot from the 'out' dataset. The Y-axis represents density ('density') and the X-axis represents the 'x' variable. Points are colored and grouped according to the identified clusters ('cluster') and each point is labeled with its observation number ('_obs_'), facilitating the visualization of clustering results.
title2 'Plot of DENSITY*X=CLUSTER';

PROC SGPLOT DATA=out;
   scatter y=density x=x / group=cluster datalabel=_obs_;
RUN;
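A tabular summary can complement the plot. The sketch below is not part of the original sample; it assumes the 'out' dataset from the preceding PROC MODECLUS step still exists and contains the variables CLUSTER, DENSITY and x used in the plot.

```sas
/* Sketch: per-cluster summary of x and the estimated density.     */
/* Assumes 'out' was just created by PROC MODECLUS with OUT=out.   */
title2 'Summary of X and DENSITY by CLUSTER';

proc means data=out n min max mean;
   class cluster / missing;   /* MISSING keeps any unassigned observations */
   var x density;
run;
```

The N column gives the size of each cluster, and the MIN and MAX of x show the interval each cluster covers on the x axis.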
4 Code Block
PROC MODECLUS
Explanation :
Runs PROC MODECLUS a second time on 'test' with METHOD=6 and R=2.5, now with THRESHOLD=0.55, slightly above the default of 0.5. The change shows how the assignment threshold can alter which observations join which cluster, and possibly how many clusters form. The results are again written to 'out', replacing the output of the first run.
/*-- METHOD=6 with TRACE and THRESHOLD=0.55 --*/
title 'METHOD=6 with TRACE and THRESHOLD=0.55';

PROC MODECLUS DATA=test method=6 r=2.5 trace threshold=0.55 short out=out;
   var x;
RUN;
5 Code Block
PROC SGPLOT
Explanation :
Generates a second scatter plot to visualize the results of the second PROC MODECLUS execution, which used a threshold of 0.55. Like the previous graph, it represents density relative to 'x', grouped by the newly formed clusters, allowing for a direct comparison with the results obtained with the default threshold.
title2 'Plot of DENSITY*X=CLUSTER with TRACE and THRESHOLD=0.55';

PROC SGPLOT DATA=out;
   scatter y=density x=x / group=cluster datalabel=_obs_;
RUN;
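Because both runs write to the same 'out' dataset, the first result is gone once the second run finishes, so the comparison rests on the two plots. One way to compare the assignments observation by observation is to keep each run in its own dataset and cross-tabulate the cluster numbers. The sketch below is not part of the original sample; the dataset names out_default, out_t55 and compare, and the renamed variables, are hypothetical.

```sas
/* Sketch: keep both results and compare cluster assignments. */
proc modeclus data=test method=6 r=2.5 short out=out_default;
   var x;
run;

proc modeclus data=test method=6 r=2.5 threshold=0.55 short out=out_t55;
   var x;
run;

/* Sort by observation number so the MERGE below is safe. */
proc sort data=out_default; by _obs_; run;
proc sort data=out_t55;     by _obs_; run;

data compare;
   merge out_default(keep=_obs_ x density cluster
                     rename=(cluster=cluster_default))
         out_t55(keep=_obs_ cluster
                 rename=(cluster=cluster_t55));
   by _obs_;
run;

/* Cross-tabulation: off-diagonal cells are observations that moved. */
title 'Cluster assignment: THRESHOLD=0.5 versus THRESHOLD=0.55';
proc freq data=compare;
   tables cluster_default*cluster_t55 / missing norow nocol nopercent;
run;
```

Cluster numbers are labels only, so the same group of observations could in principle receive a different number in each run; read the table for which observations stay together rather than for matching numbers.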
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info: SAS SAMPLE LIBRARY, NAME: modecex5, TITLE: Documentation Example 5 for PROC MODECLUS, PRODUCT: STAT