
Distributional Analysis and Empirical Cumulative Distribution Function (ECDF)

The script is divided into two main parts, each using a different macro to generate ODS graphics. The first macro, `%ecdf`, uses the `UNIVARIATE` procedure to produce descriptive statistics, a histogram, and a plot of the empirical cumulative distribution function (ECDF) with a fitted normal CDF overlaid. The second macro, `%ecdf2`, computes the ECDF manually in a `DATA` step and then uses `PROC SGPLOT` to draw the histogram and the ECDF plot. The image files are written to the directory given by the GPATH= option of the `ODS LISTING` statement.
Data Analysis

Type: SASHELP


The data used for the analysis comes from the SASHELP library (`sashelp.cars`), a library supplied with SAS that contains sample datasets.
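Both macros expect a numeric analysis variable. Missing values matter for the manual approach because SAS sorts numeric missing values before all other values. Before plotting, a quick PROC MEANS run on `Horsepower` can count the non-missing and missing values and show the range. This is only a sketch; the output dataset `hp_check` and its variable names are hypothetical.

```sas
/* Sketch: count non-missing and missing values of the analysis variable
   and show its range; hp_check, n_nonmiss and n_miss are hypothetical names. */
proc means data=sashelp.cars n nmiss min max;
  var Horsepower;
  output out=hp_check n=n_nonmiss nmiss=n_miss;
run;
```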

1 Code Block
ODS Configuration
Explanation:
This block routes ODS graphics output to the LISTING destination. The GPATH= option sets the directory in which the image files are written, and IMAGE_DPI=200 sets their resolution to 200 dots per inch. The path shown is the author's own directory; replace it with an existing, writable folder on your system.
```sas
ods listing gpath="/home/nicolasdupont0/resources_github/Graph/Distribution/img" image_dpi=200;
```
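The hard-coded GPATH= directory only exists on the author's machine. One portable alternative is to write the images to the session's WORK directory, which normally exists and is writable. This is a sketch under that assumption, and the macro variable `imgdir` is a hypothetical name.

```sas
/* Sketch: send ODS images to the WORK library's physical directory.
   PATHNAME(WORK) returns that path; imgdir is a hypothetical name. */
%let imgdir = %sysfunc(pathname(work));
%put NOTE: ODS graphics will be written to &imgdir;
ods listing gpath="&imgdir" image_dpi=200;
```

The images then appear in the temporary WORK folder, which SAS deletes at the end of the session. Copy the files elsewhere if they need to be kept.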
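2 Code Block
ODS Graphics Configuration
Explanation:
This block sets global options for ODS Graphics. Each option does the following:
- RESET=ALL restores every option to its default before the new values are applied.
- ATTRPRIORITY=COLOR makes grouped plots cycle through colors before changing marker symbols or line patterns.
- BORDER=NO removes the frame around each graph.
- WIDTH= and HEIGHT= set the image size to 600 × 400 pixels.
- IMAGENAME="ecdf1" sets the base name of the image files.
- IMAGEFMT= and OUTPUTFMT= select the PNG format.
- ANTIALIASMAX=10000 raises the maximum number of graphical elements that are anti-aliased.
```sas
ods graphics /
  reset = all attrpriority=color border = no width = 600px height = 400px
  imagename = "ecdf1" imagefmt = png outputfmt = png antialiasmax = 10000;
```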
3 Code Block
MACRO Definition
Explanation:
This macro, `%ecdf`, produces descriptive statistics and distribution plots for one variable of a dataset. It runs `PROC UNIVARIATE` three times:
- The first run produces the default tables of descriptive statistics.
- The second run uses the NOPRINT option, which suppresses the tables but not the graphs. It draws a histogram with an inset showing the number of observations.
- The third run draws a plot of the empirical CDF with a fitted normal CDF overlaid (the NORMAL option) and the same inset.
```sas
%macro ecdf(DATA,var);

  /* Default tables: moments, quantiles, extreme observations */
  title "Descriptive statistics on &var.";
  PROC UNIVARIATE DATA=&DATA;
    var &var;
  RUN;

  /* Histogram; ODSTITLE=TITLE uses the TITLE statement text as the graph title */
  title "Distribution of &var.";
  PROC UNIVARIATE DATA=&DATA noprint;
    histogram &var / odstitle = title;
    inset n = 'Number of observations' / position=ne;
  RUN;

  /* Empirical CDF with a fitted normal CDF overlaid */
  title "Cumulative Distribution of &var.";
  PROC UNIVARIATE DATA=&DATA noprint;
    cdfplot &var / normal;
    /*inset normal(mu sigma);*/
    inset n = 'Number of observations' / position=nw;
  RUN;

  title;

%mend ecdf;
```
4 Code Block
MACRO Call
Explanation:
This call executes the `%ecdf` macro using the `sashelp.cars` dataset and the `Horsepower` variable to generate the corresponding analyses and plots.
```sas
%ecdf(sashelp.cars,Horsepower);
```
5 Code Block
ODS Graphics Configuration
Explanation:
This block repeats the previous ODS GRAPHICS statement for the second set of graphs. The only change is IMAGENAME="ecdf2", which gives the images produced by `%ecdf2` a different base name.
```sas
ods graphics /
  reset = all attrpriority=color border = no width = 600px height = 400px
  imagename = "ecdf2" imagefmt = png outputfmt = png antialiasmax = 10000;
```
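The two ODS GRAPHICS statements in this script differ only in IMAGENAME=. One way to avoid repeating them is to wrap the shared settings in a small macro that takes the image base name as its only parameter. This is a sketch, not part of the original script, and `%set_graphics` is a hypothetical name.

```sas
/* Hypothetical helper: applies the shared ODS GRAPHICS settings;
   only the image base name changes from one call to the next. */
%macro set_graphics(name);
  ods graphics /
    reset = all attrpriority=color border = no width = 600px height = 400px
    imagename = "&name" imagefmt = png outputfmt = png antialiasmax = 10000;
%mend set_graphics;

%set_graphics(ecdf1);
```

With the helper defined, `%set_graphics(ecdf1);` goes before `%ecdf(sashelp.cars,Horsepower);` and `%set_graphics(ecdf2);` before `%ecdf2(sashelp.cars,Horsepower);`.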
6 Code Block
MACRO Definition, DATA Step
Explanation:
This macro, `%ecdf2`, computes the ECDF by hand, in four steps:
1. It copies the variable of interest into a temporary dataset (`tmp`), excluding missing values.
2. It sorts `tmp` by that variable.
3. A DATA step numbers each sorted observation (`nv`) and computes its cumulative proportion `p = nv/obs` and the corresponding cumulative percentage (`ecdf`). Here `obs` is the total number of observations, supplied by the NOBS= option. A CALL routine stores that total in the macro variable `nbvalue`.
4. PROC SGPLOT draws a histogram and a line plot of the cumulative percentage, with a second title line reporting the number of observations.

The commented-out PROC GPLOT block is an older SAS/GRAPH version of the ECDF plot.
```sas
%macro ecdf2(DATA,var);

  /* Keep only the analysis variable; drop missing values,
     which would otherwise sort first and distort the ECDF */
  DATA tmp (keep=&var);
    SET &DATA.;
    where not missing(&var);
  RUN;

  PROC SORT DATA=tmp;
    BY &var.;
  RUN;

  /* After sorting, row i gets cumulative proportion i/n */
  DATA tmp;
    SET tmp nobs=obs;
    nv = _N_;
    p = nv / obs;
    ecdf = 100 * p;   /* cumulative percent, not truncated to an integer */
    if _N_ = 1 then call symputx("nbvalue", obs);
  RUN;

  /*
  title "Cumulative Distribution of &var.";
  symbol1 i=j v=none c=blue;
  proc gplot data=tmp;
    plot ecdf * &var;
  run;
  quit;
  title;
  */

  title "Distribution of &var.";
  title2 "Number of observations = &nbvalue";
  PROC SGPLOT DATA=tmp;
    histogram &var;
    XAXIS label="&var" grid;
    YAXIS label="Percentage";
  RUN;

  title "Cumulative Distribution of &var.";
  title2 "Number of observations = &nbvalue";
  PROC SGPLOT DATA=tmp;
    series x=&var y=ecdf;
    XAXIS label="&var" grid;
    YAXIS label="Cumulative Percent" grid;
  RUN;
  title;
  title2;

%mend ecdf2;
```
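The quantity that `%ecdf2` approximates is the empirical cumulative distribution function. For observations $x_1, \dots, x_n$ sorted as $x_{(1)} \le \dots \le x_{(n)}$, it is defined as follows. The row ratio `nv/obs` equals $\hat F_n(x_{(i)})$ only when $x_{(i)}$ is not tied with the next value. With ties, only the last row of the tie reaches $\hat F_n$, so the line plot shows a vertical segment at that value. Between distinct values, a SERIES plot draws straight lines, whereas $\hat F_n$ is a right-continuous step function.

```latex
\hat F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i \le x\},
\qquad
\hat F_n\big(x_{(i)}\big) = \frac{\max\{\, j : x_{(j)} = x_{(i)} \,\}}{n}
```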
7 Code Block
MACRO Call
Explanation:
This call executes the `%ecdf2` macro using the `sashelp.cars` dataset and the `Horsepower` variable to produce the alternative analysis and visualization of the distribution.
```sas
%ecdf2(sashelp.cars,Horsepower);
```
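A variant that handles ties directly is to let PROC FREQ compute one row per distinct value. Its OUTCUM option adds the cumulative frequency (`CUM_FREQ`) and cumulative percent (`CUM_PCT`) to the OUT= dataset. A STEP plot then draws the ECDF as a step function. This is a sketch on the same `sashelp.cars`/`Horsepower` example, not the author's method, and the dataset name `ecdf_freq` is hypothetical.

```sas
/* Sketch: one row per distinct Horsepower value; OUTCUM adds
   CUM_FREQ and CUM_PCT (= 100 * ECDF at that value). */
proc freq data=sashelp.cars noprint;
  tables Horsepower / out=ecdf_freq outcum;
run;

/* STEP (default JUSTIFY=LEFT) holds each value until the next x,
   matching the right-continuous ECDF. */
title "Cumulative Distribution of Horsepower";
proc sgplot data=ecdf_freq;
  step x=Horsepower y=cum_pct;
  xaxis label="Horsepower" grid;
  yaxis label="Cumulative Percent" grid;
run;
title;
```

Unlike `%ecdf2`, which keeps every observation, this approach collapses tied values, so each vertical jump equals the share of observations at that value.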
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright info: Created: 21/07/2017 (fr), Last update: 21/07/2017 (fr), Author(s): Nicolas Dupont