Power Analysis and Monte Carlo Simulation for Two-Means Test

Difficulty Level

Beginner

Published on : 20/05/2023

The script begins by using PROC POWER to calculate the theoretical power for various parameter combinations of a two-means test. It then generates graphs to visualize these relationships. A manual implementation of power calculation is also included for comparison. The second part of the script implements a Monte Carlo simulation to empirically estimate power: it generates thousands of datasets under different conditions, performs t-tests on each, then calculates the proportion of significant tests, thus providing a simulated power estimate.

Data Analysis

Type : CREATION_INTERNE

All data used in this script is generated internally, either by PROC POWER (power calculations) or by DATA STEPs (simulated data generation for t-test and manual power calculation).

1 Code Block

PROC POWER

Explanation :
This block uses the PROC POWER procedure to calculate statistical power for an independent two-means comparison test (test=diff). It explores different combinations of mean difference (meandiff), standard deviation (stddev), significance level (alpha), and total sample size (ntotal). The value `power = .` indicates that power is the value to be calculated.

Copied!

1	PROC POWER;
2	twosamplemeans test=diff
3	meandiff = 5 6
4	stddev = 12 18
5	alpha = 0.05 0.1
6	ntotal = 100 200
7	power = .;
8	RUN;

2 Code Block

PROC POWER

Explanation :
This block activates the Output Delivery System (ODS) to generate graphs. PROC POWER is executed with the `plotonly` option to produce only power plots, without the textual tables. The `plot` subcommand customizes the appearance of the graphs, allowing color, line type, and symbol to vary based on different variables. `ods output output=powdata;` saves the data used for the graphs into a dataset named `powdata`.

Copied!

1	ods graphics on;
2
3	PROC POWER plotonly;
4	twosamplemeans test=diff
5	meandiff = 5 6
6	stddev = 12 18
7	alpha = 0.05 0.1
8	ntotal = 100 200
9	power = .;
10	plot;
11	RUN;
12
13	ods graphics off;
14
15	plot
16	min=60
17	yopts=(ref=0.9 crossref=yes)
18	vary(color BY stddev, linestyle BY meandiff, symbol BY alpha);
19
20	ods OUTPUT OUTPUT=powdata;

3 Code Block

DATA STEP Data

Explanation :
This DATA STEP creates a dataset named `tpow` by calculating power for each combination of the specified parameters. `ncp` (non-centrality parameter) and `critval` (critical value of the F distribution) are calculated. Power is then determined using the `sdf` (survival density function) of the non-central F distribution.

Copied!

1	DATA tpow;
2	DO meandiff = 5, 6;
3	DO stddev = 12, 18;
4	DO alpha = 0.05, 0.1;
5	DO ntotal = 100, 200;
6	ncp = ntotal * 0.5 * 0.5 * meandiff2 / stddev2;
7	critval = finv(1-alpha, 1, ntotal-2, 0);
8	power = sdf('f', critval, 1, ntotal-2, ncp);
9	OUTPUT;
10	END;
11	END;
12	END;
13	END;
14	RUN;

4 Code Block

PROC PRINT

Explanation :
This block uses PROC PRINT to display the content of the `tpow` dataset, allowing visualization of the manually calculated power values.

Copied!

1	PROC PRINT DATA=tpow;
2	RUN;

5 Code Block

DATA STEP Data

Explanation :
This block initializes several macro variables (`meandiff`, `stddev`, `alpha`, `ntotal`, `nsim`) that will be used in the simulation. The DATA STEP `simdata` then generates a large number (`&nsim`) of simulated datasets. For each simulation (`isim`), it creates observations for two groups (`group` 1 and 2) where the variable `y` is drawn from a normal distribution. Group 1 has a mean of 0, and Group 2 has a mean equal to `&meandiff`, with a common standard deviation `&stddev`. `call streaminit(123)` ensures reproducibility of random number generation.

Copied!

1	%let meandiff = 5;
2	%let stddev = 12;
3	%let alpha = 0.05;
4	%let ntotal = 100;
5	%let nsim = 10000;
6
7	DATA simdata;
8	call streaminit(123);
9	DO isim = 1 to ≁
10	DO i = 1 to floor(&ntotal/2);
11	group = 1;
12	y = rand('normal', 0 , &stddev);
13	OUTPUT;
14	group = 2;
15	y = rand('normal', &meandiff, &stddev);
16	OUTPUT;
17	END;
18	END;
19	RUN;

6 Code Block

PROC TTEST

Explanation :
This block temporarily disables all standard ODS output to avoid excessive display. It then runs PROC TTEST on the simulated data (`simdata`). The `by isim` statement performs a t-test for each individual simulation. `class group` defines the two groups to be compared, and `var y` specifies the variable to be tested. `ods output ttests=tests` saves the t-test results to a new dataset named `tests`. Finally, ODS output is re-enabled.

Copied!

1	ods exclude all;
2	PROC TTEST DATA=simdata;
3	ods OUTPUT ttests=tests;
4	BY isim;
5	class group;
6	var y;
7	RUN;
8	ods exclude none;

7 Code Block

DATA STEP Data

Explanation :
This DATA STEP processes the `tests` dataset (from PROC TTEST). It filters observations to retain only results corresponding to the "Pooled" method (combined variance). A new variable `issig` is created, taking the value 1 if the p-value (`probt`) of the test is less than the alpha threshold (`&alpha`), indicating a significant test, and 0 otherwise.

Copied!

1	DATA tests;
2	SET tests;
3	where method="Pooled";
4	issig = probt < α
5	RUN;

8 Code Block

PROC FREQ

Explanation :
This block uses PROC FREQ on the `tests` dataset to analyze the frequency of the `issig` variable. The `ods select binomial` option allows displaying only the binomial test results. The `tables issig / binomial(level='1')` statement calculates the proportion of significant tests (`issig = 1`) and provides a binomial confidence interval for this proportion, which represents the simulated estimate of statistical power.

Copied!

1	PROC FREQ DATA=tests;
2	ods select binomial;
3	tables issig / binomial(level='1');
4	RUN;

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Difficulty Level

Published on : 20/05/2023

Data Analysis

1 Code Block

2 Code Block

3 Code Block

4 Code Block

5 Code Block

6 Code Block

7 Code Block

8 Code Block

Related Documentation