Performance/Volume Case: Analyzing Large-Scale Clinical Trial Data

Business Context

A pharmaceutical company is analyzing data from a large-scale clinical trial (500,000 patients) for a new drug. They need to understand how different dosage levels (50mg, 100mg, 150mg) affect patient recovery probability, while controlling for patient age and disease severity. The performance of the calculation on a large dataset is critical.

About the Action Set: bart

The bart action set fits and scores Bayesian additive regression trees (BART) models.

Data Preparation

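The DATA step generates 500,000 simulated patient records, with recovery drawn from a logistic function of age, disease severity, and dosage. The bartProbit action then trains a BART model that predicts recovery from those three factors and stores it as drug_efficacy_model.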

/* Simulate 500,000 patient records; mycas is assumed to be a CAS libref */
DATA mycas.clinical_trial_large(drop=p);
   call streaminit(789);                       /* reproducible random stream */
   DO i = 1 to 500000;
      age = 30 + floor(rand('UNIFORM') * 50);  /* integer ages 30 to 79 */
      severity = rand('INTEGER', 1, 5);
      rand_dosage = rand('UNIFORM');
      IF rand_dosage < 0.33 THEN dosage_mg = 50;
      ELSE IF rand_dosage < 0.66 THEN dosage_mg = 100;
      ELSE dosage_mg = 150;
      /* true recovery probability: logistic in age, severity, and dosage */
      p = 1 / (1 + exp(-( -3 + (0.02 * age) + (0.2 * severity) + (0.01 * dosage_mg) )));
      recovered = rand('BERNOULLI', p);
      OUTPUT;
   END;
RUN;

/* Train the BART probit model and store it for later scoring */
PROC CAS;
   bart.bartProbit /
      table={name='clinical_trial_large'},
      model={depVars={{name='recovered', levelType='BINARY'}},
             effects={{vars={'age', 'severity', 'dosage_mg'}}}},
      store={name='drug_efficacy_model', replace=true};
QUIT;
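
Before fitting, it can help to confirm that the simulated recovery rate actually rises with dosage. The following is a minimal sketch, assuming the mycas libref used by the DATA step is still assigned. It reports the observed share of recovered patients in each dosage group:

```sas
/* Observed recovery rate by dosage group (the mean of a 0/1 flag is a proportion) */
PROC MEANS data=mycas.clinical_trial_large n mean maxdec=4;
   class dosage_mg;
   var recovered;
RUN;
```

Because the simulation assigns dosage independently of age and severity, these raw rates should sit close to the model-based margins computed later.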

Execution Steps

Run the bartScoreMargin action on the large table to calculate a margin for each dosage level (50, 100, and 150 mg). Set alpha=0.01, a non-default value, to obtain 99% credible intervals, and save the output tables.

PROC CAS;
   bart.bartScoreMargin /
      table='clinical_trial_large',
      model='drug_efficacy_model',
      seed=999,
      alpha=0.01,                /* 99% credible intervals */
      margins={
         {name='margin_50mg',  at={{var='dosage_mg', value=50}}},
         {name='margin_100mg', at={{var='dosage_mg', value=100}}},
         {name='margin_150mg', at={{var='dosage_mg', value=150}}}
      },
      casOut={name='large_trial_margins', replace=true},
      outputTables={names={Margins='Margins_Summary', MarginInfo='Margin_Info_Summary'}};
QUIT;
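
To inspect what the action wrote, the tables can be listed and previewed. This sketch rests on two assumptions that the scenario does not state: that the casOut table and the saved output tables land in the active caslib, and that the mycas libref points at that caslib.

```sas
/* List the tables in the active caslib */
PROC CAS;
   table.tableInfo;
QUIT;

/* Preview the first rows of the casOut table, then the saved margins summary */
PROC PRINT data=mycas.large_trial_margins(obs=10);
RUN;

PROC PRINT data=mycas.Margins_Summary;
RUN;
```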

Expected Result

The action should complete successfully on the 500,000-row table within an acceptable time frame. It produces a 'large_trial_margins' CAS table with the posterior samples for each dosage margin. The saved summary tables ('Margins_Summary' and 'Margin_Info_Summary') let researchers compare the mean recovery probability and 99% credible interval at each dosage level, which supports dosage recommendations.
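
Because the data are simulated, the estimated margins can be checked against the truth. The following is a minimal sketch that uses only the generating formula from the data preparation step. It assumes that each margin is the average predicted recovery probability with dosage fixed at the given value for every patient. In the simulation, age is uniform on the integers 30 to 79 and severity on 1 to 5, so averaging the true probability over that 50 x 5 grid gives the population value each estimate should approximate. The variable names total and true_margin are illustrative.

```sas
/* Exact population margin at each dosage, written to the SAS log */
DATA _null_;
   DO dosage_mg = 50, 100, 150;
      total = 0;
      DO age = 30 TO 79;
         DO severity = 1 TO 5;
            /* same logistic formula as the simulation step */
            total = total + 1 / (1 + exp(-(-3 + 0.02*age + 0.2*severity + 0.01*dosage_mg)));
         END;
      END;
      true_margin = total / (50 * 5);   /* average over the 250 equally likely cells */
      PUT dosage_mg= true_margin= 6.4;
   END;
RUN;
```

If a 99% credible interval in the margins summary excludes the corresponding true_margin by a wide gap, that points to a modeling or setup problem rather than sampling noise.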

See the technical documentation for bartScoreMargin