neuralNet annTrain

Edge Case: Handling Missing Values and Imbalanced Classes in Clinical Data

Test Scenario & Use Case

Business Context

A research organization is analyzing clinical trial data to predict patient response to a new treatment. The dataset is small, contains numerous missing values from incomplete lab results, and the target class (positive response) is rare. The model must handle these data quality issues gracefully.
About the Set: neuralNet

Training of classical artificial neural networks.

Data Preparation

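Creates a messy clinical dataset of 500 patients with a three-level nominal target ('Positive', 'Negative', 'No_Response'). Missing values are intentionally introduced in both the inputs (about 15% of Biomarker1 and 10% of Biomarker2) and the target (about 5%). The 'Positive' class is made rare to simulate class imbalance: it requires Biomarker1 > 13 and Biomarker2 < 45, which holds for roughly 2% of rows before the 'No_Response' and missing-target overrides are applied.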

DATA clinical_messy;
   call streaminit(222);   /* fixed seed so the data are reproducible */
   LENGTH PatientResponse $ 12;
   DO i = 1 to 500;
      Age = 30 + rand('Uniform') * 40;   /* ages between 30 and 70 */
      Biomarker1 = 10 + rand('Normal', 0, 2);
      Biomarker2 = 50 + rand('Normal', 0, 10);
      /* Default class, plus a rare 'Positive' class driven by both biomarkers */
      PatientResponse = 'Negative';
      IF Biomarker1 > 13 and Biomarker2 < 45 THEN PatientResponse = 'Positive';
      /* Blank out about 15% of Biomarker1 and 10% of Biomarker2 */
      IF rand('Uniform') < 0.15 THEN call missing(Biomarker1);
      IF rand('Uniform') < 0.10 THEN call missing(Biomarker2);
      /* Relabel about 5% of rows, then blank about 5% of targets */
      IF rand('Uniform') < 0.05 THEN PatientResponse = 'No_Response';
      IF rand('Uniform') < 0.05 THEN call missing(PatientResponse);
      OUTPUT;
   END;
RUN;
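
Before loading the data, it can help to confirm that the simulated defects are actually present. The following is a minimal sketch using standard Base SAS procedures against the WORK copy of clinical_messy; it is a suggested check, not part of the original scenario.

```sas
/* Class distribution of the target; the MISSING option counts
   blank targets as their own level instead of excluding them */
proc freq data=clinical_messy;
   tables PatientResponse / missing;
run;

/* Non-missing (N) and missing (NMISS) counts for each numeric input */
proc means data=clinical_messy n nmiss mean;
   var Age Biomarker1 Biomarker2;
run;
```

The frequency table should show a small 'Positive' count relative to 'Negative', and NMISS should be nonzero for Biomarker1 and Biomarker2 but zero for Age.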

Implementation Steps

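1
Load the messy clinical data from the WORK library into a CAS table. This step assumes an active CAS session.
PROC CASUTIL;
   /* casout= takes the target table name; replace overwrites an existing table */
   load data=clinical_messy casout='clinical_messy' replace;
RUN;
QUIT;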
2
Train a GLIM model (the annTrain architecture with no hidden layers), imputing missing inputs with the mean ('MEAN') and applying no imputation to missing targets ('NONE'), so those observations are dropped. Set 'inversePriors=TRUE' to counteract the class imbalance by adjusting observation weights.
PROC CAS;
   ACTION neuralNet.annTrain /
      TABLE={name='clinical_messy'},
      inputs={'Age', 'Biomarker1', 'Biomarker2'},
      target='PatientResponse',
      nominals={'PatientResponse'},
      arch='GLIM',                 /* no hidden layers */
      missing='MEAN',              /* impute missing inputs with the mean */
      targetMissing='NONE',        /* do not impute missing targets */
      inversePriors=TRUE,          /* weight classes by inverse prior */
      errorFunc='ENTROPY',
      targetAct='SOFTMAX',
      nloOpts={algorithm='LBFGS', maxIters=100},
      saveState={name='clinical_model', replace=true};
RUN;
QUIT;
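
To make the two data-quality options more concrete, the sketch below reproduces their general idea outside the action, on the WORK copy of the data. It is an illustration under assumptions: PROC STDIZE requires SAS/STAT, the dataset names clinical_imputed, class_freq, and class_weights are hypothetical, and the exact weighting and normalization that annTrain applies internally for 'inversePriors' may differ from this simple 1/prior calculation.

```sas
/* Mean imputation: REPONLY replaces missing values only,
   leaving observed values unchanged */
proc stdize data=clinical_messy out=clinical_imputed
            reponly method=mean;
   var Biomarker1 Biomarker2;
run;

/* Class frequencies, excluding rows with a missing target */
proc freq data=clinical_messy(where=(not missing(PatientResponse))) noprint;
   tables PatientResponse / out=class_freq;
run;

/* Illustrative inverse-prior weights: rarer classes get larger weights */
data class_weights;
   set class_freq;
   prior = percent / 100;          /* PERCENT is on a 0-100 scale */
   inverse_prior = 1 / prior;
run;

proc print data=class_weights noobs;
   var PatientResponse count prior inverse_prior;
run;
```

The 'Positive' row should carry by far the largest inverse_prior value, which is the intuition behind using inverse priors to keep the rare class from being ignored during training.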

Expected Result

The action should complete without errors, showing that it handles missing values as specified. The output should show that observations with missing targets were ignored, and the 'Model Information' table should confirm that inverse prior weights were used. A CAS table named 'clinical_model' is created; it stores the trained model, adjusted for the imbalanced data.
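
One way to check the last point is to ask CAS for the metadata of the saved table. This is a minimal sketch that assumes the table was written to the active caslib of the current session; if the table is not found, the action reports an error instead of a result table.

```sas
proc cas;
   /* Show metadata (such as row and column counts) for the saved model table */
   table.tableInfo / name='clinical_model';
run;
quit;
```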