bart bartScore

Edge Case: Scoring Loan Default Risk with Missing Data

Test Scenario & Use Case

Business Context

A financial institution uses a BART classification model (trained with bartProbit) to assess loan default risk. The scoring process must be robust to incoming applications with missing information in certain fields, such as 'credit_history_years'. This scenario tests two things: whether bartScore still scores rows that have missing predictor values, and whether it classifies applicants according to a custom probability cutoff.
About the Set: bart

Bayesian Additive Regression Trees models.

Data Preparation

Create a binary classification training set for loan default. The scoring set is created with intentionally missing values (.) for the 'credit_history_years' variable for some applicants.

/* Training set: 3000 synthetic loans with a noisy binary default flag */
DATA mycas.loan_train;
   call streaminit(333);
   DO i = 1 to 3000;
      loan_amount = 5000 + rand('UNIFORM') * 45000;
      income = 30000 + rand('UNIFORM') * 120000;
      credit_history_years = 1 + rand('UNIFORM') * 20;
      /* Default if the loan-to-income ratio is high or the credit history is short */
      default_flag = (loan_amount / income > 0.4) or (credit_history_years < 3);
      /* Flip about 20% of the labels to add noise */
      IF rand('UNIFORM') < 0.2 THEN default_flag = 1 - default_flag;
      OUTPUT;
   END;
RUN;

/* Scoring set: 100 applications, some with a missing credit history */
DATA mycas.loan_applications_score;
   call streaminit(555);
   DO application_id = 1 to 100;
      loan_amount = 10000 + rand('UNIFORM') * 50000;
      income = 40000 + rand('UNIFORM') * 100000;
      /* Introduce missing values for about 20% of applicants */
      IF rand('UNIFORM') < 0.2 THEN credit_history_years = .;
      ELSE credit_history_years = 1 + rand('UNIFORM') * 15;
      OUTPUT;
   END;
RUN;
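
Before scoring, it can help to confirm how many applications actually ended up with a missing 'credit_history_years'. The following is a minimal sketch, assuming the 'mycas' libref used above is still assigned; it relies only on the standard N and NMISS statistics of PROC MEANS.

```sas
/* Sketch: count non-missing (N) and missing (NMISS) values of the predictor. */
/* Assumes the mycas libref from the data preparation step is still assigned. */
proc means data=mycas.loan_applications_score n nmiss;
   var credit_history_years;
run;
```

N and NMISS should add up to 100, and because each applicant is set to missing with probability 0.2, NMISS should land in the neighborhood of 20.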

Implementation Steps

1. Train a bartProbit classification model for the binary 'default_flag' outcome.

PROC CAS;
   bart.bartProbit /
      table='loan_train',
      inputs={{name='loan_amount'}, {name='income'}, {name='credit_history_years'}},
      target='default_flag',
      saveState={name='loan_default_model', replace=true};
QUIT;

2. Score the applications table, which contains missing values. Use the classification parameters 'into' and 'intoCutpt' to generate a risk label based on a 40% probability threshold.

PROC CAS;
   bart.bartScore /
      table='loan_applications_score',
      restore='loan_default_model',
      casOut={name='loan_risk_scored', replace=true},
      copyVars={'application_id'},
      into='Risk_Label',
      intoCutpt=0.4;
QUIT;

3. Fetch the results to verify that all rows were scored, even those with missing input data, and check the generated risk labels.

PROC CAS;
   table.fetch / table='loan_risk_scored';
QUIT;
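
To look specifically at the applicants whose input was incomplete, the scored table can be joined back to the applications table on 'application_id'. This is a sketch rather than part of the original scenario: it assumes both tables are reachable through the 'mycas' libref and that the scored table carries the 'P_default_flag1' and 'Risk_Label' columns described under Expected Result.

```sas
/* Sketch: list scored output only for applicants with a missing predictor. */
/* Column names P_default_flag1 and Risk_Label are taken from the expected  */
/* result of this scenario; adjust them if the output table differs.        */
proc sql;
   select a.application_id,
          a.credit_history_years,
          s.P_default_flag1,
          s.Risk_Label
   from mycas.loan_applications_score as a
        inner join mycas.loan_risk_scored as s
        on a.application_id = s.application_id
   where a.credit_history_years is missing
   order by a.application_id;
quit;
```

If these rows show a non-missing probability and a risk label, the missing predictor did not prevent scoring. PROC SQL pulls the rows to the client, which is acceptable for a 100-row table but not a pattern for large data.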

Expected Result


The action should complete without errors. The output table 'mycas.loan_risk_scored' must contain 100 rows, which shows that the action scored records with missing values in a predictor variable. The table should contain 'application_id', the predicted probability of default 'P_default_flag1', and the custom classification column 'Risk_Label'. 'Risk_Label' should be 1 when 'P_default_flag1' is at least 0.4, and 0 otherwise. Together, these checks demonstrate the action's robustness to imperfect data.
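
One way to check these expectations without reading the rows by eye is to summarize the probability range within each risk label. This is a minimal sketch, assuming the 'mycas' libref is assigned and the output columns are named 'P_default_flag1' and 'Risk_Label' as stated above.

```sas
/* Sketch: per-label row counts and probability ranges in the scored table. */
/* Grouping by Risk_Label avoids assuming whether it is numeric or character. */
proc sql;
   select Risk_Label,
          count(*)               as n_rows,
          nmiss(P_default_flag1) as n_missing_prob,
          min(P_default_flag1)   as min_prob,
          max(P_default_flag1)   as max_prob
   from mycas.loan_risk_scored
   group by Risk_Label;
quit;
```

The n_rows values should sum to 100. If the labels follow the 0.4 cutoff, min_prob for the label 1 group should be at least 0.4 and max_prob for the label 0 group should be below 0.4. A nonzero n_missing_prob would indicate rows that were written to the output without a usable score.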