simple correlation

Collinearity Check and Reliability Analysis for Credit Scoring

Test Scenario & Use Case

Business Context

A retail bank is building a new credit scoring model. Before training predictive models, the data science team needs to identify highly correlated variables (multicollinearity) among customer financial metrics (Income, Debt, Credit Score) so that redundant features can be removed. They also want to check the internal consistency of these financial indicators using Cronbach's Alpha.
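
For reference, Cronbach's Alpha for a set of k items is commonly defined as shown below. The notation is introduced here for illustration and is not taken from the action's documentation: k is the number of items (here, the input variables), \sigma^2_{Y_i} is the variance of item i, and \sigma^2_X is the variance of the sum of all items.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X}\right)
```

Because the formula compares the item variances with the variance of their sum, items that are negatively correlated with each other shrink \sigma^2_X and so tend to pull the coefficient down.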

Data Preparation

Simulate a dataset of 5,000 banking customers with correlated variables (Income, Debt, Credit Score) and Age.
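
The DATA step below writes to a libref named mycas, which has to exist before the code runs. A minimal setup sketch, assuming a default CAS server is configured for the SAS session; the session name mysess and the choice of the CASUSER caslib are illustrative assumptions, not part of the original scenario:

```sas
/* Hypothetical setup: the session name and caslib are assumptions */
cas mysess;                          /* start a CAS session */
libname mycas cas caslib=casuser;    /* bind libref mycas to the CASUSER caslib */
```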

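/* Assumes an active CAS session and a CAS libref named mycas */
DATA mycas.credit_risk;
   call streaminit(123);   /* fixed seed so the simulation is reproducible */
   DO i=1 to 5000;
      Income = rand('lognormal', 10, 0.5);               /* right-skewed income */
      Debt = (Income * 0.4) + rand('normal', 0, 2000);   /* debt rises with income */
      Age = rand('integer', 18, 75);
      /* score falls with debt and rises with age, plus noise */
      CreditScore = 800 - (Debt/100) + (Age * 1.5) + rand('normal', 0, 30);
      OUTPUT;
   END;
RUN;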

Implementation Steps

1. Verify that the simulated table exists in memory. The DATA step above already wrote it to CAS, so no separate load is needed.

PROC CAS;
   /* List metadata (rows, columns) for the in-memory table */
   table.tableInfo / table="credit_risk";
RUN;
QUIT;

2. Calculate the correlation matrix, including Cronbach's Alpha, and save the results to a table named 'corr_stats'.

PROC CAS;
   /* Correlations for the four inputs, with Cronbach's Alpha requested */
   simple.correlation /
      table={name='credit_risk'}
      inputs={'Income', 'Debt', 'Age', 'CreditScore'}
      alpha=true
      casOut={name='corr_stats', replace=true}
      descriptiveStats=true;
RUN;
QUIT;
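
To look at what was written to 'corr_stats', the table can be fetched back to the client. A minimal sketch, assuming the table sits in the active caslib; the row limit of 20 is an arbitrary choice, and the column layout of the output table is not described in this scenario:

```sas
proc cas;
   /* Fetch up to 20 rows of the output table for inspection */
   table.fetch / table={name='corr_stats'} to=20;
run;
quit;
```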

Expected Result


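The action computes the correlation matrix for the four inputs. Given how the data is simulated, Income and Debt should be strongly positively correlated (roughly 0.9), CreditScore should be strongly negatively correlated with Debt (and, through Debt, with Income), and Age should be moderately positively correlated with CreditScore. Cronbach's Alpha is calculated and displayed in the results. A new CAS table 'corr_stats' is created containing the statistical output.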
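
As an independent cross-check, the same statistics can be sketched with PROC CORR, whose ALPHA option also reports Cronbach's coefficient alpha. This assumes the mycas libref from the data preparation step is still assigned; unlike the CAS action, the procedure is expected to run on the SAS compute server, so the data is read out of CAS first.

```sas
/* Cross-check of the CAS results; assumes libref mycas is assigned */
proc corr data=mycas.credit_risk alpha nomiss;
   var Income Debt Age CreditScore;
run;
```

PROC CORR reports alpha for both raw and standardized variables, which is useful here because the four inputs are on very different scales.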