pca eig

Standard Customer Behavior Dimensionality Reduction

Test Scenario & Use Case

Business Context

A retail bank wants to segment its customer base for a new credit card offer. It has several spending variables (groceries, travel, entertainment, utilities), some of them correlated, and wants to reduce them to a few principal components that reveal underlying spending patterns (e.g., 'High Spender', 'Frugal') and make later clustering easier.
Data Preparation

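Simulate spending for 1,000 customers across four categories. Travel is generated as a function of groceries, so those two variables are correlated; entertainment and utilities are drawn independently.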

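/* Assumes an active CAS session with the casuser libref assigned, */
/* for example after running: cas; caslib _all_ assign;            */
DATA casuser.customers;
   call streaminit(123);                     /* fixed seed for the random draws */
   DO client_id = 1 to 1000;
      groceries     = rand('Normal', 500, 50);
      travel        = groceries * 0.5 + rand('Normal', 100, 20);   /* correlated with groceries */
      entertainment = rand('Normal', 200, 30);
      utilities     = rand('Normal', 150, 10);
      OUTPUT;
   END;
RUN;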
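As a quick sanity check on the structure this DATA step builds in, here is a minimal Python sketch that mirrors its generating process using only the standard library. It uses Python's random generator rather than SAS's streaminit/rand, so the individual values will not match casuser.customers; only the statistical structure is comparable. The helpers simulate_customers and pearson are hypothetical names for this illustration. Given the DATA step's parameters, groceries and travel should correlate at roughly 0.78 in theory, and the other pairs should be near 0.

```python
import math
import random
import statistics


def simulate_customers(n=1000, seed=123):
    """Mirror the DATA step's generating process with Python's RNG (values differ from SAS)."""
    rng = random.Random(seed)
    rows = []
    for client_id in range(1, n + 1):
        groceries = rng.gauss(500, 50)
        travel = groceries * 0.5 + rng.gauss(100, 20)  # depends on groceries
        entertainment = rng.gauss(200, 30)
        utilities = rng.gauss(150, 10)
        rows.append({"client_id": client_id, "groceries": groceries, "travel": travel,
                     "entertainment": entertainment, "utilities": utilities})
    return rows


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


customers = simulate_customers()
column = {name: [row[name] for row in customers]
          for name in ("groceries", "travel", "entertainment", "utilities")}
print(f"corr(groceries, travel)        = {pearson(column['groceries'], column['travel']):.3f}")
print(f"corr(groceries, entertainment) = {pearson(column['groceries'], column['entertainment']):.3f}")
```

Only the groceries/travel pair shares structure, which is the redundancy the first principal component is expected to capture.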

Implementation Steps

1. Run pca.eig to extract the top two principal components using the correlation matrix, saving the eigenvalue statistics to pca_stats and the generated scoring code to score_code.
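PROC CAS;
   /* Extract the first two principal components of the four spending variables. */
   pca.eig /
      table={name='customers', caslib='casuser'}
      inputs={'groceries', 'travel', 'entertainment', 'utilities'}
      n=2                   /* number of components to keep */
      prefix='Pattern'      /* score columns are named Pattern1, Pattern2 */
      code={casOut={name='score_code', caslib='casuser', replace=true}}     /* scoring code */
      outStat={casOut={name='pca_stats', caslib='casuser', replace=true}};  /* eigenvalues */
RUN;
QUIT;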
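To illustrate what extracting the top two components from the correlation matrix involves, the Python sketch below standardizes the four inputs, builds their correlation matrix, and finds the two largest eigenpairs by power iteration with deflation. Power iteration is a stand-in chosen to keep the example dependency-free; it is not a description of how pca.eig computes its results, and simulate, correlation_matrix and top_eigenpairs are hypothetical helpers. The data again mirrors the DATA step with Python's RNG, so numbers will differ somewhat from the SAS output.

```python
import math
import random
import statistics

INPUTS = ["groceries", "travel", "entertainment", "utilities"]


def simulate(n=1000, seed=123):
    """Rows of [groceries, travel, entertainment, utilities] drawn like the DATA step."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        groceries = rng.gauss(500, 50)
        travel = groceries * 0.5 + rng.gauss(100, 20)
        rows.append([groceries, travel, rng.gauss(200, 30), rng.gauss(150, 10)])
    return rows


def correlation_matrix(rows):
    """Pearson correlations, computed from standardized columns."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # sample (n - 1) standard deviations
    z = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
    n, p = len(rows), len(cols)
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]


def top_eigenpairs(matrix, k, iters=2000):
    """Largest k eigenpairs of a symmetric matrix via power iteration with deflation."""
    m = [row[:] for row in matrix]
    p = len(m)
    pairs = []
    for _ in range(k):
        v = [1.0] * p
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient v^T M v gives the eigenvalue for the converged vector.
        lam = sum(v[i] * m[i][j] * v[j] for i in range(p) for j in range(p))
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next-largest eigenvalue.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


corr = correlation_matrix(simulate())
pairs = top_eigenpairs(corr, k=2)
for idx, (lam, vec) in enumerate(pairs, start=1):
    share = lam / len(INPUTS)  # trace of a correlation matrix = number of inputs
    loadings = ", ".join(f"{name}={w:+.2f}" for name, w in zip(INPUTS, vec))
    print(f"Pattern{idx}: eigenvalue={lam:.3f}, share={share:.1%}, {loadings}")
```

Because every diagonal entry of a correlation matrix is 1, its eigenvalues sum to the number of inputs (4 here), so each eigenvalue divided by 4 is that component's share of the total standardized variance.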
2. Apply the generated scoring code to a new table (simulated here by re-using the input table) to verify that scoring works.
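/* %include expects a file, not a CAS table, so run the stored */
/* DATA step code with the dataStep.runCodeTable action instead. */
PROC CAS;
   dataStep.runCodeTable /
      codeTable={name='score_code', caslib='casuser'}
      table={name='customers', caslib='casuser'}
      casOut={name='scored_customers', caslib='casuser', replace=true};
RUN;
QUIT;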
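Conceptually, PCA scoring code is a fixed linear formula: each input is standardized with the training means and standard deviations, then combined with the component loadings. The Python sketch below illustrates this by fitting on one simulated sample and scoring a second, independent sample, which is closer to the "new dataset" case than re-using the input. It assumes sample (n - 1) standard deviations and unscaled scores whose variance equals the eigenvalue; whether the code generated by pca.eig rescales its scores is not stated here. fit_pca and score are hypothetical helpers, and power iteration again stands in for the real solver.

```python
import math
import random
import statistics


def simulate(n, seed):
    """Rows of [groceries, travel, entertainment, utilities] drawn like the DATA step."""
    rng = random.Random(seed)  # Python's RNG, so values differ from SAS's rand()
    rows = []
    for _ in range(n):
        groceries = rng.gauss(500, 50)
        travel = groceries * 0.5 + rng.gauss(100, 20)
        rows.append([groceries, travel, rng.gauss(200, 30), rng.gauss(150, 10)])
    return rows


def fit_pca(rows, k, iters=2000):
    """Return the frozen pieces a scoring step needs: means, SDs and k loading vectors."""
    cols = list(zip(*rows))
    p, n = len(cols), len(rows)
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # assumed: sample (n - 1) SDs
    z = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]
    components = []
    for _ in range(k):  # power iteration with deflation (a stand-in solver)
        v = [1.0] * p
        for _ in range(iters):
            w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
        components.append((lam, v))
        corr = [[corr[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return means, sds, components


def score(rows, means, sds, components, prefix="Pattern"):
    """Apply the frozen standardization and loadings; works for any new rows."""
    scored = []
    for row in rows:
        z = [(x - m) / s for x, m, s in zip(row, means, sds)]
        scored.append({f"{prefix}{idx}": sum(w * zj for w, zj in zip(v, z))
                       for idx, (_, v) in enumerate(components, start=1)})
    return scored


train = simulate(1000, seed=123)
means, sds, components = fit_pca(train, k=2)
scored_train = score(train, means, sds, components)
scored_new = score(simulate(500, seed=456), means, sds, components)  # "new" customers

p1 = [row["Pattern1"] for row in scored_train]
print(f"training Pattern1: mean={statistics.fmean(p1):.2e}, "
      f"variance={statistics.variance(p1):.3f}, eigenvalue={components[0][0]:.3f}")
print(scored_new[0])
```

On the training sample, Pattern1 has mean 0 and a variance equal to the first eigenvalue; on new customers both properties hold only approximately, because the means, standard deviations and loadings stay frozen at their training values.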

Expected Result


The action should extract the two principal components that explain the largest share of the variance in spending, and the pca_stats table should contain their eigenvalues. The scoring step should create casuser.scored_customers, which holds the customer columns plus the new Pattern1 and Pattern2 score columns.
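The DATA step's parameters also indicate roughly what pca_stats should show. In the population, groceries and travel correlate at r = 0.5 × 50 / √1025 (travel's variance is 0.25 × 50² + 20² = 1025), while entertainment and utilities are uncorrelated with everything, so the correlation matrix is block-diagonal with eigenvalues 1 + r, 1, 1 and 1 − r. The Python sketch below works through that arithmetic; the sample eigenvalues reported by pca.eig should land near these values, not exactly on them.

```python
import math

# Population parameters taken from the DATA step.
sd_groceries = 50.0
slope = 0.5             # travel = 0.5 * groceries + noise
sd_travel_noise = 20.0

var_travel = slope**2 * sd_groceries**2 + sd_travel_noise**2   # 625 + 400 = 1025
cov_groceries_travel = slope * sd_groceries**2                  # 1250
r = cov_groceries_travel / (sd_groceries * math.sqrt(var_travel))

# Block-diagonal correlation matrix: [[1, r], [r, 1]] for groceries/travel,
# identity for entertainment and utilities.
eigenvalues = sorted([1 + r, 1 - r, 1.0, 1.0], reverse=True)
explained_top2 = sum(eigenvalues[:2]) / len(eigenvalues)

print(f"r = {r:.3f}")                                        # r = 0.781
print("eigenvalues:", [round(x, 3) for x in eigenvalues])    # eigenvalues: [1.781, 1.0, 1.0, 0.219]
print(f"top-2 share of variance: {explained_top2:.1%}")      # top-2 share of variance: 69.5%
```

Because the second and third eigenvalues tie at 1 in theory, Pattern2 is largely determined by sampling noise in this simulation (some mix of entertainment and utilities) and should not be over-interpreted as a distinct spending pattern.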