phreg cox

High-Volume Variable Selection for Equipment Failure

Test Scenario & Use Case

Business Context

A manufacturing plant wants to predict machine failure based on telemetry data. They collect dozens of sensor readings (temperature, vibration, pressure, etc.) but don't know which ones are actually predictive of failure. They need to select the most important features to build a risk scoring model.
Data Preparation

Simulate a dataset with 500 machines and 20 candidate sensor predictors, of which only 3 (s1, s5 and s10) actually drive failure. Machines still running at time 10 are treated as right-censored.

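DATA mycas.machine_sensors;            /* assumes a CAS libref named mycas */
   call streaminit(456);                /* fixed seed for reproducibility */
   array S[20] s1-s20;
   DO i = 1 to 500;                     /* 500 machines */
      DO j = 1 to 20;
         S[j] = rand('normal');         /* 20 independent N(0,1) sensor readings */
      END;
      /* Only s1, s5 and s10 drive the failure risk; the rest are noise */
      TrueRisk = 0.8*s1 - 0.5*s5 + 0.3*s10;
      /* Weibull (shape 1.5, scale 1) lifetime, shortened by higher risk */
      Life = rand('weibull', 1.5, 1) * exp(-TrueRisk);
      Fail = 1;
      /* Right-censor machines still running at time 10 */
      IF Life > 10 THEN DO;
         Life = 10;
         Fail = 0;
      END;
      OUTPUT;
   END;
   drop i j;                            /* loop counters are not needed */
RUN;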
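
Why should a Cox model recover exactly these three sensors? A short derivation, taking the Weibull scale parameter as 1 so that the baseline draw T0 has survival exp(-t^1.5): multiplying T0 by exp(-η), with η = TrueRisk, is an accelerated-failure-time transformation, and for a Weibull baseline that is also a proportional-hazards model. Censoring at the fixed time 10 does not depend on the sensors, so it does not change the hazard below that time.

```latex
\begin{aligned}
S(t \mid x) &= \Pr\!\left(T_0 e^{-\eta} > t\right) = \Pr\!\left(T_0 > t\, e^{\eta}\right)
             = \exp\!\left(-t^{1.5}\, e^{1.5\,\eta}\right),\\
h(t \mid x) &= -\frac{d}{dt}\log S(t \mid x) = 1.5\, t^{0.5}\, e^{1.5\,\eta}
             = h_0(t)\, \exp(1.5\,\eta),\\
\eta &= 0.8\, s_1 - 0.5\, s_5 + 0.3\, s_{10}
\;\Rightarrow\; (\beta_1,\ \beta_5,\ \beta_{10}) = (1.2,\ -0.75,\ 0.45).
\end{aligned}
```

So, up to sampling error, the fitted Cox coefficients should be roughly 1.5 times the TrueRisk weights, and every other sensor has a true coefficient of 0.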

Implementation Steps

1
Run a Cox proportional hazards regression with stepwise selection to isolate the key sensors.
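PROC CAS;
   loadactionset 'phreg';               /* make the phreg action set available */
   /* Stepwise search over all 20 candidate sensors: a sensor enters */
   /* at p < 0.1 and is removed again if its p-value exceeds 0.1      */
   phreg.cox /
      table={name='machine_sensors'},
      model={depVars={{name='Life', event='Fail(1)'}},
             effects={{vars={'s1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9', 's10',
                             's11', 's12', 's13', 's14', 's15', 's16', 's17', 's18', 's19', 's20'}}}},
      selection={method='STEPWISE', slEntry=0.1, slStay=0.1};
RUN;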
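
As an optional cross-check outside CAS, the same stepwise search can be sketched with the classic PROC PHREG procedure. This assumes a SAS session in which PROC PHREG is available and the table mycas.machine_sensors is readable through the CAS libref (the rows are transferred to the SAS compute server). Note that PROC PHREG puts the censoring value in parentheses, so Fail(0) marks the censored machines, whereas the CAS call above specifies the event value, Fail(1).

```sas
/* Cross-check: classic PROC PHREG with the same stepwise settings. */
/* Assumes mycas.machine_sensors is readable from this SAS session. */
proc phreg data=mycas.machine_sensors;
   /* Life*Fail(0): Fail=0 identifies right-censored machines */
   model Life*Fail(0) = s1-s20
         / selection=stepwise slentry=0.1 slstay=0.1;
run;
```

If both runs agree, the selection summary should list the same sensors.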
2
Refit the Cox model on the selected sensors (s1, s5, s10) and generate a risk score (xBeta) for every machine.
PROC CAS;
   /* Refit on the selected sensors and write the linear predictor */
   /* x'beta for each machine to a new CAS table                   */
   phreg.cox /
      table={name='machine_sensors'},
      model={depVars={{name='Life', event='Fail(1)'}},
             effects={{vars={'s1', 's5', 's10'}}}},
      output={casOut={name='scored_machines', replace=true}, xBeta='RiskScore'};
RUN;
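
To turn the scores into a maintenance list, one option is to fetch the highest-risk rows with the table.fetch action, sorted by RiskScore in descending order. This is a sketch: the cut-off of 10 rows is an arbitrary choice for illustration.

```sas
PROC CAS;
   /* Show the 10 rows with the highest estimated failure risk */
   table.fetch /
      table={name='scored_machines'},
      sortBy={{name='RiskScore', order='DESCENDING'}},
      to=10;
RUN;
```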

Expected Result


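Stepwise selection should retain the truly predictive sensors (s1, s5, s10) and drop most of the noise variables (s2, s3, and so on). Because each of the 17 noise variables is tested at a 0.1 significance level, an occasional noise variable may still enter by chance. The second step creates the table 'scored_machines' containing 'RiskScore', the linear predictor x'β: a higher score means a higher estimated hazard of failure, so the plant can rank machines by RiskScore to prioritize maintenance.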
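
Because the Cox model is multiplicative in the hazard, RiskScore is most meaningful in relative terms. For two machines A and B, the baseline hazard cancels, so the difference of their scores gives the ratio of their estimated hazards at any time t (this also holds if the scores happen to be centered, since a common shift cancels). For example, a gap of 1 in RiskScore means machine A's estimated failure hazard is about e ≈ 2.72 times machine B's.

```latex
\frac{h(t \mid x_A)}{h(t \mid x_B)}
= \frac{h_0(t)\, e^{x_A^\top \hat\beta}}{h_0(t)\, e^{x_B^\top \hat\beta}}
= \exp\!\left(\mathrm{RiskScore}_A - \mathrm{RiskScore}_B\right)
```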