Published on :
Statistique CREATION_INTERNE

Robust Regression with PROC NLIN (IRLS) and PROC ROBUSTREG

This code is also available in: Deutsch Español Français
Awaiting validation
This script models US population growth. It first demonstrates how to implement the Iteratively Reweighted Least Squares (IRLS) method using Tukey's 'biweight' function directly in PROC NLIN thanks to the special variable `_weight_`. It then compares this approach with the standardized PROC ROBUSTREG procedure using an M-estimator.
Data Analysis

Type : CREATION_INTERNE


Population data (uspop) is created directly in the script via a DATA step and DATALINES statements.

1 Code Block
DATA STEP Data
Explanation :
Creation of the 'uspop' dataset. The INPUT statement uses ' @@' to read multiple observations on the same data line. The 'year' and 'yearsq' (year squared) variables are generated for the quadratic model.
Copied!
1title 'U.S. Population Growth';
2DATA uspop;
3 INPUT pop :6.3 @code_sas_json/8_SAS_Intro_ReadFile_MultiCol_@@.json;
4 retain year 1780;
5 year = year+10;
6 yearsq = year*year;
7 DATALINES;
83929 5308 7239 9638 12866 17069 23191 31443 39818 50155
962947 75994 91972 105710 122775 131669 151325 179323 203211
10226542 248710
11;
2 Code Block
PROC NLIN Data
Explanation :
Execution of a non-linear regression. The script manually implements a robust regression (IRLS) by defining the special `_weight_` variable. Weights are recalculated at each iteration based on the residuals (`resid`) according to Tukey's biweight function.
Copied!
1title 'Beaton/Tukey Biweight Robust Regression using IRLS';
2PROC NLIN DATA=uspop nohalve;
3 parms b0=20450.43 b1=-22.7806 b2=.0063456;
4 model pop=b0+b1*year+b2*year*year;
5 resid = pop-model.pop;
6 sigma = 2;
7 k = 4.685;
8 IF abs(resid/sigma)<=k THEN _weight_=(1-(resid / (sigma*k))**2)**2;
9 ELSE _weight_=0;
10 OUTPUT out=c r=rbi;
11RUN;
3 Code Block
DATA STEP Data
Explanation :
DATA step to recalculate or verify the final weights based on the residuals (rbi) stored in the PROC NLIN output table.
Copied!
1DATA c;
2 SET c;
3 sigma = 2;
4 k = 4.685;
5 IF abs(rbi/sigma)<=k THEN _weight_=(1-(rbi /(sigma*k))**2)**2;
6 ELSE _weight_=0;
7RUN;
4 Code Block
PROC PRINT
Explanation :
Display of manual regression results (table 'c').
Copied!
1PROC PRINT DATA=c;
2RUN;
5 Code Block
PROC ROBUSTREG Data
Explanation :
Use of the PROC ROBUSTREG procedure dedicated to robust regression. It uses the M-estimation method here. Results are stored in the 'weights' table.
Copied!
1PROC ROBUSTREG DATA=uspop method=m(scale=2);
2 model pop = year year*year;
3 OUTPUT out=weights weight=w;
4RUN;
6 Code Block
PROC PRINT
Explanation :
Display of weights calculated by the ROBUSTREG procedure for comparison.
Copied!
1PROC PRINT DATA=weights;
2RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.