This script models US population growth. It first demonstrates how to implement the Iteratively Reweighted Least Squares (IRLS) method using Tukey's 'biweight' function directly in PROC NLIN thanks to the special variable `_weight_`. It then compares this approach with the standardized PROC ROBUSTREG procedure using an M-estimator.
Data Analysis
Type : CREATION_INTERNE
Population data (uspop) is created directly in the script via a DATA step and DATALINES statements.
1 Code Block
DATA STEP Data
Explanation : Creation of the 'uspop' dataset. The INPUT statement uses ' @@' to read multiple observations on the same data line. The 'year' and 'yearsq' (year squared) variables are generated for the quadratic model.
Copied!
title 'U.S. Population Growth';
data uspop;
input pop :6.3 @code_sas_json/8_SAS_Intro_ReadFile_MultiCol_@@.json;
retain year 1780;
year = year+10;
yearsq = year*year;
datalines;
3929 5308 7239 9638 12866 17069 23191 31443 39818 50155
62947 75994 91972 105710 122775 131669 151325 179323 203211
226542 248710
;
1
title 'U.S. Population Growth';
2
DATA uspop;
3
INPUT pop :6.3 @code_sas_json/8_SAS_Intro_ReadFile_MultiCol_@@.json;
Explanation : Execution of a non-linear regression. The script manually implements a robust regression (IRLS) by defining the special `_weight_` variable. Weights are recalculated at each iteration based on the residuals (`resid`) according to Tukey's biweight function.
Copied!
title 'Beaton/Tukey Biweight Robust Regression using IRLS';
proc nlin data=uspop nohalve;
parms b0=20450.43 b1=-22.7806 b2=.0063456;
model pop=b0+b1*year+b2*year*year;
resid = pop-model.pop;
sigma = 2;
k = 4.685;
if abs(resid/sigma)<=k then _weight_=(1-(resid / (sigma*k))**2)**2;
else _weight_=0;
output out=c r=rbi;
run;
1
title 'Beaton/Tukey Biweight Robust Regression using IRLS';
2
PROC NLINDATA=uspop nohalve;
3
parms b0=20450.43 b1=-22.7806 b2=.0063456;
4
model pop=b0+b1*year+b2*year*year;
5
resid = pop-model.pop;
6
sigma = 2;
7
k = 4.685;
8
IF abs(resid/sigma)<=k THEN _weight_=(1-(resid / (sigma*k))**2)**2;
9
ELSE _weight_=0;
10
OUTPUT out=c r=rbi;
11
RUN;
3 Code Block
DATA STEP Data
Explanation : DATA step to recalculate or verify the final weights based on the residuals (rbi) stored in the PROC NLIN output table.
Copied!
data c;
set c;
sigma = 2;
k = 4.685;
if abs(rbi/sigma)<=k then _weight_=(1-(rbi /(sigma*k))**2)**2;
else _weight_=0;
run;
1
DATA c;
2
SET c;
3
sigma = 2;
4
k = 4.685;
5
IF abs(rbi/sigma)<=k THEN _weight_=(1-(rbi /(sigma*k))**2)**2;
6
ELSE _weight_=0;
7
RUN;
4 Code Block
PROC PRINT
Explanation : Display of manual regression results (table 'c').
Copied!
proc print data=c;
run;
1
PROC PRINTDATA=c;
2
RUN;
5 Code Block
PROC ROBUSTREG Data
Explanation : Use of the PROC ROBUSTREG procedure dedicated to robust regression. It uses the M-estimation method here. Results are stored in the 'weights' table.
Copied!
proc robustreg data=uspop method=m(scale=2);
model pop = year year*year;
output out=weights weight=w;
run;
1
PROC ROBUSTREGDATA=uspop method=m(scale=2);
2
model pop = year year*year;
3
OUTPUT out=weights weight=w;
4
RUN;
6 Code Block
PROC PRINT
Explanation : Display of weights calculated by the ROBUSTREG procedure for comparison.
Copied!
proc print data=weights;
run;
1
PROC PRINTDATA=weights;
2
RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.
This site uses technical and analytical cookies to improve your experience.
Read more.