Non-Parametric Tweedie Regression with PROC GAMPL

Difficulty Level
Beginner
This script shows how the GAMPL procedure fits non-parametric Tweedie regression models. It first simulates a dataset from a compound Poisson (Tweedie) distribution whose mean depends nonlinearly on the predictors. It then compares a standard generalized linear model (PROC GENMOD) with two GAMPL models: one using parametric linear terms and the other using spline terms.
Data Analysis

Type : INTERNAL_CREATION


The data are generated artificially in the DATA step 'one'. Four uniform random predictors (x1-x4) are drawn. Three of them (x1, x2, x3) are transformed by nonlinear functions (sine, exponential, polynomial) and combined into a mean 'mu'; x4 has no effect on the mean. The response 'y' is then simulated from a Tweedie distribution with mean 'mu'.

1 Code Block
DATA STEP Data
Explanation :
Synthetic data generation. The macro variables 'phi' and 'power' control the Tweedie distribution. The code simulates 1000 observations with random predictors and a response variable 'y' constructed from complex nonlinear transformations and a compound Poisson process.
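The compound Poisson construction in this DATA step can be written out as follows. This is a sketch of the standard compound Poisson-gamma representation of the Tweedie distribution for 1 < p < 2, where p stands for the macro variable 'power' and φ for 'phi'. The symbols N and G_j are notation introduced here and are not names from the code.

```latex
\begin{aligned}
N &\sim \mathrm{Poisson}(\lambda), & \lambda &= \frac{\mu^{2-p}}{\phi\,(2-p)},\\
Y &= \gamma \sum_{j=1}^{N} G_j, & G_j &\sim \mathrm{Gamma}(\text{shape}=\alpha,\ \text{scale}=1),\\
\alpha &= \frac{2-p}{p-1}, & \gamma &= \phi\,(p-1)\,\mu^{p-1},\\
\mathrm{E}[Y] &= \lambda\,\alpha\,\gamma = \mu, & \mathrm{Var}[Y] &= \lambda\,\alpha(\alpha+1)\,\gamma^{2} = \phi\,\mu^{p}.
\end{aligned}
```

With phi=0.4 and power=1.5 this gives alpha = 1, gamma = 0.2*sqrt(mu), and lambda = 5*sqrt(mu). The response is exactly zero whenever N = 0, which happens with probability exp(-lambda).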
title 'Nonparametric Tweedie Model';
%let phi=0.4;
%let power=1.5;

DATA one;
   DO i=1 to 1000;

      /* Sample the predictors */
      x1=ranuni(1);
      x2=ranuni(1);
      x3=ranuni(1);
      x4=ranuni(1);

      /* Apply nonlinear transformations to predictors */
      f1=2*sin(3.14159265*x1);
      f2=exp(2*x2)*0.8;
      f3=0.2*x3**11*(10*(1-x3))**6+10*(10*x3)**3*(1-x3)**10;
      xb=f1+f2+f3;
      xb=xb/20;
      mu=exp(xb);

      /* Compute parameters of compound Poisson distribution */
      lambda=mu**(2-&power)/(&phi*(2-&power));
      alpha=(2-&power)/(&power-1);
      gamma=&phi*(&power-1)*(mu**(&power-1));

      /* Simulate the response */
      rpoi=ranpoi(1,lambda);
      IF rpoi=0 THEN y=0;
      ELSE DO;
         y=0;
         DO j=1 to rpoi;
            y=y+rangam(1,alpha);
         END;
         y=y*gamma;
      END;
      OUTPUT;
   END;
RUN;
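A quick way to check the simulation is to compare the sample mean of 'y' with the mean of 'mu' and to look at the share of exact zeros. The following is a minimal sketch, assuming the dataset 'one' created by the DATA step. The dataset name 'check' and the variable 'is_zero' are introduced here for illustration only.

```sas
/* Flag responses that are exactly zero (no Poisson events were drawn) */
data check;
   set one;
   is_zero = (y = 0);
run;

/* The mean of y should be close to the mean of mu;
   the mean of is_zero estimates the probability of a zero response */
proc means data=check n mean var min max;
   var mu y is_zero;
run;
```

A noticeable share of zeros combined with a continuous positive part is the pattern that motivates a Tweedie model with a power between 1 and 2.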
2 Code Block
PROC GENMOD
Explanation :
Fitting a reference generalized linear model (GLM) using the Tweedie distribution. This model assumes that the link-transformed mean of the response is a linear function of the predictors, which may be insufficient given the nonlinear nature of the generated data.
PROC GENMOD DATA=one;
   model y=x1 x2 x3 x4/dist=tweedie;
RUN;
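The mismatch can be made explicit by writing the form the GLM fits next to the predictor used in the simulation. This sketch assumes a log link, which is what the simulation uses (mu=exp(xb)) and which is assumed here to be the link the procedure applies when none is specified. The β symbols are notation introduced here.

```latex
\begin{aligned}
\text{fitted GLM:}\quad \log \mu &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4,\\
\text{simulation:}\quad \log \mu &= \tfrac{1}{20}\bigl(f_1(x_1) + f_2(x_2) + f_3(x_3)\bigr),\\
f_1(x_1) &= 2\sin(\pi x_1),\\
f_2(x_2) &= 0.8\,e^{2x_2},\\
f_3(x_3) &= 0.2\,x_3^{11}\bigl(10(1-x_3)\bigr)^{6} + 10\,(10x_3)^{3}(1-x_3)^{10}.
\end{aligned}
```

A straight line in x1 cannot follow the sine curve, and a straight line in x3 cannot follow the polynomial term, so the linear terms can only approximate these effects.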
3 Code Block
PROC GAMPL
Explanation :
Using PROC GAMPL to fit a model similar to GLM (parametric linear terms only) with the Tweedie distribution. This allows comparing the basic performance of GAMPL with GENMOD.
PROC GAMPL DATA=one seed=1234;
   model y=param(x1 x2 x3 x4)/dist=tweedie;
RUN;
4 Code Block
PROC GAMPL
Explanation :
Fitting a complete generalized additive model (GAM). The 'spline()' terms allow modeling nonlinear relationships for each predictor. The 'plots' option generates graphs to visualize the splines fitted against the data.
PROC GAMPL DATA=one seed=1234 plots;
   model y=spline(x1) spline(x2) spline(x3) spline(x4)/dist=tweedie;
RUN;
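Because the data are simulated, the fitted spline components can be compared with the functions that generated them. The following is a minimal sketch, assuming the dataset 'one' from the DATA step. The dataset 'truth' and the variables 'c1' to 'c3' are names introduced here for illustration. The true components are divided by 20 to put them on the scale of 'xb'. Fitted smooth components are usually identified only up to an additive constant, so compare shapes rather than vertical positions.

```sas
/* True additive components on the linear-predictor scale: xb = (f1+f2+f3)/20 */
data truth;
   set one;
   c1 = f1/20;
   c2 = f2/20;
   c3 = f3/20;
run;

/* Plot each true component against its predictor.
   x4 does not enter the mean, so its true effect is flat. */
proc sgplot data=truth;
   scatter x=x1 y=c1;
run;

proc sgplot data=truth;
   scatter x=x2 y=c2;
run;

proc sgplot data=truth;
   scatter x=x3 y=c3;
run;
```

If the spline model fits well, the component plot for x4 should be close to flat, while the plots for x1 to x3 should resemble the curves drawn here.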
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
