Non-linear regression analysis with left-plateau model and visualization

Difficulty Level
Beginner
The script first creates a dataset named 'lizard' with the variables 'length' and 'mass' from raw data embedded in the code (DATALINES). PROC NLIN then fits a 'left-plateau' non-linear regression model: mass stays constant at b0 up to a length threshold tau, and beyond that threshold it increases linearly with length as b0 + b1*(length - tau). Starting values are supplied for the parameters b0, b1, and tau to help the iterative fit converge. The procedure also writes a new dataset 'a' that contains the original data plus the mass values predicted by the model ('p').

Finally, PROC GPLOT (from SAS/GRAPH) produces two plots. The first shows the observed mass against length. The second overlays the observed data and the fitted curve (predicted 'p' against 'length') so that the fit can be assessed visually. Global graphics options (GOPTIONS) and symbol and axis definitions are set to improve readability.
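Written as a formula, the left-plateau model described above is the following piecewise function. The symbols $b_0$, $b_1$, and $\tau$ correspond to the parameters b0, b1, and tau. The compact max form is an equivalent restatement added here for illustration, not something the script itself uses.

```latex
\[
\text{mass} =
\begin{cases}
  b_0, & \text{length} \le \tau, \\
  b_0 + b_1\,(\text{length} - \tau), & \text{length} > \tau,
\end{cases}
\qquad\text{equivalently}\qquad
\text{mass} = b_0 + b_1 \max(0,\ \text{length} - \tau).
\]
```

Both branches give $b_0$ at length $= \tau$, so the fitted curve is continuous at the breakpoint.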
Data Analysis

Type: Created internally


The data ('length' and 'mass') used for the analysis are created directly in the script by a DATA step with a DATALINES statement. They are embedded in the source code and do not come from an external source.

1 Code Block
DATA STEP Data
Explanation :
This DATA STEP block creates a new temporary dataset named 'lizard'. It defines two numeric variables, 'length' and 'mass', and populates them with raw data provided directly in the script via the DATALINES statement. This dataset is the data source for subsequent statistical analyses and visualizations.
DATA lizard;
  /* The trailing @@ holds each data line so that all four length/mass pairs on it are read */
  INPUT length mass @@;
  DATALINES;
22.87 0.294 23.45 0.302 23.49 0.265 23.65 0.297
23.76 0.294 24.36 0.338 24.44 0.295 24.44 0.347
24.51 0.338 24.61 0.333 24.91 0.358 24.95 0.350
24.95 0.331 25.00 0.327 25.16 0.345 25.26 0.334
25.26 0.323 25.36 0.353 25.47 0.354 25.52 0.350
25.61 0.361 25.76 0.362 25.82 0.327 25.86 0.354
25.91 0.309 25.96 0.361 25.96 0.366 26.15 0.344
26.20 0.358 26.27 0.348 27.12 0.371 27.28 0.421
;
RUN;
2 Code Block
PROC NLIN
Explanation :
This PROC NLIN step fits a non-linear regression model to the 'lizard' dataset. The model is a 'left-plateau' model: the dependent variable 'mass' is constant at 'b0' until 'length' reaches 'tau', and follows a straight line with slope 'b1' beyond that point. The PARAMETERS statement supplies starting values for 'b0', 'b1', and 'tau'. The 'output out=a p=p' statement creates a new dataset 'a' that contains the original variables from 'lizard' plus the variable 'p', the mass values predicted by the model.
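PROC NLIN DATA=lizard;
  /* Starting values for the plateau (b0), slope (b1), and breakpoint (tau) */
  parameters b0 = 0.2904 b1 = 0.0189 tau = 23.44;
  /* Left of the breakpoint: constant mass */
  IF length <= tau THEN DO;
    model mass = b0;
  END;
  /* Right of the breakpoint: linear increase from b0 */
  ELSE DO;
    model mass = b0 + b1*(length - tau);
  END;
  /* Save the input data plus predicted values p in dataset a */
  OUTPUT out=a p=p;
RUN;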
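The same model can also be written with a single MODEL statement by computing the mean in an intermediate variable first. The sketch below is a variation for illustration, not part of the original script. It assumes the 'lizard' dataset from the DATA step already exists, and the names `mu`, `a_alt`, and `p_alt` are hypothetical.

```sas
/* Sketch: left-plateau model with one MODEL statement.          */
/* Assumes dataset lizard exists; mu, a_alt, p_alt are made-up   */
/* names used only for this illustration.                        */
proc nlin data=lizard;
  parameters b0 = 0.2904 b1 = 0.0189 tau = 23.44;
  /* mu is the model mean: flat at b0 up to tau, linear beyond it */
  if length <= tau then mu = b0;
  else mu = b0 + b1*(length - tau);
  model mass = mu;
  output out=a_alt p=p_alt;
run;
```

This form keeps the piecewise logic in ordinary assignment statements, which can be easier to extend if more segments are added later.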
3 Code Block
PROC GPLOT
Explanation :
This block initializes global graphics options (`GOPTIONS`) to define the appearance of the graphs (colors, fonts, text sizes). It also configures a main title (`title1`), symbol definitions (`symbol1`, `symbol2`), and axis specifications (`axis1`, `axis2`) including labels and ranges. The `PROC GPLOT` procedure is then used to generate a simple scatter plot ('mass' vs 'length') from dataset 'a', using the previously established axis definitions.
goptions reset=global gunit=pct border cback=white
         colors=(black blue green red)
         ftitle=swissb ftext=swiss htitle=4 htext=4;
title1 'Mass vs Length with the fit';
symbol1 color=red
        interpol=none
        value=dot
        height=3;
symbol2 color=red
        interpol=join;
axis1 label=('Length (mm)')
      order=(22 to 28 BY 1)
      width=3;
axis2 label=('Mass (g)')
      order=(0.25 to .45 BY 0.05)
      width=3;
PROC GPLOT DATA=a;
  plot mass*length / haxis=axis1 vaxis=axis2;
RUN;
4 Code Block
PROC GPLOT
Explanation :
This final block uses `PROC GPLOT` to create an overlaid graph from dataset 'a'. It plots the observed data ('mass' vs 'length') and the model's predicted values ('p' vs 'length') on the same graph. The `overlay` option draws both series in the same coordinate system, which makes it easy to compare the observations with the model's fit. The first plot request uses `symbol1` (red dots, no interpolation) and the second uses `symbol2` (points joined by a red line). The previously defined axis configurations are reused.
PROC GPLOT DATA=a;
  plot mass*length p*length / overlay haxis=axis1 vaxis=axis2;
RUN;
QUIT;
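Where SAS/GRAPH is not available, a similar overlay can be drawn with PROC SGPLOT from ODS Graphics. The following is a minimal sketch under that assumption, not the original author's method. It assumes dataset 'a' from PROC NLIN exists, and `a_sorted` is a hypothetical dataset name.

```sas
/* Sketch: observed points plus fitted line with PROC SGPLOT.    */
/* Assumes dataset a (with length, mass, p) exists; a_sorted is  */
/* an illustrative name.                                         */
proc sort data=a out=a_sorted;
  by length;   /* SERIES joins points in data order */
run;

title 'Mass vs Length with the fit';
proc sgplot data=a_sorted;
  scatter x=length y=mass / markerattrs=(symbol=circlefilled color=red);
  series  x=length y=p    / lineattrs=(color=red);
  xaxis label='Length (mm)' values=(22 to 28 by 1);
  yaxis label='Mass (g)'    values=(0.25 to 0.45 by 0.05);
run;
title;
```

The line is drawn only through the observed lengths, so the kink at tau appears between the two nearest data points and not exactly at the estimated breakpoint.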
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
