
Analysis with PROC HPSPLIT

The script uses the HPSPLIT procedure to build a regression tree, which is a decision tree with a continuous response. The tree models the log of player salary ('logSalary') as a function of explanatory variables from the SASHELP.BASEBALL dataset, with 'league' and 'division' declared as categorical variables. The predictions are saved to an output dataset, 'hpsplout', and the random seed is fixed so that the results are reproducible. Finally, PROC PRINT displays the first 10 observations of the output dataset for a quick inspection.
Data Analysis

Type: SASHELP


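The script uses the SASHELP.BASEBALL dataset, a standard sample dataset shipped with SAS. It contains statistics for Major League Baseball players, including season and career batting statistics, fielding statistics, and salary. In the variable names, the prefix 'n' denotes a season statistic and 'cr' a career total (for example, 'nHits' and 'crHits').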
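Before modeling, it can help to look at the dataset itself. The following sketch uses only standard Base SAS procedures to list the variables in SASHELP.BASEBALL and to count how many players have a missing value for the response, 'logSalary'; observations with a missing response are not used to grow the tree. The output dataset name work.logsal_summary is an arbitrary, illustrative choice.

```sas
/* List the variables of SASHELP.BASEBALL in their stored order */
proc contents data=sashelp.baseball varnum;
run;

/* Count nonmissing and missing values of the response and summarize it.
   work.logsal_summary is an illustrative name for the saved statistics. */
proc means data=sashelp.baseball n nmiss mean min max;
   var logSalary;
   output out=work.logsal_summary n=n_nonmiss nmiss=n_miss;
run;
```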

1 Code Block
ODS Configuration
Explanation:
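Enables ODS Graphics so that procedures that support it, such as PROC HPSPLIT, produce graphs (for PROC HPSPLIT, plots such as the tree diagram) in addition to their tabular output. In some SAS environments ODS Graphics is already enabled by default; stating it explicitly makes the script behave the same way everywhere.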
ods graphics on;
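To find out which tables and graphs a procedure produces with ODS Graphics enabled, and which names ODS gives them, you can wrap the step in ODS TRACE ON and ODS TRACE OFF. This sketch repeats the model used on this page without the OUTPUT statement; the name, label, and path of each output object are written to the SAS log. Those names can then be used in ODS SELECT or ODS EXCLUDE statements to keep only the output you need.

```sas
ods graphics on;
ods trace on;    /* write the name, label, and path of each output object to the log */

proc hpsplit data=sashelp.baseball seed=123;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
                     yrMajor crAtBat crHits crHome crRuns crRbi
                     crBB league division nOuts nAssts nError;
run;

ods trace off;   /* stop tracing for subsequent steps */
```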
2 Code Block
PROC HPSPLIT
Explanation:
Runs the HPSPLIT (high-performance split) procedure to build a regression tree from the SASHELP.BASEBALL dataset. 'seed=123' fixes the seed of the random number generator, which the procedure uses for random operations such as assigning observations to cross-validation folds during pruning, so repeated runs produce the same tree. The 'class' statement declares 'league' and 'division' as categorical variables. The 'model' statement specifies 'logSalary' as the response (dependent) variable and lists the explanatory variables that the tree can split on. The 'output out=hpsplout' statement creates a new dataset, 'hpsplout', that contains the predicted values and node assignments for each observation.
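/* Fit a regression tree for logSalary; SEED= makes the random steps reproducible */
PROC HPSPLIT DATA=sashelp.baseball seed=123;
   class league division;                       /* categorical predictors */
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
                     yrMajor crAtBat crHits crHome crRuns crRbi
                     crBB league division nOuts nAssts nError;
   OUTPUT out=hpsplout;                         /* save predictions per observation */
RUN;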
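To apply the fitted tree to other data outside of PROC HPSPLIT, the procedure's CODE statement can write the final tree as SAS DATA step scoring code. The following is a minimal sketch under stated assumptions: the file name hpsplit_score.sas is hypothetical, it is placed in the WORK library's directory only so that the example needs no writable project folder, and the forward-slash path separator is assumed to be accepted by your host. The DATA step rescores SASHELP.BASEBALL by including the generated file; in practice you would replace the SET dataset with new observations that contain the same predictor variables.

```sas
/* Hypothetical location for the generated scoring code: the WORK directory */
%let scorefile = %sysfunc(pathname(work))/hpsplit_score.sas;

proc hpsplit data=sashelp.baseball seed=123;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
                     yrMajor crAtBat crHits crHome crRuns crRbi
                     crBB league division nOuts nAssts nError;
   code file="&scorefile";   /* write the final tree as DATA step code */
run;

/* Score a dataset by inserting the generated code into a DATA step */
data work.scored;
   set sashelp.baseball;
   %include "&scorefile";
run;
```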
3 Code Block
PROC PRINT
Explanation:
Prints the first 10 observations of the 'hpsplout' dataset. This is a quick way to check the content and structure of the dataset created by PROC HPSPLIT, in particular the prediction variables it adds.
PROC PRINT DATA=hpsplout(obs=10);
RUN;
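PROC PRINT shows only a few rows. To see every variable in 'hpsplout', with its type and label, and to get quick summary statistics for all numeric variables (including the predictions), a sketch like the following can be run. It assumes that work.hpsplout already exists, that is, that the PROC HPSPLIT step on this page has been run earlier in the same SAS session.

```sas
/* List the variables of the output dataset in their stored order */
proc contents data=work.hpsplout varnum;
run;

/* Without a VAR statement, PROC MEANS summarizes every numeric variable */
proc means data=work.hpsplout n nmiss mean min max;
run;
```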
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info: SAS SAMPLE LIBRARY