The script uses the HPSPLIT procedure to build a decision tree; because the response is continuous, this is a regression tree. The tree models the log-transformed salary ('logSalary') of baseball players from a set of explanatory variables in the SASHELP.BASEBALL data set, with 'league' and 'division' declared as categorical (CLASS) variables. The random seed is fixed so that results are reproducible, and per-observation results are written to an output data set, 'hpsplout'. Finally, PROC PRINT displays the first 10 observations of 'hpsplout' for a quick inspection.
Data Analysis
Type: SASHELP
The script uses SASHELP.BASEBALL, a sample data set shipped with SAS that contains information about baseball players, including season and career statistics and salary.
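Before fitting the tree, it can help to check how complete the data are. The following is a minimal sketch that is not part of the original script: it uses PROC MEANS to report nonmissing and missing counts plus simple summary statistics for the response and a few of the predictors named in the MODEL statement. The choice of statistics and of these six variables is ours.

```sas
/* Quick profile of the input data before modeling (illustrative).   */
/* N and NMISS show how many values are present or missing for each  */
/* variable; the variables are taken from the script's MODEL list.   */
proc means data=sashelp.baseball n nmiss mean min max;
   var logSalary nAtBat nHits nHome yrMajor crHits;
run;
```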
1 Code Block
ODS Configuration
Explanation: Enables ODS Graphics, the graphics component of the Output Delivery System (ODS). Procedures that support ODS Graphics, such as PROC HPSPLIT, produce their plots (for example, tree diagrams) only when it is enabled. In many SAS environments it is already on by default; the statement makes the setting explicit.
ods graphics on;
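To see which tables and graphs a procedure sends to ODS, and under which names, you can wrap the step in ODS TRACE. The sketch below is an illustration we added, not part of the original script; it reruns the same HPSPLIT model without an OUTPUT statement, and the trace listing appears in the SAS log.

```sas
/* Enable ODS Graphics, then ask ODS to write the name, label and    */
/* path of every output object (tables and plots) to the SAS log.    */
ods graphics on;
ods trace on;

proc hpsplit data=sashelp.baseball seed=123;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
                     yrMajor crAtBat crHits crHome crRuns crRbi
                     crBB league division nOuts nAssts nError;
run;

/* Stop tracing so later steps do not clutter the log. */
ods trace off;
```

The object names reported in the log can then be used in ODS SELECT or ODS EXCLUDE statements to keep only the output you need.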
2 Code Block
PROC HPSPLIT
Explanation: Runs the HPSPLIT (High-Performance SPLIT) procedure to build a decision tree from SASHELP.BASEBALL. The SEED=123 option fixes the random-number seed used by the procedure (for example, when observations are randomly assigned to cross-validation folds during pruning), so repeated runs give the same tree. The CLASS statement declares 'league' and 'division' as categorical variables. The MODEL statement names 'logSalary' as the response and lists the explanatory variables that are candidates for splits. The OUT= option of the OUTPUT statement creates a new data set, 'hpsplout', with per-observation results such as the assigned leaf and the predicted value of 'logSalary'.
proc hpsplit data=sashelp.baseball seed=123;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
                     yrMajor crAtBat crHits crHome crRuns crRbi
                     crBB league division nOuts nAssts nError;
   output out=hpsplout;
run;
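The script relies on the default growth and pruning criteria of HPSPLIT. The sketch below writes them out explicitly, assuming that for an interval response such as 'logSalary' the defaults are variance-based splitting (GROW VARIANCE) and cost-complexity pruning (PRUNE COSTCOMPLEXITY); confirm this in the HPSPLIT documentation for your SAS release. The output data set name 'hpsplout_explicit' is hypothetical, chosen so that the original 'hpsplout' is not overwritten.

```sas
/* Same model as the script, with growth and pruning stated explicitly. */
/* ASSUMPTION: VARIANCE and COSTCOMPLEXITY are the defaults for an      */
/* interval target; if so, the tree should match the original one.     */
proc hpsplit data=sashelp.baseball seed=123;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
                     yrMajor crAtBat crHits crHome crRuns crRbi
                     crBB league division nOuts nAssts nError;
   grow variance;          /* choose splits that reduce within-node variance */
   prune costcomplexity;   /* cost-complexity pruning of the grown tree      */
   output out=hpsplout_explicit;   /* hypothetical data set name             */
run;
```

Swapping either statement, for example PRUNE NONE to keep the fully grown tree, is a simple way to see how much these choices affect the result.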
3 Code Block
PROC PRINT
Explanation: Displays the first 10 observations of 'hpsplout' (the OBS=10 data set option stops reading after the 10th observation). This is a quick way to check the content and structure of the data set produced by PROC HPSPLIT, in particular the prediction variables it adds.
proc print data=hpsplout(obs=10); run;
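PROC PRINT shows values but not variable types or lengths. As a minimal follow-up sketch, not part of the original script, PROC CONTENTS with the VARNUM option lists the variables of 'hpsplout' in creation order, and PROC MEANS without a VAR statement summarizes every numeric variable. Together they give a quick range check on the predicted values without needing to know their exact names in advance.

```sas
/* List the variables of the output data set in creation order. */
proc contents data=hpsplout varnum;
run;

/* With no VAR statement, PROC MEANS analyzes all numeric variables. */
proc means data=hpsplout n nmiss mean min max;
run;
```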
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.