The bnet action from the Bayesian Net Classifier action set uses Bayesian network models to classify a target variable. It allows for various network structures and variable selection methods to build a predictive model.
| Parameter | Description |
|---|---|
| alpha | Specifies the significance level for independence tests using chi-square or G-square statistics. You can specify up to five values to find the best model. |
| attributes | Changes the attributes of variables used in the action. |
| bestModel | When set to True, selects the best model based on validation data or cross-validation. |
| code | Specifies the settings for generating SAS DATA step scoring code. |
| codeGroup | Specifies a group for the generated code. |
| display | Specifies a list of results tables to be displayed. |
| freq | Specifies the frequency variable for the analysis. |
| id | Specifies variables to be copied to the output table. |
| indepTest | Specifies the method for independence tests (e.g., CHISQUARE, GSQUARE, MI). |
| inNetwork | Specifies the input table that defines links to be included or excluded from the network structure. |
| inputs | Specifies the input variables for the analysis. |
| maxParents | Specifies the maximum number of parents for each node in the network. |
| miAlpha | Specifies the significance level for independence tests that use mutual information. |
| missingInt | Specifies how to handle missing values for interval variables (IGNORE or IMPUTE). |
| missingNom | Specifies how to handle missing values for nominal variables (IGNORE, IMPUTE, or LEVEL). |
| nominals | Specifies the nominal variables to be used in the analysis. |
| numBin | Specifies the number of bins to use for interval variables. |
| outNetwork | Specifies the output table for the network structure and probability distributions. |
| output | Specifies the output table to store predicted values. |
| outputTables | Lists the names of results tables to save as CAS tables. |
| parenting | Specifies the structure learning method (BESTONE or BESTSET). |
| partByFrac | Randomly partitions the input data by specifying the fractions to hold out for validation and testing, with an optional seed. The remaining observations are used for training. |
| partByVar | Partitions the input data based on the values of a specified variable. |
| preScreening | Specifies the initial screening method for input variables (ONE or ZERO). |
| printtarget | When set to True, generates names for the predicted target and probability variables. |
| resident | Specifies whether the model should be kept in memory. |
| saveState | Specifies the table in which to save the model state for future scoring. |
| structures | Specifies the network structure types to be learned (e.g., NAIVE, TAN, PC). |
| table | Specifies the input data table for the analysis. |
| target | Specifies the target variable for classification. |
| varSelect | Specifies the variable selection method beyond prescreening (ZERO, ONE, TWO, THREE). |
This example creates a small table named 'golf' in the casuser library. It holds 14 observations of weather conditions and the decision on whether to play golf. The examples that follow use this table to train Bayesian network classifiers.

    /* Assumes an active CAS session and a CAS engine libref named casuser */
    DATA casuser.golf;
       INFILE DATALINES delimiter=',';
       /* All five variables are read as character values */
       INPUT Outlook $ Temperature $ Humidity $ Windy $ Play $;
       DATALINES;
    Sunny,Hot,High,False,No
    Sunny,Hot,High,True,No
    Overcast,Hot,High,False,Yes
    Rainy,Mild,High,False,Yes
    Rainy,Cool,Normal,False,Yes
    Rainy,Cool,Normal,True,No
    Overcast,Cool,Normal,True,Yes
    Sunny,Mild,High,False,No
    Sunny,Cool,Normal,False,Yes
    Rainy,Mild,Normal,False,Yes
    Sunny,Mild,Normal,True,Yes
    Overcast,Mild,High,True,Yes
    Overcast,Hot,Normal,False,Yes
    Rainy,Mild,High,True,No
    ;
    RUN;

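
Before training, it can help to confirm that the table loaded as expected. The following is a minimal sketch, assuming the DATA step above wrote 'golf' to the caslib that the casuser libref points to, and that this caslib can be addressed as 'casuser'. It uses only the general-purpose table and simple action sets, not anything specific to bnet.

```sas
PROC CAS;
   /* Assumption: the table lives in the caslib addressed as 'casuser' */
   /* Table-level summary: row count, column count */
   table.tableInfo / caslib='casuser', name='golf';
   /* Show the first five rows */
   table.fetch / table={name='golf', caslib='casuser'}, to=5;
   /* Frequency of each target level (Yes / No) */
   simple.freq / table={name='golf', caslib='casuser'}, inputs={'Play'};
RUN;
```

With the data above, the frequency table for Play should show 9 'Yes' rows and 5 'No' rows.
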
This example trains a simple Naive Bayes classifier on the 'golf' table to predict the 'Play' variable, using the four remaining variables as inputs. The scored observations are written to 'golf_scored_simple', and the model state is saved to 'bnet_model_simple' for later scoring.

    PROC CAS;
       /* Load the action set in case it is not already loaded */
       loadactionset 'bayesianNetClassifier';
       /* Train a naive Bayes network, score the input rows, and save the model state */
       bayesianNetClassifier.bnet /
          table={name='golf'},
          target='Play',
          inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
          structures={'NAIVE'},
          output={casout={name='golf_scored_simple', replace=true}, copyVars={'Play'}},
          saveState={name='bnet_model_simple', replace=true};
    RUN;

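
The table written by saveState can be used to score data in a later step. The following is a sketch only, assuming that the saved state is an analytic store readable by the aStore action set, and that both 'golf' and 'bnet_model_simple' are in the active caslib. The output table name 'golf_rescored' is a placeholder chosen for illustration.

```sas
PROC CAS;
   loadactionset 'aStore';
   /* Describe the saved model: inputs, outputs, and general information */
   aStore.describe / rstore={name='bnet_model_simple'};
   /* Score a table with the saved model; 'golf_rescored' is a hypothetical name */
   aStore.score /
      table={name='golf'},
      rstore={name='bnet_model_simple'},
      out={name='golf_rescored', replace=true},
      copyVars={'Play'};
RUN;
```

In practice the table passed to aStore.score would hold new observations with the same input variables, rather than the training data reused here.
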
This example demonstrates a more advanced use case. It holds out 30% of the data for validation, leaving 70% for training, and then trains a Tree-Augmented Naive (TAN) network. It treats missing nominal values as a separate level (missingNom='LEVEL'), allows at most two parents for any node, and writes the learned network structure to the 'bnet_network_detailed' table.

    PROC CAS;
       /* Load the action set in case it is not already loaded */
       loadactionset 'bayesianNetClassifier';
       /* validate=0.3 holds out 30% of the rows; the remaining 70% are used for training */
       bayesianNetClassifier.bnet /
          table={name='golf'},
          target='Play',
          inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
          nominals={'Outlook', 'Temperature', 'Humidity', 'Windy', 'Play'},
          partByFrac={validate=0.3, seed=1234},
          structures={'TAN'},
          maxParents=2,
          missingNom='LEVEL',
          output={casout={name='golf_scored_detailed', replace=true}, copyVars={'Play'}},
          outNetwork={name='bnet_network_detailed', replace=true},
          saveState={name='bnet_model_detailed', replace=true};
    RUN;

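
One way to inspect what this run produced is to list the columns of the network table and fetch a few rows from each output table. The following is a sketch, assuming the tables from the example above are in the active caslib. The column layout of the outNetwork table is not assumed here, which is why table.columnInfo is called first.

```sas
PROC CAS;
   /* List the columns of the network table before reading it */
   table.columnInfo / table={name='bnet_network_detailed'};
   /* Fetch the first rows of the learned structure and probability distributions */
   table.fetch / table={name='bnet_network_detailed'}, to=20;
   /* Fetch the scored rows, which include the copied Play variable */
   table.fetch / table={name='golf_scored_detailed'}, to=14;
RUN;
```
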
A telecommunications company wants to proactively identify customers at high risk of churning. By modeling customer behavior and contract attributes, they aim to build a predict...
A financial institution needs to build a robust fraud detection system. The dataset contains numerous transaction attributes, many of which might be irrelevant. The goal is to t...
An industrial manufacturing plant wants to predict equipment failure using sensor data. However, due to network issues and sensor malfunctions, the data is often incomplete. Thi...