bayesianNetClassifier

bnet

Description

The bnet action from the Bayesian Net Classifier action set uses Bayesian network models to classify a target variable. It allows for various network structures and variable selection methods to build a predictive model.

bayesianNetClassifier.bnet result=results status=rc /
   alpha={double-1, double-2, ...},
   attributes={{name='variable-name', format='string', formattedLength=integer, label='string', nfd=integer, nfl=integer}, ...},
   bestModel=TRUE | FALSE,
   code={casOut={...}, comment=TRUE | FALSE, fmtWdth=integer, indentSize=integer, intoCutPt=double, iProb=TRUE | FALSE, labelId=integer, lineSize=integer, noTrim=TRUE | FALSE, pCatAll=TRUE | FALSE, tabForm=TRUE | FALSE},
   codeGroup='string',
   display={...},
   freq='string',
   id={'variable-name-1', 'variable-name-2', ...},
   indepTest='ALL' | 'CHIGSQUARE' | 'CHISQUARE' | 'GSQUARE' | 'MI',
   inNetwork={...},
   inputs={{...}, ...},
   maxParents=integer,
   miAlpha=double,
   missingInt='IGNORE' | 'IMPUTE',
   missingNom='IGNORE' | 'IMPUTE' | 'LEVEL',
   nominals={{...}, ...},
   numBin=integer,
   outNetwork={...},
   output={...},
   outputTables={...},
   parenting={'BESTONE', 'BESTSET'},
   partByFrac={...},
   partByVar={...},
   preScreening={'ONE', 'ZERO'},
   printtarget=TRUE | FALSE,
   resident=TRUE | FALSE,
   saveState={...},
   structures={'GENERAL', 'GN', 'MB', 'NAIVE', 'PC', 'TAN'},
   table={...},
   target='string',
   varSelect={'ONE', 'THREE', 'TWO', 'ZERO'};
Settings
Parameter Description
alpha Specifies the significance level for independence tests using chi-square or G-square statistics. You can specify up to five values to find the best model.
attributes Changes the attributes of variables used in the action.
bestModel When set to True, selects the best model based on validation data or cross-validation.
code Specifies the settings for generating SAS DATA step scoring code.
codeGroup Specifies a group for the generated code.
display Specifies a list of results tables to be displayed.
freq Specifies the frequency variable for the analysis.
id Specifies variables to be copied to the output table.
indepTest Specifies the method for independence tests (ALL, CHIGSQUARE, CHISQUARE, GSQUARE, or MI).
inNetwork Specifies the input table that defines links to be included or excluded from the network structure.
inputs Specifies the input variables for the analysis.
maxParents Specifies the maximum number of parents for each node in the network.
miAlpha Specifies the significance level for independence tests that use mutual information.
missingInt Specifies how to handle missing values for interval variables (IGNORE or IMPUTE).
missingNom Specifies how to handle missing values for nominal variables (IGNORE, IMPUTE, or LEVEL).
nominals Specifies the nominal variables to be used in the analysis.
numBin Specifies the number of bins to use for interval variables.
outNetwork Specifies the output table for the network structure and probability distributions.
output Specifies the output table to store predicted values.
outputTables Lists the names of results tables to save as CAS tables.
parenting Specifies the structure learning method (BESTONE or BESTSET).
partByFrac Partitions the input data by specifying fractions for training, testing, and validation.
partByVar Partitions the input data based on the values of a specified variable.
preScreening Specifies the initial screening method for input variables (ONE or ZERO).
printtarget When set to True, generates names for the predicted target and probability variables.
resident Specifies whether the model should be kept in memory.
saveState Specifies the table in which to save the model state for future scoring.
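structures Specifies the network structure types to be learned: GENERAL or GN (general network), MB (Markov blanket), NAIVE (naive Bayes), PC (parent-child), or TAN (tree-augmented naive).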
table Specifies the input data table for the analysis.
target Specifies the target variable for classification.
varSelect Specifies the variable selection method beyond prescreening (ZERO, ONE, TWO, THREE).
Data Preparation
Data Creation

This example creates a simple dataset named 'golf' with weather conditions and a decision on whether to play golf. This dataset will be used to train a Bayesian network classifier.

/* Assumes the libref CASUSER is already assigned to a CAS caslib. */
DATA casuser.golf;
   /* Read comma-separated values from the in-stream data lines. */
   INFILE DATALINES delimiter=',';
   INPUT Outlook $ Temperature $ Humidity $ Windy $ Play $;
   DATALINES;
Sunny,Hot,High,False,No
Sunny,Hot,High,True,No
Overcast,Hot,High,False,Yes
Rainy,Mild,High,False,Yes
Rainy,Cool,Normal,False,Yes
Rainy,Cool,Normal,True,No
Overcast,Cool,Normal,True,Yes
Sunny,Mild,High,False,No
Sunny,Cool,Normal,False,Yes
Rainy,Mild,Normal,False,Yes
Sunny,Mild,Normal,True,Yes
Overcast,Mild,High,True,Yes
Overcast,Hot,Normal,False,Yes
Rainy,Mild,High,True,No
;
RUN;
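The DATA step above writes to the libref casuser, so it only works if that libref points at a CAS caslib. The following is a minimal setup-and-check sketch, assuming a reachable CAS server; the session name mysess is hypothetical, and the caslib name casuser is assumed to be the personal caslib.

```sas
/* Run BEFORE the DATA step. */
/* Start a CAS session; 'mysess' is a hypothetical session name. */
cas mysess;

/* Bind the libref CASUSER to the CASUSER caslib so that            */
/* DATA casuser.golf creates an in-memory CAS table.                */
libname casuser cas caslib=casuser;

/* Run AFTER the DATA step. */
/* Fetch the rows of the golf table to confirm that it was loaded.  */
proc cas;
   table.fetch / table={name='golf', caslib='casuser'};
quit;
```

If the fetch lists the 14 observations with the five columns Outlook, Temperature, Humidity, Windy, and Play, the table is ready for the examples that follow.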

Examples

This example trains a simple Naive Bayes classifier on the 'golf' dataset to predict the 'Play' variable. It uses all other variables as inputs.

SAS® / CAS Code (awaiting community validation)
PROC CAS;
   /* Make sure the action set is available in the session. */
   loadactionset 'bayesianNetClassifier';

   /* Train a naive Bayes network with Play as the target. */
   bayesianNetClassifier.bnet /
      table={name='golf'},
      target='Play',
      inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
      structures={'NAIVE'},
      output={casOut={name='golf_scored_simple', replace=true}, copyVars={'Play'}},
      saveState={name='bnet_model_simple', replace=true};
RUN;
QUIT;
Result:
The action trains a Naive Bayes model and creates two tables: 'golf_scored_simple' with predictions and 'bnet_model_simple' containing the model state for future scoring.
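The saved model state can later be applied to a table without retraining. The sketch below assumes that the state table 'bnet_model_simple' from the example above exists in the active caslib and that the astore action set is available on the server; the output table name 'golf_rescored' is hypothetical. It re-scores the training table only for illustration; in practice the input would be new observations with the same input columns.

```sas
proc cas;
   /* The score action lives in the astore action set. */
   loadactionset 'astore';

   /* Apply the saved model state to the golf table.          */
   /* 'golf_rescored' is a hypothetical output table name.    */
   astore.score /
      table={name='golf'},
      rstore={name='bnet_model_simple'},
      out={name='golf_rescored', replace=true},
      copyVars={'Play'};

   /* Display the scored rows next to the actual Play values. */
   table.fetch / table={name='golf_rescored'};
quit;
```

Copying Play into the output makes it easy to compare the predictions with the observed values row by row.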

This example demonstrates a more advanced use case. It partitions the data into training (70%) and validation (30%) sets, then trains a Tree-Augmented Naive (TAN) network. It specifies how to handle missing values and sets a maximum of 2 parents for any node.

SAS® / CAS Code (awaiting community validation)
PROC CAS;
   /* Make sure the action set is available in the session. */
   loadactionset 'bayesianNetClassifier';

   bayesianNetClassifier.bnet /
      table={name='golf'},
      target='Play',
      inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
      nominals={'Outlook', 'Temperature', 'Humidity', 'Windy', 'Play'},
      /* Hold out 30% of the rows for validation; the rest is used for training. */
      partByFrac={validate=0.3, seed=1234},
      structures={'TAN'},
      maxParents=2,
      /* Treat missing nominal values as a separate level. */
      missingNom='LEVEL',
      output={casOut={name='golf_scored_detailed', replace=true}, copyVars={'Play'}},
      outNetwork={name='bnet_network_detailed', replace=true},
      saveState={name='bnet_model_detailed', replace=true};
RUN;
QUIT;
Result:
This trains a TAN model using 70% of the data and validates on the remaining 30%. It creates 'golf_scored_detailed' with predictions, 'bnet_network_detailed' with the network structure, and 'bnet_model_detailed' to save the model.
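The Settings table notes that alpha accepts up to five values and that bestModel selects the best model based on validation data. The following sketch combines the two on the golf table; the candidate structures and alpha values are illustrative choices rather than recommendations, and the output table name 'bnet_network_best' is hypothetical. With only 14 rows the validation partition is very small, so the selection here is not meaningful beyond demonstrating the parameters.

```sas
proc cas;
   loadactionset 'bayesianNetClassifier';

   bayesianNetClassifier.bnet /
      table={name='golf'},
      target='Play',
      inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
      nominals={'Outlook', 'Temperature', 'Humidity', 'Windy', 'Play'},
      /* Validation data is what bestModel uses to compare candidates. */
      partByFrac={validate=0.3, seed=1234},
      /* Candidate structures and significance levels (illustrative). */
      structures={'NAIVE', 'TAN', 'PC'},
      alpha={0.01, 0.05, 0.1},
      maxParents=2,
      bestModel=true,
      /* 'bnet_network_best' is a hypothetical output table name. */
      outNetwork={name='bnet_network_best', replace=true};

   /* Inspect the structure and probabilities of the selected network. */
   table.fetch / table={name='bnet_network_best'};
quit;
```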

FAQ

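What is the primary purpose of the bnet action?
It uses Bayesian network models to classify a target variable.

What are the different network structures that can be learned with the bnet action?
The structures parameter accepts GENERAL, GN, MB, NAIVE, PC, and TAN.

How does the bnet action handle missing values in interval variables?
The missingInt parameter controls this; its values are IGNORE and IMPUTE.

What options are available for handling missing values in nominal variables?
The missingNom parameter accepts IGNORE, IMPUTE, or LEVEL.

What methods are available for independence tests in the bnet action?
The indepTest parameter accepts ALL, CHIGSQUARE, CHISQUARE, GSQUARE, or MI.

What is the function of the 'maxParents' parameter?
It sets the maximum number of parents for each node in the network.

How can the model be saved for future scoring?
Use the saveState parameter to name the table in which the model state is stored.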
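The questions on missing values and independence tests map onto a handful of parameters that the golf data does not exercise, because it has no interval inputs and no missing values. The sketch below shows how those parameters might be combined. The table 'sensor_readings', its columns, and the output name 'bnet_model_sensors' are all hypothetical, and the values chosen for numBin and miAlpha are illustrative only.

```sas
proc cas;
   loadactionset 'bayesianNetClassifier';

   bayesianNetClassifier.bnet /
      /* Hypothetical table and column names, for illustration only. */
      table={name='sensor_readings'},
      target='failure',
      inputs={'temperature', 'vibration', 'shift'},
      nominals={'shift', 'failure'},
      /* Interval inputs: impute missing values, then bin into 5 bins. */
      missingInt='IMPUTE',
      numBin=5,
      /* Nominal inputs: treat missing values as a separate level. */
      missingNom='LEVEL',
      /* Use mutual information for independence tests. */
      indepTest='MI',
      miAlpha=0.05,
      structures={'TAN'},
      maxParents=2,
      /* 'bnet_model_sensors' is a hypothetical state table name. */
      saveState={name='bnet_model_sensors', replace=true};
quit;
```

When indepTest is set to a chi-square or G-square method instead, the significance level is supplied through alpha rather than miAlpha.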

Associated Scenarios

Use Case
Standard Case: Telecom Customer Churn Prediction with a TAN Model

A telecommunications company wants to proactively identify customers at high risk of churning. By modeling customer behavior and contract attributes, they aim to build a predict...

Use Case
Performance & Feature Selection: Fraud Detection with a General Network

A financial institution needs to build a robust fraud detection system. The dataset contains numerous transaction attributes, many of which might be irrelevant. The goal is to t...

Use Case
Edge Case & Robustness: Predictive Maintenance with Missing Sensor Data

An industrial manufacturing plant wants to predict equipment failure using sensor data. However, due to network issues and sensor malfunctions, the data is often incomplete. Thi...