bnet - WeAreCAS

Q: What is the primary purpose of the bnet action?

The bnet action, part of the Bayesian Net Classifier action set, uses Bayesian network models to classify a target variable.

Q: What are the different network structures that can be learned with the bnet action?

The bnet action can learn several network structures, including GENERAL (or GN), Markov Blanket (MB), NAIVE, Parent-Child (PC), and Tree-Augmented Naive (TAN) Bayesian networks.

Q: How does the bnet action handle missing values in interval variables?

For interval variables, the 'missingInt' parameter offers two options: IGNORE, which ignores the observations that have missing values, and IMPUTE, which replaces missing values with the mean of the variable.
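The two missingInt strategies can be sketched in Python as follows. This is a minimal illustration, assuming IGNORE drops each observation that has a missing value; the function name handle_missing_interval and the humidity readings are made up for the example (the golf data used later is entirely nominal), and the bnet action's internal handling may differ.

```python
from statistics import mean


def handle_missing_interval(rows, var, strategy):
    """Apply a missingInt-style strategy to one interval variable.

    rows is a list of dicts; None marks a missing value.
    """
    if strategy == "IGNORE":
        # Drop every observation whose value for `var` is missing.
        return [r for r in rows if r[var] is not None]
    if strategy == "IMPUTE":
        # Replace each missing value with the mean of the observed values.
        m = mean(r[var] for r in rows if r[var] is not None)
        return [{**r, var: m if r[var] is None else r[var]} for r in rows]
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical humidity readings; the second observation is missing.
rows = [{"humidity": 85.0}, {"humidity": None}, {"humidity": 65.0}, {"humidity": 72.0}]
ignored = handle_missing_interval(rows, "humidity", "IGNORE")
imputed = handle_missing_interval(rows, "humidity", "IMPUTE")
print(len(ignored))            # 3
print(imputed[1]["humidity"])  # 74.0
```

IGNORE keeps the remaining values unbiased but shrinks the training data; IMPUTE keeps every observation at the cost of pulling the imputed values toward the center of the distribution.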

Q: What options are available for handling missing values in nominal variables?

For nominal variables, the bnet action provides three options for handling missing values: ignoring the observations, imputing with the mode of the variable, or treating the missing values as a distinct level. This is controlled by the 'missingNom' parameter.
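A similar sketch for the three missingNom options, again in Python and purely illustrative: handle_missing_nominal and the label "(missing)" are hypothetical, and the action may name or store the extra level differently.

```python
from collections import Counter

MISSING_LEVEL = "(missing)"  # hypothetical label for the extra level


def handle_missing_nominal(values, strategy):
    """Apply a missingNom-style strategy to one nominal variable (None = missing)."""
    observed = [v for v in values if v is not None]
    if strategy == "IGNORE":
        # Drop the observations that have a missing value.
        return observed
    if strategy == "IMPUTE":
        # Replace missing values with the most frequent observed level (the mode).
        mode = Counter(observed).most_common(1)[0][0]
        return [mode if v is None else v for v in values]
    if strategy == "LEVEL":
        # Keep the observations and treat "missing" as one more level.
        return [MISSING_LEVEL if v is None else v for v in values]
    raise ValueError(f"unknown strategy: {strategy}")


outlook = ["Sunny", None, "Rainy", "Sunny", "Overcast"]
for strategy in ("IGNORE", "IMPUTE", "LEVEL"):
    print(strategy, handle_missing_nominal(outlook, strategy))
```

LEVEL is the only option that lets the model learn whether missingness itself is predictive of the target.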

Q: What methods are available for independence tests in the bnet action?

The bnet action supports several methods for independence tests, specified by the 'indepTest' parameter: CHISQUARE (chi-square statistic), GSQUARE (G-square statistic), MI (normalized mutual information), CHIGSQUARE (both chi-square and G-square), and ALL (chi-square, G-square, and mutual information).
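To make the chi-square and G-square statistics concrete, the following Python sketch computes both for Outlook versus Play in the golf data from the Data Creation step, derives p-values, and compares them with several alpha values. It assumes a link is kept when a test rejects independence at the given alpha, which is a simplification of whatever decision rule bnet applies. It reports plain (unnormalized) mutual information because the normalization bnet uses is not described here; the helper names are hypothetical.

```python
import math
from collections import Counter

# Outlook and Play columns of the golf data from the Data Creation step.
OUTLOOK = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
           "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
PLAY = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]


def independence_stats(x, y):
    """Return (chi-square, G-square, mutual information in nats, df) for two nominal columns."""
    n = len(x)
    joint, row, col = Counter(zip(x, y)), Counter(x), Counter(y)
    chi2 = g2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n  # count expected under independence
            observed = joint[(a, b)]
            chi2 += (observed - expected) ** 2 / expected
            if observed:  # empty cells contribute 0 to G-square
                g2 += 2 * observed * math.log(observed / expected)
    df = (len(row) - 1) * (len(col) - 1)
    mi = g2 / (2 * n)  # identity: G-square = 2 * N * MI (MI in nats)
    return chi2, g2, mi, df


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution with df degrees of freedom."""
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2             # closed form for df = 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1  # closed form for df = 1
    while k < df:  # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


chi2, g2, mi, df = independence_stats(OUTLOOK, PLAY)
p_chi2, p_g2 = chi2_sf(chi2, df), chi2_sf(g2, df)
print(f"chi2={chi2:.3f} G2={g2:.3f} MI={mi / math.log(2):.3f} bits df={df}")
# chi2=3.547 G2=4.789 MI=0.247 bits df=2
for alpha in (0.05, 0.1, 0.2):
    # Under this sketch's rule, a dependency is kept when p < alpha.
    print(alpha, "chi2 dependent:", p_chi2 < alpha, "G2 dependent:", p_g2 < alpha)
```

On this table the chi-square p-value is about 0.17 and the G-square p-value about 0.09, so the two tests disagree at alpha=0.1. This is one reason why trying several alpha values (the alpha parameter accepts up to five) can change the learned structure.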

Q: What is the function of the 'maxParents' parameter?

The 'maxParents' parameter specifies the maximum number of parents for each node in the network. Its default value is 5, and it can range from 1 to 16.
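The maxParents limit matters because a node's conditional probability table grows multiplicatively with the number of parent levels. The Python sketch below counts the free probabilities in one node's table under the standard parameterization of a discrete Bayesian network; the function name is hypothetical and bnet's internal representation is not described here.

```python
from math import prod


def cpt_free_parameters(node_levels, parent_levels):
    """Free probabilities in one node's conditional probability table (CPT).

    The table holds one distribution over the node's levels for every
    combination of parent levels; each distribution has node_levels - 1
    free entries because its probabilities must sum to 1.
    """
    return (node_levels - 1) * prod(parent_levels)


# A 3-level node whose parents each have 3 levels.
for k in (0, 1, 2, 5, 16):
    print(k, "parents:", cpt_free_parameters(3, [3] * k))
```

At the default of 5 parents such a node already needs 486 probabilities, and at the upper limit of 16 it needs 86,093,442. With only 14 observations, as in the golf data, even two or three parents leave most parent combinations with few or no observations.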

Q: How can the model be saved for future scoring?

The 'saveState' parameter can be used to specify an output table where the trained model is saved for future scoring purposes.

Description

The bnet action from the Bayesian Net Classifier action set uses Bayesian network models to classify a target variable. It allows for various network structures and variable selection methods to build a predictive model.

bayesianNetClassifier.bnet result=results status=rc / alpha={double-1, double-2, ...}, attributes={{name='variable-name', format='string', formattedLength=integer, label='string', nfd=integer, nfl=integer}, ...}, bestModel=TRUE | FALSE, code={casOut={...}, comment=TRUE | FALSE, fmtWdth=integer, indentSize=integer, intoCutPt=double, iProb=TRUE | FALSE, labelId=integer, lineSize=integer, noTrim=TRUE | FALSE, pCatAll=TRUE | FALSE, tabForm=TRUE | FALSE}, codeGroup='string', display={...}, freq='string', id={'variable-name-1', 'variable-name-2', ...}, indepTest='ALL' | 'CHIGSQUARE' | 'CHISQUARE' | 'GSQUARE' | 'MI', inNetwork={...}, inputs={{...}, ...}, maxParents=integer, miAlpha=double, missingInt='IGNORE' | 'IMPUTE', missingNom='IGNORE' | 'IMPUTE' | 'LEVEL', nominals={{...}, ...}, numBin=integer, outNetwork={...}, output={...}, outputTables={...}, parenting={'BESTONE', 'BESTSET'}, partByFrac={...}, partByVar={...}, preScreening={'ONE', 'ZERO'}, printtarget=TRUE | FALSE, resident=TRUE | FALSE, saveState={...}, structures={'GENERAL', 'GN', 'MB', 'NAIVE', 'PC', 'TAN'}, table={...}, target='string', varSelect={'ONE', 'THREE', 'TWO', 'ZERO'} ;

Settings

Parameter	Description
alpha	Specifies the significance level for independence tests using chi-square or G-square statistics. You can specify up to five values to find the best model.
attributes	Changes the attributes of variables used in the action.
bestModel	When set to True, selects the best model based on validation data or cross-validation.
code	Specifies the settings for generating SAS DATA step scoring code.
codeGroup	Specifies a group for the generated code.
display	Specifies a list of results tables to be displayed.
freq	Specifies the frequency variable for the analysis.
id	Specifies variables to be copied to the output table.
indepTest	Specifies the method for independence tests: ALL, CHIGSQUARE, CHISQUARE, GSQUARE, or MI.
inNetwork	Specifies the input table that defines links to be included or excluded from the network structure.
inputs	Specifies the input variables for the analysis.
maxParents	Specifies the maximum number of parents for each node in the network.
miAlpha	Specifies the significance level for independence tests that use mutual information.
missingInt	Specifies how to handle missing values for interval variables (IGNORE or IMPUTE).
missingNom	Specifies how to handle missing values for nominal variables (IGNORE, IMPUTE, or LEVEL).
nominals	Specifies the nominal variables to be used in the analysis.
numBin	Specifies the number of bins to use for interval variables.
outNetwork	Specifies the output table for the network structure and probability distributions.
output	Specifies the output table to store predicted values.
outputTables	Lists the names of results tables to save as CAS tables.
parenting	Specifies the method for selecting each node's parents during structure learning (BESTONE or BESTSET).
partByFrac	Partitions the input data randomly by specifying the fractions of observations to use for validation and testing; the remaining observations are used for training.
partByVar	Partitions the input data based on the values of a specified variable.
preScreening	Specifies the initial screening method for input variables (ONE or ZERO).
printtarget	When set to True, generates names for the predicted target and probability variables.
resident	Specifies whether the model should be kept in memory.
saveState	Specifies the table in which to save the model state for future scoring.
structures	Specifies the network structure types to be learned: GENERAL (or GN), MB, NAIVE, PC, or TAN.
table	Specifies the input data table for the analysis.
target	Specifies the target variable for classification.
varSelect	Specifies the variable selection method beyond prescreening (ZERO, ONE, TWO, THREE).

Data Preparation

Data Creation

This example creates a simple dataset named 'golf' with weather conditions and a decision on whether to play golf. This dataset will be used to train a Bayesian network classifier.

/* Start a CAS session and assign SAS librefs (including CASUSER) to the available caslibs */
cas;
caslib _all_ assign;

DATA casuser.golf;
   INFILE DATALINES delimiter=',';
   /* Character variables get the default length of 8, enough for 'Overcast' */
   INPUT Outlook $ Temperature $ Humidity $ Windy $ Play $;
   DATALINES;
Sunny,Hot,High,False,No
Sunny,Hot,High,True,No
Overcast,Hot,High,False,Yes
Rainy,Mild,High,False,Yes
Rainy,Cool,Normal,False,Yes
Rainy,Cool,Normal,True,No
Overcast,Cool,Normal,True,Yes
Sunny,Mild,High,False,No
Sunny,Cool,Normal,False,Yes
Rainy,Mild,Normal,False,Yes
Sunny,Mild,Normal,True,Yes
Overcast,Mild,High,True,Yes
Overcast,Hot,Normal,False,Yes
Rainy,Mild,High,True,No
;
RUN;

Examples

Simple Naive Bayes Model

This example trains a simple naive Bayes classifier on the 'golf' dataset to predict the 'Play' variable, using all other variables as inputs.

SAS® / CAS Code (awaiting community validation)

PROC CAS;
   /* Learn a naive Bayes structure: Play is the only parent of every input */
   bayesianNetClassifier.bnet
      TABLE={name='golf', caslib='casuser'},
      target='Play',
      inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
      structures={'NAIVE'},
      OUTPUT={casout={name='golf_scored_simple', replace=true}, copyVars={'Play'}},
      saveState={name='bnet_model_simple', replace=true};
RUN;
QUIT;

Result:
The action trains a Naive Bayes model and creates two tables: 'golf_scored_simple' with predictions and 'bnet_model_simple' containing the model state for future scoring.
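For intuition about what the NAIVE structure computes, the following Python sketch fits a naive Bayes model to the same 14 golf observations by counting, then scores one new day (Sunny, Cool, High humidity, Windy=True). It uses unsmoothed maximum-likelihood estimates; the bnet action may smooth its probability estimates, so its predicted probabilities can differ from these. The function names are hypothetical.

```python
from collections import Counter

# The golf data from the Data Creation step: Outlook, Temperature, Humidity, Windy, Play.
GOLF = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def train_naive_bayes(rows):
    """Count class frequencies and per-class value frequencies for each input."""
    class_counts = Counter(r[-1] for r in rows)
    value_counts = [Counter() for _ in rows[0][:-1]]
    for r in rows:
        for i, v in enumerate(r[:-1]):
            value_counts[i][(v, r[-1])] += 1
    return class_counts, value_counts


def posterior(model, x):
    """P(Play | x), proportional to P(Play) times the product of P(x_i | Play)."""
    class_counts, value_counts = model
    n = sum(class_counts.values())
    scores = {}
    for cls, c in class_counts.items():
        score = c / n  # prior P(Play = cls)
        for i, v in enumerate(x):
            # Unsmoothed estimate of P(input_i = v | Play = cls); can be 0.
            score *= value_counts[i][(v, cls)] / c
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


model = train_naive_bayes(GOLF)
probs = posterior(model, ("Sunny", "Cool", "High", "True"))
print(max(probs, key=probs.get), round(probs["No"], 3))  # No 0.795
```

The prior favors Yes (9 of 14 days), but the Sunny, High humidity, and Windy=True evidence outweighs it, so the day is classified as No.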

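Detailed Tree-Augmented Naive (TAN) Model with Data Partitioning

This example demonstrates a more advanced use case. It partitions the data into training (70%) and validation (30%) sets, then trains a tree-augmented naive (TAN) network. It explicitly declares the nominal variables, treats missing nominal values as a separate level (this has no effect on the golf data, which contains no missing values), and limits every node to at most 2 parents.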

SAS® / CAS Code (awaiting community validation)

PROC CAS;
   bayesianNetClassifier.bnet
      TABLE={name='golf', caslib='casuser'},
      target='Play',
      inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
      nominals={'Outlook', 'Temperature', 'Humidity', 'Windy', 'Play'},
      /* Hold out about 30% of the rows for validation; the rest are used for training */
      partByFrac={validate=0.3, seed=1234},
      structures={'TAN'},
      maxParents=2,
      missingNom='LEVEL',
      OUTPUT={casout={name='golf_scored_detailed', replace=true}, copyVars={'Play'}},
      outNetwork={name='bnet_network_detailed', replace=true},
      saveState={name='bnet_model_detailed', replace=true};
RUN;
QUIT;

Result:
This trains a TAN model on approximately 70% of the observations and validates it on the remaining 30%; the random split is reproducible through seed=1234. It creates 'golf_scored_detailed' with predictions, 'bnet_network_detailed' with the network structure, and 'bnet_model_detailed' with the saved model state.
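The following Python sketch uses the classic TAN construction (conditional mutual information between each pair of inputs given the target, then a maximum-weight spanning tree over the inputs) to show why every node in a TAN network has at most two parents: the target plus at most one other input. It is not necessarily how the bnet action learns TAN structures, since the action's structure search may rely on the independence tests controlled by indepTest and alpha; the function names and the choice of root are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

# The golf data from the Data Creation step; Play (the target) is the last column.
GOLF = [
    ("Sunny", "Hot", "High", "False", "No"), ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"), ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"), ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"), ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"), ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"), ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"), ("Rainy", "Mild", "High", "True", "No"),
]
INPUTS = ("Outlook", "Temperature", "Humidity", "Windy")
TARGET = "Play"


def cond_mutual_info(rows, i, j):
    """I(X_i; X_j | Play) in nats, estimated from counts."""
    n = len(rows)
    n_ijc = Counter((r[i], r[j], r[-1]) for r in rows)
    n_ic = Counter((r[i], r[-1]) for r in rows)
    n_jc = Counter((r[j], r[-1]) for r in rows)
    n_c = Counter(r[-1] for r in rows)
    return sum(cnt / n * math.log(cnt * n_c[c] / (n_ic[(a, c)] * n_jc[(b, c)]))
               for (a, b, c), cnt in n_ijc.items())


def tan_parents(rows, root=0):
    """Return each input's parent list for a TAN structure rooted at INPUTS[root]."""
    k = len(INPUTS)
    weight = {pair: cond_mutual_info(rows, *pair) for pair in combinations(range(k), 2)}
    in_tree = {root}
    parents = {INPUTS[root]: [TARGET]}  # the root's only parent is the target
    while len(in_tree) < k:
        # Prim's algorithm for a maximum-weight spanning tree: take the
        # heaviest edge from a node in the tree to a node outside it.
        a, b = max(((a, b) for a in sorted(in_tree) for b in range(k) if b not in in_tree),
                   key=lambda e: weight[tuple(sorted(e))])
        in_tree.add(b)
        parents[INPUTS[b]] = [TARGET, INPUTS[a]]  # target plus one input parent
    return parents


for node, pa in tan_parents(GOLF).items():
    print(node, "<-", pa)
```

If the target counts toward the limit, as it does in this sketch, maxParents=2 is exactly enough to express any TAN structure over these four inputs.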

Associated Scenarios

Use Case

Standard Case: Telecom Customer Churn Prediction with a TAN Model

A telecommunications company wants to proactively identify customers at high risk of churning. By modeling customer behavior and contract attributes, they aim to build a predict...

Use Case

Performance & Feature Selection: Fraud Detection with a General Network

A financial institution needs to build a robust fraud detection system. The dataset contains numerous transaction attributes, many of which might be irrelevant. The goal is to t...

Use Case

Edge Case & Robustness: Predictive Maintenance with Missing Sensor Data

An industrial manufacturing plant wants to predict equipment failure using sensor data. However, due to network issues and sensor malfunctions, the data is often incomplete. Thi...

