bayesianNetClassifier bnet

Performance & Feature Selection: Fraud Detection with a General Network

Test Scenario & Use Case

Business Context

A financial institution needs to build a robust fraud detection system. The dataset contains numerous transaction attributes, many of which might be irrelevant. The goal is to test the action's ability to automatically select predictive features and learn a complex dependency structure from a larger dataset.

About the Set: bayesianNetClassifier

Classification using Bayesian networks.

Data Preparation

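Create a larger simulated transaction dataset (2,000 rows) with several candidate predictors covering transaction details and user context. The target is 'IsFraud'. In the simulation, fraud depends only on 'Amount', 'Hour', and 'Prev_Declines'; 'Merchant_Cat' and 'Device' are generated independently of the target.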

DATA casuser.transactions_fraud;
  LENGTH Merchant_Cat $20 Device $10 IsFraud $3;
  call streaminit(4321);  /* fixed seed so the simulation is reproducible */
  DO TransactionID = 1 to 2000;
    Amount = round(rand('Uniform') * 1000, 0.01);
    Hour = rand('Integer', 0, 23);
    Prev_Declines = rand('Integer', 0, 5);
    /* Merchant_Cat and Device are drawn independently of the target */
    IF rand('Uniform') < 0.7 THEN Merchant_Cat = 'Online Retail'; ELSE Merchant_Cat = 'Travel';
    IF rand('Uniform') < 0.6 THEN Device = 'Mobile'; ELSE Device = 'Desktop';
    /* Introduce correlation for fraud */
    IF (Amount > 900 and Hour < 6) or Prev_Declines > 3 THEN IsFraud = 'Yes';
    ELSE IF rand('Uniform') < 0.05 THEN IsFraud = 'Yes';  /* 5% background noise */
    ELSE IsFraud = 'No';
    OUTPUT;
  END;
RUN;
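
The class balance implied by the generation rules above can be worked out before any model is trained. The following is a minimal sketch in Python (rather than SAS, so it runs without a CAS session). It assumes the three numeric inputs are drawn independently, as in the DATA step, and ignores the negligible effect of rounding 'Amount'. The variable and function names are illustrative only.

```python
from fractions import Fraction

# Probabilities implied by the DATA step (rounding of Amount is ignored).
p_amount_high = Fraction(1, 10)    # Amount > 900 for Uniform(0, 1000)
p_night = Fraction(6, 24)          # Hour in 0..5 out of 0..23
p_many_declines = Fraction(2, 6)   # Prev_Declines in {4, 5} out of 0..5
p_noise = Fraction(1, 20)          # 5% random fraud among the remaining rows

# Amount, Hour and Prev_Declines are drawn independently of each other.
p_combo = p_amount_high * p_night
p_rule = 1 - (1 - p_combo) * (1 - p_many_declines)
p_fraud = p_rule + (1 - p_rule) * p_noise


def expected_fraud_rows(n_rows):
    """Expected number of rows labelled 'Yes' in a dataset of n_rows."""
    return float(p_fraud * n_rows)


print(f"rule-based share: {float(p_rule):.4f}")
print(f"overall fraud share: {float(p_fraud):.4f}")
print(f"expected fraud rows out of 2000: {expected_fraud_rows(2000):.0f}")
```

Under these assumptions roughly 38% of rows are labelled 'Yes', most of them because of 'Prev_Declines' alone. The simulated data is therefore far less imbalanced than typical real-world fraud data, which is worth keeping in mind when reading the results.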

Implementation Steps

1. Train a general Bayesian network so that the action can learn an unrestricted dependency structure. Enable variable pre-screening (preScreening='ONE') and variable selection (varSelect='ONE') to identify the most relevant predictors, and cap model complexity by setting maxParents to 3.

PROC CAS;
  bayesianNetClassifier.bnet /
    table={name='transactions_fraud'}
    target='IsFraud'
    inputs={'Amount', 'Hour', 'Prev_Declines', 'Merchant_Cat', 'Device'}
    nominals={'Merchant_Cat', 'Device', 'IsFraud'}
    structures={'GENERAL'}
    parenting='BESTSET'
    preScreening='ONE'
    varSelect='ONE'
    maxParents=3
    outNetwork={name='fraud_network', replace=true}
    display={'NetInfo', 'VarSelectInfo'};
RUN;
2. Analyze the generated network structure to see which variables were selected as parents of the 'IsFraud' target node.

PROC CAS;
  /* Assumes the network table exposes a 'dest' column; confirm the column names with table.columnInfo */
  table.fetch / table={name='fraud_network', where="dest='IsFraud'"};
RUN;
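
The pre-screening enabled in step 1 can be pictured as an independence test between each input and the target. The following minimal sketch in Python assumes a Pearson chi-square statistic computed on categorical values. The function name and toy data are hypothetical, and the test and thresholds the action actually applies may differ.

```python
from collections import Counter


def chi_square_statistic(xs, ys):
    """Pearson chi-square statistic for two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    stat = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n      # count expected under independence
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


# Toy data (hypothetical): one input fully determines the target, one is unrelated.
is_fraud = ["Yes"] * 30 + ["No"] * 70
many_declines = ["high"] * 30 + ["low"] * 70   # perfectly aligned with is_fraud
device = ["Mobile", "Desktop"] * 50            # same split within both classes

print("many_declines vs is_fraud:", chi_square_statistic(many_declines, is_fraud))
print("device vs is_fraud:", chi_square_statistic(device, is_fraud))
```

A large statistic suggests the input carries information about the target and should be kept, while a value near zero suggests it can be screened out. A test of this kind looks at one input at a time, so an effect that only appears when two inputs are combined can be harder to detect.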

Expected Result


The action should complete successfully on the larger dataset. The 'VarSelectInfo' table in the results should indicate which variables were kept, and the 'fraud_network' output table should be created. Querying that table should reveal the learned parent-child relationships. We expect 'Amount', 'Hour', and 'Prev_Declines' to be selected as direct parents of 'IsFraud', which would demonstrate that variable selection and structure learning recover the dependencies built into the simulated data.
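
Once the network rows have been fetched, the comparison against the expected parents can be automated. The following minimal sketch in Python assumes the edges have already been extracted as (parent, child) pairs. The helper name and the edge list are placeholders for illustration only, not output produced by the action.

```python
def parents_of(edges, target):
    """Return the set of nodes that point directly at `target`."""
    return {parent for parent, child in edges if child == target}


# Placeholder edges for illustration only; NOT output from the bnet action.
example_edges = [
    ("Prev_Declines", "IsFraud"),
    ("Amount", "IsFraud"),
    ("Hour", "IsFraud"),
    ("Hour", "Amount"),
]

expected = {"Amount", "Hour", "Prev_Declines"}
found = parents_of(example_edges, "IsFraud")
missing = expected - found        # expected parents the network did not learn
unexpected = found - expected     # learned parents that were not anticipated

print("missing:", sorted(missing), "unexpected:", sorted(unexpected))
```

An empty "missing" set together with an empty "unexpected" set would correspond to the outcome described above.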