Test Scenario & Use Case
Classification using Bayesian networks.
Creation of a larger, simulated transaction dataset (2,000 rows) with multiple potential predictors. The data includes transaction details and user context, and the target variable is 'IsFraud'.
```sas
/* Start a CAS session and assign librefs for all caslibs, including CASUSER */
CAS mysess;
CASLIB _ALL_ ASSIGN;

DATA casuser.transactions_fraud;
   LENGTH Merchant_Cat $20 Device $10 IsFraud $3;
   call streaminit(4321);                       /* fixed seed for reproducibility */
   DO TransactionID = 1 to 2000;
      Amount        = round(rand('Uniform') * 1000, 0.01);
      Hour          = rand('Integer', 0, 23);
      Prev_Declines = rand('Integer', 0, 5);
      IF rand('Uniform') < 0.7 THEN Merchant_Cat = 'Online Retail'; ELSE Merchant_Cat = 'Travel';
      IF rand('Uniform') < 0.6 THEN Device = 'Mobile'; ELSE Device = 'Desktop';
      /* Introduce correlation for fraud: only Amount, Hour and Prev_Declines drive the target */
      IF (Amount > 900 and Hour < 6) or Prev_Declines > 3 THEN IsFraud = 'Yes';
      ELSE IF rand('Uniform') < 0.05 THEN IsFraud = 'Yes';   /* 5% background noise */
      ELSE IsFraud = 'No';
      OUTPUT;
   END;
RUN;
```
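Before training, it can help to check that the simulated data looks as intended. The sketch below uses the standard `simple.freq` and `simple.crossTab` CAS actions. From the generating rule, roughly 38% of rows should be 'Yes' (about 35% from the deterministic rule plus 5% of the remaining rows), and the fraud rate should be similar across the levels of Merchant_Cat and Device, since those two variables are drawn independently of the target. The exact counts depend on the seed, so none are shown here.

```sas
PROC CAS;
   /* Class balance of the target */
   simple.freq / table={name='transactions_fraud'} inputs={'IsFraud'};

   /* Merchant_Cat and Device are generated independently of IsFraud,
      so their row percentages of 'Yes' should be close to each other */
   simple.crossTab / table={name='transactions_fraud'} row='Merchant_Cat' col='IsFraud';
   simple.crossTab / table={name='transactions_fraud'} row='Device' col='IsFraud';
RUN;
```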
```sas
PROC CAS;
   /* Make sure the action set is loaded in this session */
   loadactionset "bayesianNetClassifier";

   /* Action parameters follow the slash */
   bayesianNetClassifier.bnet /
      table={name='transactions_fraud'}
      target='IsFraud'
      inputs={'Amount', 'Hour', 'Prev_Declines', 'Merchant_Cat', 'Device'}
      nominals={'Merchant_Cat', 'Device', 'IsFraud'}
      structures={'GENERAL'}
      parenting='BESTSET'
      preScreening='ONE'
      varSelect='ONE'
      maxParents=3
      outNetwork={name='fraud_network', replace=true}
      display={names={'NetInfo', 'VarSelectInfo'}};
RUN;
```
```sas
PROC CAS;
   /* Show only the rows whose destination is the target, i.e. the parents of IsFraud */
   table.fetch / table={name='fraud_network', where="dest='IsFraud'"};
RUN;
```
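The filter above assumes the network table has a column named `dest`. If it returns no rows, a quick way to confirm the actual column layout and list every learned edge is sketched below; `table.columnInfo` and the `to=` parameter of `table.fetch` are standard CAS table features, while the 100-row limit is an arbitrary choice (fetch returns 20 rows by default).

```sas
PROC CAS;
   /* List the column names and types of the network table */
   table.columnInfo / table={name='fraud_network'};

   /* Fetch all edges, not only those pointing into IsFraud */
   table.fetch / table={name='fraud_network'} to=100;
RUN;
```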
The action should complete successfully on the larger dataset. The 'VarSelectInfo' table in the results should indicate which variables were kept, and the 'fraud_network' output table should be created; querying it should reveal the learned parent-child relationships. Because 'IsFraud' is generated only from 'Amount', 'Hour', and 'Prev_Declines' (plus 5% random noise), while 'Merchant_Cat' and 'Device' are drawn independently of it, we expect 'Amount', 'Hour', and 'Prev_Declines' to be selected as direct parents of 'IsFraud', demonstrating the effectiveness of variable selection and structure learning.
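The expectation follows from the conditional distribution fixed by the DATA step. Writing $A$ for Amount, $H$ for Hour, $D$ for Prev_Declines, $M$ for Merchant_Cat and $V$ for Device (symbols introduced here only for illustration), and treating Amount as approximately uniform on $[0, 1000]$:

$$
P(\text{IsFraud}=\text{Yes}\mid A,H,D,M,V)=
\begin{cases}
1 & \text{if } (A>900 \land H<6) \lor D>3,\\
0.05 & \text{otherwise,}
\end{cases}
$$

which does not depend on $M$ or $V$. With $P(A>900)\approx 0.1$, $P(H<6)=6/24=0.25$ and $P(D>3)=2/6=1/3$, all independent:

$$
\begin{aligned}
P(\text{rule}) &= 0.1\cdot 0.25 + \tfrac{1}{3} - 0.1\cdot 0.25\cdot\tfrac{1}{3} = 0.025 + 0.3333 - 0.0083 = 0.35,\\
P(\text{IsFraud}=\text{Yes}) &= 0.35 + 0.05\,(1-0.35) = 0.3825,
\end{aligned}
$$

or about 765 of the 2,000 rows. Note that the Amount and Hour condition covers only about 2.5% of rows, so their signal is much weaker than that of Prev_Declines, which on its own accounts for a third of the rows.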