
Performance Case: Assessing a Fraud Model on Partitioned Data

Test Scenario & Use Case

Business Context

A financial services company needs to evaluate a fraud detection model across millions of transactions. To speed up the process and get segment-specific insights, they want to assess the model's performance grouped by transaction type (e.g., 'ONLINE', 'IN-STORE'). The dataset is partitioned by transaction date for efficiency.
About the Action Set: percentile

Precise calculation of percentiles and quantiles.

Data Preparation

Create a large, partitioned dataset simulating financial transactions. The table is partitioned by 'TXN_DATE' and includes a 'TXN_TYPE' for grouping. 'IS_FRAUD' is the target, and 'P_FRAUD' is the model score.

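/* Assumes CASUSER is a libref assigned to the CAS engine. */
DATA casuser.transactions (partition=(TXN_DATE));
   FORMAT TXN_DATE yymmdd10.;
   LENGTH TXN_ID $36;                       /* uuidgen() returns a 36-character UUID */
   /* _temporary_ keeps the lookup array out of the output table */
   array txn_types[2] $8 _temporary_ ('ONLINE', 'IN-STORE');
   DROP i base_prob;                        /* helper variables, not needed in the table */
   call streaminit(456);                    /* fixed seed for the rand() calls */
   DO TXN_DATE = '01NOV2025'd to '05NOV2025'd;
      DO i = 1 to 200000;                   /* 5 days x 200,000 = 1 million total records */
         TXN_ID = uuidgen();
         TXN_TYPE = txn_types[rand('integer', 1, 2)];
         /* Online transactions get a higher baseline fraud probability */
         IF TXN_TYPE = 'ONLINE' THEN base_prob = 0.05;
         ELSE base_prob = 0.01;
         P_FRAUD = base_prob + rand('uniform')*0.1;   /* model score */
         IS_FRAUD = rand('binomial', P_FRAUD, 1);     /* 1 = fraud, 0 = legitimate */
         OUTPUT;
      END;
   END;
RUN;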
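
Before assessing the model, it can help to confirm that the table looks as intended. Because P_FRAUD is the base probability plus a uniform draw scaled by 0.1, its expected value is 0.05 + 0.05 = 0.10 for 'ONLINE' and 0.01 + 0.05 = 0.06 for 'IN-STORE', and the mean of IS_FRAUD should land close to those values. The following is a minimal sketch of such a check, assuming the table was loaded into the 'casuser' caslib as above; it is not part of the original scenario.

```sas
proc cas;
   /* Table metadata: the row count should be 1,000,000 */
   table.tableInfo / caslib='casuser', name='transactions';

   /* Per-type record counts and means; the mean of IS_FRAUD is the
      observed fraud rate and should be near the mean of P_FRAUD */
   simple.summary /
      table={name='transactions', caslib='casuser', groupBy={'TXN_TYPE'}},
      inputs={'IS_FRAUD', 'P_FRAUD'},
      subSet={'N', 'MEAN'};
quit;
```

If the two types show roughly 500,000 records each and fraud rates near 0.10 and 0.06, the simulated data matches the design described above.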

Implementation Steps

1
Run the assess action using the partitioned table. We specify `partition=true` for efficiency and use `groupBy` on 'TXN_TYPE' to get separate assessments for online and in-store transactions. We use the default 'ITERATIVE' method suitable for large data.
PROC CAS;
   /* The slash separates the action name from its parameters */
   percentile.assess /
      TABLE={name='transactions', caslib='casuser', groupBy={'TXN_TYPE'}},
      response='IS_FRAUD',           /* actual outcome */
      inputs={{name='P_FRAUD'}},     /* model score to assess */
      event='1',                     /* level of IS_FRAUD treated as the event */
      partition=true,
      includeFitStat=true,
      fitStatOut={name='fraud_fitstat_by_type', caslib='casuser', replace=true};
QUIT;
2
Verify the output table contains separate statistics for each group.
PROC CAS;
   TABLE.fetch / TABLE={name='fraud_fitstat_by_type', caslib='casuser'};
QUIT;

Expected Result


The action should leverage the table's partitioning for efficient processing. The primary output, 'fraud_fitstat_by_type', must contain assessment statistics (like AUC and KS) for each level of 'TXN_TYPE'. The fetch result should show two rows in the fit statistics table, one for 'ONLINE' and one for 'IN-STORE', each with its own set of metrics.
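
The expected result mentions AUC and KS. If those measures do not appear among the fit statistics columns, one option is to request ROC output from the same action as well. The sketch below assumes the `includeRoc` and `rocOut` parameters of percentile.assess work in the same way as `includeFitStat` and `fitStatOut`. The table name 'fraud_roc_by_type' is hypothetical, and the exact column names in the output should be checked rather than assumed.

```sas
proc cas;
   /* Same grouped assessment, additionally writing ROC statistics */
   percentile.assess /
      table={name='transactions', caslib='casuser', groupBy={'TXN_TYPE'}},
      response='IS_FRAUD',
      inputs={{name='P_FRAUD'}},
      event='1',
      includeRoc=true,
      rocOut={name='fraud_roc_by_type', caslib='casuser', replace=true};  /* hypothetical name */

   /* List the columns actually produced before relying on any of them */
   table.columnInfo / table={name='fraud_roc_by_type', caslib='casuser'};

   /* Preview a few rows; ROC output typically has many rows per group,
      one per cutoff, unlike the one-row-per-group fit statistics table */
   table.fetch / table={name='fraud_roc_by_type', caslib='casuser'}, to=5;
quit;
```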