Performance Case: Assessing a Fraud Model on Partitioned Data

Business Context

A financial services company needs to evaluate a fraud detection model across millions of transactions. To speed up the process and get segment-specific insights, they want to assess the model's performance grouped by transaction type (e.g., 'ONLINE', 'IN-STORE'). The dataset is partitioned by transaction date for efficiency.

About the Action Set: percentile

Precise calculation of percentiles and quantiles.


Data Preparation

Create a large, partitioned dataset simulating financial transactions. The table is partitioned by 'TXN_DATE' and includes a 'TXN_TYPE' for grouping. 'IS_FRAUD' is the target, and 'P_FRAUD' is the model score.

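/* Assumes an active CAS session and a casuser libref, for example: */
/*   cas mysess;  libname casuser cas caslib=casuser;               */
DATA casuser.transactions (partition=(TXN_DATE));
   LENGTH TXN_ID $36 TXN_TYPE $8;
   FORMAT TXN_DATE yymmdd10.;
   /* _temporary_ keeps the lookup array out of the output table */
   array txn_types[2] $8 _temporary_ ('ONLINE', 'IN-STORE');
   call streaminit(456);
   DO TXN_DATE = '01NOV2025'd to '05NOV2025'd;
      DO i = 1 to 200000; /* 5 days x 200,000 = 1 million records */
         TXN_ID = uuidgen();
         TXN_TYPE = txn_types[rand('integer', 1, 2)];
         /* Online transactions carry a higher baseline fraud risk */
         IF TXN_TYPE = 'ONLINE' THEN base_prob = 0.05; ELSE base_prob = 0.01;
         P_FRAUD = base_prob + rand('uniform')*0.1;  /* model score */
         IS_FRAUD = rand('binomial', P_FRAUD, 1);    /* 1 = fraud */
         OUTPUT;
      END;
   END;
   DROP i base_prob;
RUN;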
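Because `P_FRAUD` is drawn uniformly from [0.05, 0.15) for online and [0.01, 0.11) for in-store transactions, and `IS_FRAUD` is a single Bernoulli trial with that probability, the expected fraud rates are about 10% (ONLINE) and 6% (IN-STORE). The following is a minimal Python sketch of the same generating logic at a hypothetical smaller scale of 20,000 rows per day; `simulate_transactions` is an illustrative name, not part of SAS. Python's `random` module does not reproduce the SAS random-number stream, so only the group rates, not individual rows, are expected to match.

```python
import random

def simulate_transactions(days=5, per_day=20_000, seed=456):
    """Mirror the DATA step's generating logic (hypothetical smaller scale)."""
    rng = random.Random(seed)
    rows = []
    for day in range(days):
        for _ in range(per_day):
            # Equal chance of each type, like rand('integer', 1, 2)
            txn_type = rng.choice(("ONLINE", "IN-STORE"))
            base_prob = 0.05 if txn_type == "ONLINE" else 0.01
            p_fraud = base_prob + rng.random() * 0.1   # model score
            is_fraud = 1 if rng.random() < p_fraud else 0  # one Bernoulli trial
            rows.append((day, txn_type, p_fraud, is_fraud))
    return rows

rows = simulate_transactions()
for t in ("ONLINE", "IN-STORE"):
    labels = [r[3] for r in rows if r[1] == t]
    # Observed fraud rate per transaction type
    print(t, len(labels), round(sum(labels) / len(labels), 4))
```

The printed rates should land near 0.10 and 0.06, which is also a quick sanity check to run against `IS_FRAUD` in the CAS table before assessing the model.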

Implementation Steps

Run the assess action on the partitioned table. Setting `partition=true` lets the action take advantage of the table's partitioning, and `groupBy` on 'TXN_TYPE' produces separate assessments for online and in-store transactions. The `method` parameter is not set, so the action uses its default 'ITERATIVE' method, which is suited to large data.

PROC CAS;
   percentile.assess /
      TABLE={name='transactions', caslib='casuser', groupBy={'TXN_TYPE'}},
      response='IS_FRAUD',          /* observed target */
      inputs={{name='P_FRAUD'}},    /* model score to assess */
      event='1',                    /* level of IS_FRAUD treated as the event */
      partition=true,
      includeFitStat=true,
      fitStatOut={name='fraud_fitstat_by_type', caslib='casuser', replace=true};
QUIT;

Verify that the output table contains separate statistics for each group.

PROC CAS;
   TABLE.fetch / TABLE={name='fraud_fitstat_by_type', caslib='casuser'};
QUIT;

Expected Result

The action should leverage the table's partitioning for efficient processing. The primary output, 'fraud_fitstat_by_type', must contain assessment statistics (like AUC and KS) for each level of 'TXN_TYPE'. The fetch result should show two rows in the fit statistics table, one for 'ONLINE' and one for 'IN-STORE', each with its own set of metrics.
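The assess action computes its metrics inside CAS. To make the expected per-group numbers concrete, here is a minimal local Python sketch (not SAS code, and not how CAS implements it) of the two metrics named above, computed separately for each `TXN_TYPE`: AUC via the Mann-Whitney rank formula, and KS as the maximum |TPR - FPR| over all score cutoffs. The functions `auc`, `ks`, and `assess_by_group` and the toy rows are hypothetical illustrations, not output from the action.

```python
from collections import defaultdict

def auc(scores, labels):
    """AUC via the Mann-Whitney rank statistic; tied scores share an average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both events and non-events")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for this tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def ks(scores, labels):
    """KS statistic: max |TPR - FPR| over all score cutoffs."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both events and non-events")
    pairs = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:  # move tied scores together
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best

def assess_by_group(rows):
    """rows: iterable of (group, score, label); returns {group: (auc, ks)}."""
    groups = defaultdict(lambda: ([], []))
    for g, s, y in rows:
        groups[g][0].append(s)
        groups[g][1].append(y)
    return {g: (auc(s, y), ks(s, y)) for g, (s, y) in groups.items()}

demo = [("ONLINE", 0.1, 0), ("ONLINE", 0.4, 0), ("ONLINE", 0.35, 1),
        ("ONLINE", 0.8, 1), ("IN-STORE", 0.2, 0), ("IN-STORE", 0.9, 1)]
print(assess_by_group(demo))
# {'ONLINE': (0.75, 0.5), 'IN-STORE': (1.0, 1.0)}
```

As in the fetched table, the result has one entry per group, each with its own metrics. Each group must contain both fraud and non-fraud rows; otherwise neither metric is defined.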
