Test Scenario & Use Case
This scenario simulates a larger dataset of 100,000 credit card transactions, of which roughly 2% are flagged as fraudulent, and then cross-validates a forest model on it.

    /* Simulate 100,000 transactions; about 2% are marked as fraud */
    DATA casuser.fraud_transactions;
        call streaminit(777);                           /* fixed seed for reproducibility */
        DO i = 1 to 100000;
            amount = rand('exponential', 50);           /* transaction amount */
            merchant_cat = rand('integer', 1, 20);      /* merchant category code, 1 to 20 */
            time_since_last = rand('uniform', 0, 60);   /* time since previous transaction */
            IF rand('uniform') < 0.02 THEN is_fraud = 1;
            ELSE is_fraud = 0;
            OUTPUT;
        END;
    RUN;

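Before cross-validating, it can help to confirm that the simulated table has the intended class balance. The following is a minimal sketch, assuming the `casuser` libref used by the DATA step is assigned in the SAS session. The share of `is_fraud = 1` should come out close to the 0.02 threshold used in the simulation, although the exact count depends on the seed.

```sas
/* Frequency of the target: is_fraud = 1 should be roughly 2% of the rows */
proc freq data=casuser.fraud_transactions;
    tables is_fraud;
run;

/* Summary statistics for the two continuous inputs */
proc means data=casuser.fraud_transactions n mean min max;
    var amount time_since_last;
run;
```

With a positive class this rare, the frequency table is also a reminder that plain accuracy will look high even for a model that never predicts fraud.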

    /* Run a DATA step inside CAS that reads the table and writes it back under the same name */
    PROC CAS;
        dataStep.runCode / code="
            data casuser.fraud_transactions;
                set casuser.fraud_transactions;
            run;
        ";
    QUIT;


    /* 10-fold cross-validation of a forest model with parallel folds enabled */
    PROC CAS;
        mlTools.crossValidate /
            TABLE={name="fraud_transactions"}
            modelType="FOREST"
            kFolds=10
            parallelFolds=TRUE
            nSubsessionWorkers=4
            casOut={name="cv_fraud_scored", replace=TRUE}
            trainOptions={
                target="is_fraud",
                inputs={"amount", "merchant_cat", "time_since_last"},
                nominals={"is_fraud", "merchant_cat"}
            };
    QUIT;

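With parallelFolds=TRUE, the system distributes the 10 folds across the available worker nodes instead of training them one after another, so wall-clock time is typically lower than in a serial run. The output table, cv_fraud_scored, contains scored data for all 100,000 records, compiled from the validation hold-out sets of each fold, so each record is scored by a model that did not see it during training.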
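One way to check the statement about the output table is to inspect it after the run. The sketch below assumes that `cv_fraud_scored` was written to the active caslib. The names of the prediction columns are not given here, so the sketch lists the columns rather than assuming them.

```sas
proc cas;
    /* Row count: expected to match the 100,000 input records */
    simple.numRows / table={name="cv_fraud_scored"};

    /* List the columns to see which prediction variables the output contains */
    table.columnInfo / table={name="cv_fraud_scored"};

    /* Preview a few scored rows */
    table.fetch / table={name="cv_fraud_scored"} to=5;
quit;
```

If the row count differs from the input table, the caslib and table names in the `casOut` parameter are the first things to check.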