Test Scenario & Use Case
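Creates a large data set, 'customer_value_large' (50,000 simulated records), with a continuous target, 'future_value', and five predictors: four numeric ('age', 'income', 'months_active', 'product_count') and one categorical ('region', with levels A to E). A partitioning variable, 'data_role', assigns roughly 70% of the records to 'TRAIN' and the remainder to 'TEST'.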

    DATA customer_value_large;
       call streaminit(567);
       DO customer_id = 1 to 50000;
          age = 20 + floor(rand('UNIFORM') * 50);
          income = 30000 + rand('UNIFORM') * 150000;
          months_active = 1 + floor(rand('UNIFORM') * 120);
          product_count = 1 + floor(rand('UNIFORM') * 5);
          region = byte(65 + floor(rand('UNIFORM') * 5)); /* A-E */
          future_value = 500 + (income / 100) + (months_active * 10) * (product_count) - (age * 5) + rand('NORMAL', 0, 200);
          IF rand('UNIFORM') < 0.7 THEN data_role = 'TRAIN';
          ELSE data_role = 'TEST';
          OUTPUT;
       END;
    RUN;

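Before loading the data into CAS, it can help to confirm that the simulated table looks as described. The following is a minimal sketch, not part of the original scenario, that tabulates the partition and region variables and summarizes the target by partition. It assumes the DATA step above has already run and left 'customer_value_large' in the WORK library.

```sas
/* Sanity check (illustrative): expect about 70% TRAIN / 30% TEST
   and five region levels, A through E */
proc freq data=customer_value_large;
   tables data_role region / nocum;
run;

/* Summary of the target within each partition */
proc means data=customer_value_large n mean std min max;
   class data_role;
   var future_value;
run;
```
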
Load the data set into CAS as an in-memory table:

    PROC CASUTIL;
       load DATA=customer_value_large casout='customer_value_large' replace;
    RUN;
    QUIT;

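The load step presupposes an active CAS session and writes to that session's active caslib. As a hedged sketch, the load could be verified as follows. It assumes the table was written to the active caslib, so no INCASLIB= option is given.

```sas
/* Illustrative check: confirm the table is in memory and inspect its columns.
   Assumes a CAS session is already active and the table is in the active caslib. */
proc casutil;
   list tables;
   contents casdata='customer_value_large';
quit;
```
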
Train the BART model with the 'bartGauss' action:

    PROC CAS;
       LOADACTIONSET 'bart';
       bart.bartGauss /
          TABLE={name='customer_value_large'},
          target='future_value',
          inputs={'age', 'income', 'months_active', 'product_count', 'region'},
          nominals={'region'},
          partByVar={name='data_role', train='TRAIN', test='TEST'},
          nTree=100,
          nBins=50,
          quantileBin=true,
          maxTrainTime=2700, /* 45 minutes */
          seed=2025,
          outputTables={names={'ModelInfo', 'FitStatistics'}};
    RUN;
    QUIT;

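Because the scenario sets a 2700-second training limit, it may be useful to record the wall-clock time of the training step. One possible approach, sketched here with the hypothetical macro variables 't0' and 'elapsed', is to bracket the PROC CAS step with timestamps. The result includes session and client overhead, so it is only an upper bound on the training time itself.

```sas
/* Illustrative timing wrapper; t0 and elapsed are hypothetical names */
%let t0 = %sysfunc(datetime());

/* ... run the PROC CAS step that calls bart.bartGauss here ... */

%let elapsed = %sysevalf(%sysfunc(datetime()) - &t0);
%put NOTE: Training step took &elapsed seconds (limit: 2700).;
```
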
The action must finish training in under 2,700 seconds, the limit set by 'maxTrainTime'. It should produce the 'FitStatistics' table with model performance metrics, such as the average squared error (ASE), for both the training and testing partitions. Setting 'nBins' to 50 with 'quantileBin' enabled should let the action handle the large data volume efficiently. The log should confirm that the run ended normally rather than being stopped by the time limit.
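
The expected result can be checked by inspecting the output table once the action has finished. The sketch below is an assumption-laden example: it presumes the requested output tables were written to the session's active caslib under the names 'FitStatistics' and 'ModelInfo'.

```sas
proc cas;
   /* Assumption: output tables are in the active caslib under the requested names */
   table.tableInfo / name='FitStatistics';       /* confirm the table exists */
   table.fetch / table={name='FitStatistics'};   /* expect rows for TRAIN and TEST */
   table.fetch / table={name='ModelInfo'};       /* review the settings used by the run */
quit;
```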