Test Scenario & Use Case
Data cleaning, imputation, and preprocessing.
Create a customer dataset with age and annual spending. Both variables are drawn from a uniform distribution, which makes the data suitable for quantile binning.
    /* Assumes the mycas libref already points to a CAS caslib */
    DATA mycas.customer_profiles;
        call streaminit(123);                              /* fixed seed for reproducibility */
        DO customer_id = 1 to 5000;
            age = 18 + floor(rand('UNIFORM') * 60);        /* integer ages 18 to 77 */
            annual_spending = 500 + rand('UNIFORM') * 10000; /* 500 to 10500 */
            OUTPUT;
        END;
    RUN;
    /* Data is already in mycas.customer_profiles from the data_prep step */
    PROC CAS;
        dataPreprocess.binning /
            TABLE={name='customer_profiles'},
            inputs={{name='age'}, {name='annual_spending'}},
            method='QUANTILE',
            nBinsArray=5,
            includeInputVars=true,
            outVarsNameSuffix='_quantile_group',
            casOut={name='customer_segments', replace=true},
            casOutBinDetails={name='segment_details', replace=true};
    RUN;
    QUIT;
    PROC CAS;
        SIMPLE.freq /
            TABLE={name='customer_segments'},
            inputs={'age_quantile_group', 'annual_spending_quantile_group'};
    RUN;
    QUIT;
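The action creates two tables: 'customer_segments' and 'segment_details'. The 'customer_segments' table contains the original data plus two new columns, 'age_quantile_group' and 'annual_spending_quantile_group', each holding an integer bin number from 1 to 5. With 5000 rows split into 5 quantile bins, the frequency analysis should show approximately 1000 customers in each bin for both variables. The counts for 'age' need not be exactly 1000: it takes only 60 distinct integer values, and all customers of the same age fall into the same bin.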
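The scenario checks 'customer_segments' with a frequency count but does not look inside 'segment_details'. A minimal sketch of how that table could be inspected is shown here, assuming the same active CAS session and default caslib as the steps above. It uses the standard table.columnInfo and table.fetch actions; the column layout of 'segment_details' is not stated in this scenario, so the sketch lists the columns instead of assuming their names, and the row cap of 20 is an arbitrary choice.

```sas
PROC CAS;
    /* List the columns of the bin-details table. Their names are not
       given in this scenario, so inspect them rather than assume them. */
    table.columnInfo /
        table={name='segment_details'};

    /* Print the first rows of the bin details (to=20 is an arbitrary cap). */
    table.fetch /
        table={name='segment_details'},
        to=20;
RUN;
QUIT;
```

Comparing the bin boundaries reported there with the ranges used to generate the data (ages 18 to 77, spending 500 to 10500) gives a quick sanity check that the quantile cut points are spread roughly evenly.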