dataPreprocess binning

Customer Segmentation for a Targeted Marketing Campaign

Scénario de test & Cas d'usage

Business Context

A retail company wants to segment its customer base to launch a new loyalty program. The marketing team needs to group customers into five distinct, equally-sized groups based on their 'age' and 'annual_spending' to tailor promotional offers effectively.
About the Set : dataPreprocess

Data cleaning, imputation, and preprocessing.

Discover all actions of dataPreprocess
Data Preparation

Creation of a customer dataset with age and spending habits. The data is uniformly distributed to be suitable for quantile binning.

Copied!
1DATA mycas.customer_profiles;
2 call streaminit(123);
3 DO customer_id = 1 to 5000;
4 age = 18 + floor(rand('UNIFORM') * 60);
5 annual_spending = 500 + rand('UNIFORM') * 10000;
6 OUTPUT;
7 END;
8RUN;

Étapes de réalisation

1
Load the customer profiles table into the CAS server. This step is implicitly handled by the DATA step creating the table in the `mycas` caslib.
Copied!
1/*
2Data is already in mycas.customer_profiles from the data_prep step */
2
Execute quantile binning on 'age' and 'annual_spending' to create 5 equal-frequency bins for each. Store the results and the bin details in separate output tables.
Copied!
1PROC CAS;
2 dataPreprocess.binning /
3 TABLE={name='customer_profiles'},
4 inputs={{name='age'}, {name='annual_spending'}},
5 method='QUANTILE',
6 nBinsArray=5,
7 includeInputVars=true,
8 outVarsNameSuffix='_quantile_group',
9 casOut={name='customer_segments', replace=true},
10 casOutBinDetails={name='segment_details', replace=true};
11RUN;
12QUIT;
3
Verify the distribution of customers across the new bins to ensure they are roughly equal in size, as expected from the QUANTILE method.
Copied!
1PROC CAS;
2 SIMPLE.freq /
3 TABLE={name='customer_segments'}
4 inputs={'age_quantile_group', 'annual_spending_quantile_group'};
5RUN;
6QUIT;

Expected Result


The action creates two tables: 'customer_segments' and 'segment_details'. The 'customer_segments' table contains the original data plus two new columns, 'age_quantile_group' and 'annual_spending_quantile_group', with integer values from 1 to 5. The frequency analysis should show that each bin for both variables contains approximately 1000 customers.