Customer Segmentation for a Targeted Marketing Campaign

Business Context

A retail company wants to segment its customer base to launch a new loyalty program. The marketing team needs to group customers into five distinct, equally-sized groups based on their 'age' and 'annual_spending' to tailor promotional offers effectively.

About the Set : dataPreprocess

Data cleaning, imputation, and preprocessing.

Discover all actions of dataPreprocess

Data Preparation

Creation of a customer dataset with age and spending habits. The data is uniformly distributed to be suitable for quantile binning.

Copied!

1	DATA mycas.customer_profiles;
2	call streaminit(123);
3	DO customer_id = 1 to 5000;
4	age = 18 + floor(rand('UNIFORM') * 60);
5	annual_spending = 500 + rand('UNIFORM') * 10000;
6	OUTPUT;
7	END;
8	RUN;

Étapes de réalisation

Load the customer profiles table into the CAS server. This step is implicitly handled by the DATA step creating the table in the `mycas` caslib.

Copied!

1	/*
2	Data is already in mycas.customer_profiles from the data_prep step */

Execute quantile binning on 'age' and 'annual_spending' to create 5 equal-frequency bins for each. Store the results and the bin details in separate output tables.

Copied!

1	PROC CAS;
2	dataPreprocess.binning /
3	TABLE={name='customer_profiles'},
4	inputs={{name='age'}, {name='annual_spending'}},
5	method='QUANTILE',
6	nBinsArray=5,
7	includeInputVars=true,
8	outVarsNameSuffix='_quantile_group',
9	casOut={name='customer_segments', replace=true},
10	casOutBinDetails={name='segment_details', replace=true};
11	RUN;
12	QUIT;

Verify the distribution of customers across the new bins to ensure they are roughly equal in size, as expected from the QUANTILE method.

Copied!

1	PROC CAS;
2	SIMPLE.freq /
3	TABLE={name='customer_segments'}
4	inputs={'age_quantile_group', 'annual_spending_quantile_group'};
5	RUN;
6	QUIT;

Expected Result

The action creates two tables: 'customer_segments' and 'segment_details'. The 'customer_segments' table contains the original data plus two new columns, 'age_quantile_group' and 'annual_spending_quantile_group', with integer values from 1 to 5. The frequency analysis should show that each bin for both variables contains approximately 1000 customers.

Voir la documentation technique de binning