Standard Case: Merging Customer Data with Campaign Responses

Business Context

A marketing department needs to analyze the effectiveness of a recent email campaign. They have a main table of all customers and a smaller table containing the IDs of customers who clicked a link in the email. The goal is to create a unified dataset to identify which customers responded, for further analysis and segmentation.

About the Set : activeLearn

Active learning to optimize data labeling.

Discover all actions of activeLearn

Data Preparation

Creates a main customer table ('customers_main') and a smaller table of campaign responders ('campaign_responders'). The join will be performed on 'customer_id'.

Copied!

1	DATA casuser.customers_main;
2	LENGTH customer_id $ 10;
3	DO i = 1 TO 50;
4	customer_id = 'CUST' \|\| LEFT(PUT(i, 8.));
5	age = 20 + FLOOR(RAND('UNIFORM') * 40);
6	OUTPUT;
7	END;
8	RUN;
9
10	DATA casuser.campaign_responders;
11	LENGTH customer_id $ 10 response_type $ 8;
12	DO i = 5, 15, 25, 35, 45;
13	customer_id = 'CUST' \|\| LEFT(PUT(i, 8.));
14	response_type = 'CLICK';
15	OUTPUT;
16	END;
17	RUN;

Étapes de réalisation

Load the source tables into CAS if they are not already there (code assumes they are in casuser).

Copied!

1	PROC CAS;
2	TABLE.tableInfo / caslib='casuser';
3	RUN;
4	QUIT;

Execute a LEFT join to merge campaign responders with the main customer list. This ensures all customers are kept in the final table, with response data appended where available.

Copied!

1	PROC CAS;
2	ACTION activeLearn.alJoin /
3	TABLE={name='customers_main'},
4	annotatedTable={name='campaign_responders'},
5	id='customer_id',
6	joinType='LEFT',
7	casOut={name='campaign_results', replace=true};
8	RUN;
9	QUIT;

Verify the output table. Check the row count and the presence of missing values for non-responders.

Copied!

1	PROC CAS;
2	TABLE.fetch /
3	TABLE={name='campaign_results'},
4	to=10;
5	RUN;
6	TABLE.tableInfo / name='campaign_results';
7	RUN;
8	QUIT;

Expected Result

The output table 'campaign_results' should contain exactly 50 rows, one for each customer. The 'response_type' column will be 'CLICK' for the 5 customers who responded and missing (null) for the other 45 customers. This correctly prepares the data for profiling responders vs. non-responders.

Voir la documentation technique de alJoin