activeLearn alJoin

Standard Case: Merging Customer Data with Campaign Responses

Scénario de test & Cas d'usage

Business Context

A marketing department needs to analyze the effectiveness of a recent email campaign. They have a main table of all customers and a smaller table containing the IDs of customers who clicked a link in the email. The goal is to create a unified dataset to identify which customers responded, for further analysis and segmentation.
About the Set : activeLearn

Active learning to optimize data labeling.

Discover all actions of activeLearn
Data Preparation

Creates a main customer table ('customers_main') and a smaller table of campaign responders ('campaign_responders'). The join will be performed on 'customer_id'.

Copied!
1DATA casuser.customers_main;
2 LENGTH customer_id $ 10;
3 DO i = 1 TO 50;
4 customer_id = 'CUST' || LEFT(PUT(i, 8.));
5 age = 20 + FLOOR(RAND('UNIFORM') * 40);
6 OUTPUT;
7 END;
8RUN;
9 
10DATA casuser.campaign_responders;
11 LENGTH customer_id $ 10 response_type $ 8;
12 DO i = 5, 15, 25, 35, 45;
13 customer_id = 'CUST' || LEFT(PUT(i, 8.));
14 response_type = 'CLICK';
15 OUTPUT;
16 END;
17RUN;

Étapes de réalisation

1
Load the source tables into CAS if they are not already there (code assumes they are in casuser).
Copied!
1PROC CAS;
2 TABLE.tableInfo / caslib='casuser';
3RUN;
4QUIT;
2
Execute a LEFT join to merge campaign responders with the main customer list. This ensures all customers are kept in the final table, with response data appended where available.
Copied!
1PROC CAS;
2 ACTION activeLearn.alJoin /
3 TABLE={name='customers_main'},
4 annotatedTable={name='campaign_responders'},
5 id='customer_id',
6 joinType='LEFT',
7 casOut={name='campaign_results', replace=true};
8RUN;
9QUIT;
3
Verify the output table. Check the row count and the presence of missing values for non-responders.
Copied!
1PROC CAS;
2 TABLE.fetch /
3 TABLE={name='campaign_results'},
4 to=10;
5RUN;
6 TABLE.tableInfo / name='campaign_results';
7RUN;
8QUIT;

Expected Result


The output table 'campaign_results' should contain exactly 50 rows, one for each customer. The 'response_type' column will be 'CLICK' for the 5 customers who responded and missing (null) for the other 45 customers. This correctly prepares the data for profiling responders vs. non-responders.