Edge Case: Customer Merge with Nulls and Duplicates

Business Context

A CRM system cleanup process needs to compare two lists of potential leads where data quality is poor (missing IDs, duplicate entries) to ensure robust handling of exceptions.

Data Preparation

Creation of datasets with missing keys (nulls) and duplicate entries for the same key.

Copied!

1	DATA casuser.crm_list_A; LENGTH Email $20; INPUT Email $ STATUS $; DATALINES; . Active
2	john@doe.com Active
3	john@doe.com Pending
4	; RUN; DATA casuser.crm_list_B; LENGTH Email $20; INPUT Email $ STATUS $; DATALINES; jane@doe.com New
5	. Archived
6	; RUN;

Étapes de réalisation

Run comparison including missing values and duplicates to test robustness.

Copied!

1
2	PROC CAS;
3	SIMPLE.compare / TABLE={name='crm_list_A'} table2={name='crm_list_B'} inputs={{name='Email'}} includeMissing=true includeDuplicates='CASOUT' casOut={name='unique_source_A', replace=true};
4
5	RUN;
6
7	QUIT;
8

Check if duplicates were preserved in the output as requested.

Copied!

1
2	PROC CAS;
3	TABLE.recordCount / TABLE={name='unique_source_A'};
4
5	RUN;
6
7	QUIT;
8

Expected Result

The comparison does not fail on null keys due to 'includeMissing=true'. The 'unique_source_A' table includes both entries for 'john@doe.com' because 'includeDuplicates' was set to 'CASOUT', verifying the system handles non-unique keys correctly.

Voir la documentation technique de compare