High-Volume Call Log Synchronization Check

Business Context

A telecom operator validates that millions of call records generated by cell towers (Edge) are correctly replicated to the central Data Lake without data loss.

Data Preparation

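Generate two tables of simulated call logs with a DO loop. The tower (Edge) table holds 10,000 records (CallID C-1 to C-10000). The data lake table holds only the first 9,950, so records C-9951 through C-10000 simulate the data lost during replication. Duration values are random in both tables and are not compared; only CallID is used as the comparison key.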

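```sas
/* Assumes an active CAS session and a CAS libref named casuser, */
/* for example one assigned with: CASLIB _ALL_ ASSIGN;           */

/* Edge side: 10,000 call records generated by the cell towers */
DATA casuser.tower_logs;
   LENGTH CallID $ 12;
   DO i = 1 TO 10000;
      CallID = catx('-', 'C', i);           /* C-1, C-2, ..., C-10000 */
      Duration = rand('integer', 10, 600);  /* random duration between 10 and 600 */
      OUTPUT;
   END;
   DROP i;
RUN;

/* Data lake side: only the first 9,950 records were replicated */
DATA casuser.datalake_logs;
   LENGTH CallID $ 12;
   DO i = 1 TO 9950;
      CallID = catx('-', 'C', i);           /* C-1, C-2, ..., C-9950 */
      Duration = rand('integer', 10, 600);
      OUTPUT;
   END;
   DROP i;
RUN;
```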
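Before running the comparison, it can help to confirm that both tables were created with the intended sizes. A minimal sketch, assuming both tables were loaded into the casuser caslib as above, calls the table.tableInfo action once per table; the Rows column of each result should show 10,000 and 9,950 respectively.

```sas
/* Sanity check: row counts of the two generated tables.  */
/* Assumes an active CAS session and the casuser caslib.  */
PROC CAS;
   table.tableInfo / caslib='casuser' name='tower_logs';     /* expect 10,000 rows */
   table.tableInfo / caslib='casuser' name='datalake_logs';  /* expect 9,950 rows  */
QUIT;
```
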

Implementation Steps

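Run the simple.compare action on the CallID column of both tables. The generatedColumns parameter requests the GROUPID and POSITION columns in the output table, and groupIDName names the group ID column Log_ID_Group so that the differences can be indexed efficiently.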

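```sas
PROC CAS;
   simple.compare /
      table={name='tower_logs', caslib='casuser'}       /* first table: Edge tower logs */
      table2={name='datalake_logs', caslib='casuser'}   /* second table: data lake copy */
      inputs={{name='CallID'}}                          /* column to compare */
      generatedColumns={'GROUPID', 'POSITION'}          /* extra columns in the output */
      groupIDName='Log_ID_Group'                        /* name for the GROUPID column */
      casOut={name='lost_packets', caslib='casuser', replace=true};
   RUN;
QUIT;
```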
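To inspect what the comparison produced, a minimal sketch uses the table.fetch action to print the first rows of lost_packets, including the Log_ID_Group column. It assumes lost_packets was written to the casuser caslib (the usual default active caslib); the to=10 limit is an arbitrary choice for illustration.

```sas
/* Inspect the first rows of the comparison output.        */
/* Assumes lost_packets lives in the casuser caslib.       */
PROC CAS;
   table.fetch / table={name='lost_packets', caslib='casuser'} to=10;
QUIT;
```
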

Expected Result

The action completes successfully on the larger data. The lost_packets table contains exactly the 50 CallIDs that are present in the tower logs but missing from the data lake (C-9951 through C-10000, since 10,000 - 9,950 = 50). The generated Log_ID_Group column provides an index for these missing records.
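The expected count of 50 can be cross-checked independently of simple.compare. A minimal sketch, assuming both tables live in the casuser caslib, performs an anti-join with a DATA step MERGE; because every table involved is a CAS table, the step runs in CAS and the BY statement does not require pre-sorted input. The output name missing_ids is hypothetical and chosen only for this sketch. The result should list C-9951 through C-10000.

```sas
/* Independent cross-check: CallIDs in tower_logs but not in datalake_logs. */
/* missing_ids is a hypothetical table name used only for this sketch.      */
DATA casuser.missing_ids;
   MERGE casuser.tower_logs(in=inTower keep=CallID)
         casuser.datalake_logs(in=inLake keep=CallID);
   BY CallID;
   IF inTower AND NOT inLake;   /* keep rows found only on the Edge side */
RUN;

/* Count the rows found only on the Edge side (expected: 50). */
PROC SQL;
   SELECT COUNT(*) AS n_missing FROM casuser.missing_ids;
QUIT;
```
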

See the technical documentation for the compare action.