simple

compare

Description

Compares two tables by computing the index and frequency of each group, cumulative frequency, and the index of each record within its group.

simple.compare / table={...} <table2={...}> <casOut={...}> <casOut2={...}> <freqOut={...}> <inputs={{...}, ...}> <table2Inputs={{...}, ...}> <copyVars={"var1", "var2", ...}> <table2CopyVars={"var1", "var2", ...}> <attributes={{...}, ...}> <table2Attributes={{...}, ...}> <details=boolean> <divider="string"> <frequencyName="name"> <frequency2Name="name"> <cumFreqName="name"> <cumFreq2Name="name"> <groupIDName="name"> <groupID2Name="name"> <groupIDBothName="name"> <positionName="name"> <position2Name="name"> <generatedColumns={"ALL" | "CUMFREQ" | "F" | "FREQUENCY" | "GROUPID" | "NONE" | "POSITION"}> <includeDuplicates={"CASOUT" | "CASOUT2" | "FREQOUT"}> <includeMissing=boolean> <inputRegEx={"regex1", "regex2", ...}> <keyModify={"L" | "R" | "U" | "C", ...}> <maxFrequency=long> <maxPosition=long> <minFrequency=long> <minGroupsLL=long> <minPosition=long> <noVars=boolean> <noVars2=boolean> <outputNamedTables=boolean> <position=long> <sparse=boolean>;
Settings
ParameterDescription
table Specifies the base table for the comparison.
table2 Specifies the comparison table. The records and groups in this table are compared with the records and groups in the base table.
casOut Specifies the output table that contains records unique to the base table.
casOut2 Specifies the output table that contains records unique to the comparison table.
freqOut Specifies the output table that contains frequency information for the groups.
inputs Specifies the input variables from the base table to use for grouping.
table2Inputs Specifies the input variables from the comparison table to use for grouping.
copyVars Specifies additional variables to copy from the base table to the casOut table.
table2CopyVars Specifies additional variables to copy from the comparison table to the casOut2 table.
details When set to TRUE, writes detailed information, such as the number of groups, to the SAS log.
divider Specifies the character used to concatenate character variable values for grouping.
frequencyName Specifies the name for the frequency column in the base table's output.
frequency2Name Specifies the name for the frequency column in the comparison table's output.
cumFreqName Specifies the name for the cumulative frequency column in the base table's output.
cumFreq2Name Specifies the name for the cumulative frequency column in the comparison table's output.
groupIDName Specifies the name for the group ID column for the base table.
groupID2Name Specifies the name for the group ID column for the comparison table.
groupIDBothName Specifies the name for the column that contains group IDs common to both tables.
positionName Specifies the name for the column indicating a record's position within its group in the base table.
position2Name Specifies the name for the column indicating a record's position within its group in the comparison table.
generatedColumns Specifies which generated columns (like _Frequency_, _GroupID_, _Position_) to include in the output.
includeDuplicates Specifies whether to include all records or only unique groups in the output tables.
includeMissing Specifies whether to include groups with missing values in the analysis.
Data Preparation View data prep sheet
Data Creation

This example creates two tables, `sales_q1` and `sales_q2`, to represent sales data for two different quarters. These tables will be compared to identify differences in product sales.

Copied!
1DATA casuser.sales_q1;
2 INFILE DATALINES;
3 INPUT Product $ Sales;
4 DATALINES;
5Apple 100
6Orange 150
7Banana 200
8Apple 120
9;
10RUN;
11 
12DATA casuser.sales_q2;
13 INFILE DATALINES;
14 INPUT Product $ Sales;
15 DATALINES;
16Apple 110
17Orange 160
18Grape 220
19Orange 140
20;
21RUN;

Examples

This example compares the product groups between `sales_q1` and `sales_q2`. The `inputs` parameter specifies that the comparison should be based on the `Product` column. The results will show which products are unique to each quarter and which are common.

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 SIMPLE.compare /
3 TABLE={name='sales_q1'},
4 table2={name='sales_q2'},
5 inputs={{name='Product'}},
6 casOut={name='q1_unique', replace=true},
7 casOut2={name='q2_unique', replace=true},
8 freqOut={name='comparison_freq', replace=true};
9RUN;
10QUIT;
Result :
Three output tables are created: `q1_unique` contains products only sold in Q1 (Banana), `q2_unique` contains products only sold in Q2 (Grape), and `comparison_freq` shows the frequency of products that appear in both or have differing counts.

This example performs a detailed comparison, generating several analytical columns. `generatedColumns` is used to create columns for Group ID, Frequency, and Position within the group for both tables. This provides a deeper insight into the structure and composition of sales in each quarter.

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 SIMPLE.compare /
3 TABLE={name='sales_q1', caslib='casuser'},
4 table2={name='sales_q2', caslib='casuser'},
5 inputs={{name='Product'}},
6 casOut={name='q1_details', replace=true},
7 casOut2={name='q2_details', replace=true},
8 freqOut={name='comparison_details_freq', replace=true},
9 generatedColumns={'GROUPID', 'FREQUENCY', 'POSITION'},
10 groupIDName='Product_ID_Q1',
11 frequencyName='Count_Q1',
12 positionName='Pos_Q1',
13 groupID2Name='Product_ID_Q2',
14 frequency2Name='Count_Q2',
15 position2Name='Pos_Q2';
16RUN;
17QUIT;
Result :
The output tables `q1_details` and `q2_details` will contain the original data plus new columns: `Product_ID_Q1`, `Count_Q1`, `Pos_Q1` for the first table, and similarly named columns for the second. The `comparison_details_freq` table will provide a summary of group frequencies across both tables, helping to identify changes in sales patterns.

FAQ

What is the purpose of the simple.compare action?
What are the primary input tables for the compare action?
How can I manage the output tables generated by the compare action?
What does the 'generatedColumns' parameter control?

Associated Scenarios

Use Case
Standard Inventory Reconciliation Between Warehouses

A large retail chain needs to reconcile inventory records between their central database and a regional warehouse report to identify discrepancies in stock levels and product li...

Use Case
High-Volume Call Log Synchronization Check

A telecom operator validates that millions of call records generated by cell towers (Edge) are correctly replicated to the central Data Lake without data loss.

Use Case
Edge Case: Customer Merge with Nulls and Duplicates

A CRM system cleanup process needs to compare two lists of potential leads where data quality is poor (missing IDs, duplicate entries) to ensure robust handling of exceptions.