compare - WeAreCAS

Q: What is the purpose of the simple.compare action?

The simple.compare action compares two tables by computing the index and frequency of each group, cumulative frequency, and the index of each record within its group.

Q: What are the primary input tables for the compare action?

The compare action uses a base table, specified by the 'table' parameter, and a comparison table, specified by the 'table2' parameter. It compares the records and groups in the comparison table against those in the base table.

Q: How can I manage the output tables generated by the compare action?

You can manage the output using the 'casOut', 'casOut2', and 'freqOut' parameters. 'casOut' is for records unique to the base table, 'casOut2' for records unique to the comparison table, and 'freqOut' for frequency data. Use the 'includeDuplicates' parameter to include all records instead of just unique groups.

Q: What does the 'generatedColumns' parameter control?

The 'generatedColumns' parameter controls which calculated columns are added to the output tables. Options include 'ALL' for every generated column, 'NONE' for no generated columns, or specific columns like 'CUMFREQ' (cumulative frequency), 'FREQUENCY' (group frequency), 'GROUPID' (group identifier), and 'POSITION' (record's position within its group).

Codes SAS Liés

Creation of partitioned ORC data

Testing SAS Viya Job Execution and Management Macros

Unit tests for the mp_replace macro

SAS Viya Job Code Retrieval Test

Test of the mf_verifymacvars macro

Unit test for mm_getauthinfo macro

Macro Scope Test with mp_assertscope

Table Creation Via SELECT with SAS/ACCESS

Initialize variables and clean up for batch comparison

Test of the ms_triggerstp macro

Description

Compares two tables by computing the index and frequency of each group, cumulative frequency, and the index of each record within its group.

simple.compare / table={...} <table2={...}> <casOut={...}> <casOut2={...}> <freqOut={...}> <inputs={{...}, ...}> <table2Inputs={{...}, ...}> <copyVars={"var1", "var2", ...}> <table2CopyVars={"var1", "var2", ...}> <attributes={{...}, ...}> <table2Attributes={{...}, ...}> <details=boolean> <divider="string"> <frequencyName="name"> <frequency2Name="name"> <cumFreqName="name"> <cumFreq2Name="name"> <groupIDName="name"> <groupID2Name="name"> <groupIDBothName="name"> <positionName="name"> <position2Name="name"> <generatedColumns={"ALL" | "CUMFREQ" | "F" | "FREQUENCY" | "GROUPID" | "NONE" | "POSITION"}> <includeDuplicates={"CASOUT" | "CASOUT2" | "FREQOUT"}> <includeMissing=boolean> <inputRegEx={"regex1", "regex2", ...}> <keyModify={"L" | "R" | "U" | "C", ...}> <maxFrequency=long> <maxPosition=long> <minFrequency=long> <minGroupsLL=long> <minPosition=long> <noVars=boolean> <noVars2=boolean> <outputNamedTables=boolean> <position=long> <sparse=boolean>;

Settings

Parameter	Description
table	Specifies the base table for the comparison.
table2	Specifies the comparison table. The records and groups in this table are compared with the records and groups in the base table.
casOut	Specifies the output table that contains records unique to the base table.
casOut2	Specifies the output table that contains records unique to the comparison table.
freqOut	Specifies the output table that contains frequency information for the groups.
inputs	Specifies the input variables from the base table to use for grouping.
table2Inputs	Specifies the input variables from the comparison table to use for grouping.
copyVars	Specifies additional variables to copy from the base table to the casOut table.
table2CopyVars	Specifies additional variables to copy from the comparison table to the casOut2 table.
details	When set to TRUE, writes detailed information, such as the number of groups, to the SAS log.
divider	Specifies the character used to concatenate character variable values for grouping.
frequencyName	Specifies the name for the frequency column in the base table's output.
frequency2Name	Specifies the name for the frequency column in the comparison table's output.
cumFreqName	Specifies the name for the cumulative frequency column in the base table's output.
cumFreq2Name	Specifies the name for the cumulative frequency column in the comparison table's output.
groupIDName	Specifies the name for the group ID column for the base table.
groupID2Name	Specifies the name for the group ID column for the comparison table.
groupIDBothName	Specifies the name for the column that contains group IDs common to both tables.
positionName	Specifies the name for the column indicating a record's position within its group in the base table.
position2Name	Specifies the name for the column indicating a record's position within its group in the comparison table.
generatedColumns	Specifies which generated columns (like _Frequency_, _GroupID_, _Position_) to include in the output.
includeDuplicates	Specifies whether to include all records or only unique groups in the output tables.
includeMissing	Specifies whether to include groups with missing values in the analysis.

Data Preparation View data prep sheet

Data Creation

This example creates two tables, `sales_q1` and `sales_q2`, to represent sales data for two different quarters. These tables will be compared to identify differences in product sales.

Copied!

1	DATA casuser.sales_q1;
2	INFILE DATALINES;
3	INPUT Product $ Sales;
4	DATALINES;
5	Apple 100
6	Orange 150
7	Banana 200
8	Apple 120
9	;
10	RUN;
11
12	DATA casuser.sales_q2;
13	INFILE DATALINES;
14	INPUT Product $ Sales;
15	DATALINES;
16	Apple 110
17	Orange 160
18	Grape 220
19	Orange 140
20	;
21	RUN;

Examples

This example compares the product groups between `sales_q1` and `sales_q2`. The `inputs` parameter specifies that the comparison should be based on the `Product` column. The results will show which products are unique to each quarter and which are common.

SAS® / CAS Code Code awaiting community validation

Copied!

1	PROC CAS;
2	SIMPLE.compare /
3	TABLE={name='sales_q1'},
4	table2={name='sales_q2'},
5	inputs={{name='Product'}},
6	casOut={name='q1_unique', replace=true},
7	casOut2={name='q2_unique', replace=true},
8	freqOut={name='comparison_freq', replace=true};
9	RUN;
10	QUIT;

Result :
Three output tables are created: `q1_unique` contains products only sold in Q1 (Banana), `q2_unique` contains products only sold in Q2 (Grape), and `comparison_freq` shows the frequency of products that appear in both or have differing counts.

This example performs a detailed comparison, generating several analytical columns. `generatedColumns` is used to create columns for Group ID, Frequency, and Position within the group for both tables. This provides a deeper insight into the structure and composition of sales in each quarter.

SAS® / CAS Code Code awaiting community validation

Copied!

1	PROC CAS;
2	SIMPLE.compare /
3	TABLE={name='sales_q1', caslib='casuser'},
4	table2={name='sales_q2', caslib='casuser'},
5	inputs={{name='Product'}},
6	casOut={name='q1_details', replace=true},
7	casOut2={name='q2_details', replace=true},
8	freqOut={name='comparison_details_freq', replace=true},
9	generatedColumns={'GROUPID', 'FREQUENCY', 'POSITION'},
10	groupIDName='Product_ID_Q1',
11	frequencyName='Count_Q1',
12	positionName='Pos_Q1',
13	groupID2Name='Product_ID_Q2',
14	frequency2Name='Count_Q2',
15	position2Name='Pos_Q2';
16	RUN;
17	QUIT;

Result :
The output tables `q1_details` and `q2_details` will contain the original data plus new columns: `Product_ID_Q1`, `Count_Q1`, `Pos_Q1` for the first table, and similarly named columns for the second. The `comparison_details_freq` table will provide a summary of group frequencies across both tables, helping to identify changes in sales patterns.

FAQ

What is the purpose of the simple.compare action?

What are the primary input tables for the compare action?

How can I manage the output tables generated by the compare action?

What does the 'generatedColumns' parameter control?

Associated Scenarios

Use Case

Standard Inventory Reconciliation Between Warehouses

A large retail chain needs to reconcile inventory records between their central database and a regional warehouse report to identify discrepancies in stock levels and product li...

View scenario

Use Case

High-Volume Call Log Synchronization Check

A telecom operator validates that millions of call records generated by cell towers (Edge) are correctly replicated to the central Data Lake without data loss.

View scenario

Use Case

Edge Case: Customer Merge with Nulls and Duplicates

A CRM system cleanup process needs to compare two lists of potential leads where data quality is poor (missing IDs, duplicate entries) to ensure robust handling of exceptions.

View scenario

Actions associées

simple

correlation

The correlation action computes Pearson product-moment correlations. This is ...

simple

crossTab

The crossTab action performs one-way or two-way tabulations, also known as fr...

simple

freq

Generates a frequency distribution for one or more variables. It calculates c...

simple

groupByInfo

The groupByInfo action computes the index and frequency of each group, and th...

simple

groupBy

Builds BY groups in terms of the variable value combinations given the variab...

simple

mdSummary

The mdSummary action calculates multidimensional summaries of numeric variabl...

Table of Contents

Creation of partitioned ORC data

Testing SAS Viya Job Execution and Management Macros

Unit tests for the mp_replace macro

SAS Viya Job Code Retrieval Test

Test of the mf_verifymacvars macro

Unit test for mm_getauthinfo macro

Macro Scope Test with mp_assertscope

Table Creation Via SELECT with SAS/ACCESS

Initialize variables and clean up for batch comparison

Test of the ms_triggerstp macro

Description

Data Creation

Examples

Basic Comparison of Two Tables

Detailed Comparison with Generated Columns

FAQ

Associated Scenarios

Use Case

Standard Inventory Reconciliation Between Warehouses

Use Case

High-Volume Call Log Synchronization Check

Use Case

Edge Case: Customer Merge with Nulls and Duplicates

Actions associées

correlation

crossTab

freq

groupByInfo

groupBy

mdSummary