cspade - WeAreCAS

Q: What is the purpose of the cspade action?

The cspade action performs sequence rule mining using the cSpade algorithm to identify frequent sequences in a transaction database.

Q: Which parameters are required to run the cspade action?

The required parameters are "table" (input table), "sequenceId" (specifying the sequence or customer column), "eventId" (specifying the event or time column), and "itemId" (specifying the item column).

Q: How can I limit the length of the sequences found?

You can use the "maxLen" parameter to specify the maximum number of elements in a sequence. The default value is 10.

Q: How do I specify the minimum support for a sequence to be considered frequent?

Use the "support" parameter (alias "supmin") to specify the minimum level of support as a value between 0 and 1. Alternatively, you can use "supportCnt" to specify a minimum count of transactions.

Q: What does the maxGap parameter control?

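The "maxGap" parameter specifies the maximum time difference allowed between consecutive elements of a sequence, measured in the units of the eventId column. The "minGap" parameter sets the corresponding minimum (default 1).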
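Read together with minGap, the gap constraint can be written as follows, where $t_i$ is the eventId of the $i$-th element of a matched sequence with $k$ elements. Treating both bounds as inclusive is an assumption; the settings table only speaks of a "minimum" and "maximum" time difference.

```latex
\[
\text{minGap} \;\le\; t_{i+1} - t_i \;\le\; \text{maxGap}, \qquad i = 1, \dots, k-1
\]
```

For example, in the sample data customer 2 buys A at event 15 and C at event 25, a gap of 25 - 15 = 10, so that pair satisfies minGap=5 and maxGap=20 from the advanced example.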

Q: How can I filter the input data before analysis?

You can use the "where" parameter to specify an expression for subsetting the input data. Additionally, the "whereTable" parameter allows you to use rows from another table as a filter.
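The sketch below shows both filtering options on the sample transactions table from the Data Creation section (it assumes that step has run, so the mycas libref exists). The filter table keep_customers and the output table names are hypothetical. How whereTable matches rows is also an assumption here: the sketch lists customerId in vars on the expectation that only input rows whose customerId appears in the filter table are kept; check this against the SAS documentation for the table parameter.

```sas
/* Hypothetical filter table: the customers to keep */
DATA mycas.keep_customers;
INPUT customerId;
CARDS;
1
2
;
RUN;

PROC CAS;
   loadactionset "sequence";

   /* Option 1: subset the input rows with a WHERE expression */
   sequence.cspade /
      table={name='transactions', where='customerId ne 3'},
      sequenceId='customerId',
      eventId='eventId',
      itemId='item',
      supportCnt=2,
      casout={name='seq_where', replace=true};

   /* Option 2: use the rows of another table as the filter.  */
   /* Matching on customerId through vars is an assumption.   */
   sequence.cspade /
      table={name='transactions',
             whereTable={name='keep_customers', vars={{name='customerId'}}}},
      sequenceId='customerId',
      eventId='eventId',
      itemId='item',
      supportCnt=2,
      casout={name='seq_wheretable', replace=true};
RUN;
QUIT;
```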

Q: What is the effect of setting the singlePass parameter to True?

Setting "singlePass" to True reads the input table in a single pass without creating a transient table on the server. This can be more efficient, but the order of the data might not be stable across repeated runs.
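A minimal sketch of enabling singlePass on the sample transactions table. As the syntax summary shows, singlePass is set inside the table parameter rather than at the top level of the action. The output table name seq_singlepass is hypothetical.

```sas
PROC CAS;
   loadactionset "sequence";
   /* singlePass=true reads the input without creating a transient */
   /* table; row order in the results may vary between runs        */
   sequence.cspade /
      table={name='transactions', singlePass=true},
      sequenceId='customerId',
      eventId='eventId',
      itemId='item',
      supportCnt=2,
      casout={name='seq_singlepass', replace=true};
RUN;
QUIT;
```

If you compare results from repeated runs, compare them as sets of sequences rather than by row position.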

Q: How do I limit the number of items within a single element of a sequence?

You can use the "maxSize" parameter to specify the maximum number of items allowed in an element of a sequence. The default value is 10.
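For example, setting maxSize=1 should restrict every element to a single item, so patterns whose elements contain two or more items purchased in the same event are excluded. This sketch runs on the sample transactions table; the output table name seq_single_items is hypothetical.

```sas
PROC CAS;
   loadactionset "sequence";
   sequence.cspade /
      table={name='transactions'},
      sequenceId='customerId',
      eventId='eventId',
      itemId='item',
      supportCnt=2,
      maxSize=1,   /* at most one item per element        */
      maxLen=3,    /* at most three elements per sequence */
      casout={name='seq_single_items', replace=true};
RUN;
QUIT;
```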

Q: What output does the cspade action produce?

The action produces an output table (specified by the "casout" parameter) that contains the identified frequent sequences and their support values.
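This page does not list the output table's column names, so a reasonable first step is to ask the server for them. The sketch below uses the table.columnInfo and table.fetch actions and assumes the frequent_sequences table created by the basic example in the Examples section exists in the active caslib.

```sas
PROC CAS;
   /* List the columns of the cspade output table */
   table.columnInfo / table={name='frequent_sequences'};
   /* Show the first 20 rows */
   table.fetch / table={name='frequent_sequences'}, to=20;
RUN;
QUIT;
```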

Description

The cspade action performs sequence mining using the cSpade algorithm. It identifies frequent sequences (ordered lists of itemsets) in a dataset. This is useful for analyzing transactional data over time, such as customer purchase histories or web navigation paths, to discover common patterns.

sequence.cspade {
   casout={caslib='string', compress=TRUE|FALSE, indexVars={'variable-name-1' , 'variable-name-2', ...}, label='string', lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat='DVR'|'INHERIT'|'STANDARD', name='table-name', promote=TRUE|FALSE, replace=TRUE|FALSE, replication=integer, tableRedistUpPolicy='DEFER'|'NOREDIST'|'REBALANCE', threadBlockSize=64-bit-integer, timeStamp='string', where={'string-1' , 'string-2', ...}},
   eventId='variable-name',
   itemId='variable-name',
   maxGap=integer,
   maxLen=integer,
   maxSize=integer,
   minGap=integer,
   sequenceId='variable-name',
   support=double,
   supportCnt=64-bit-integer,
   table={caslib='string',
      computedOnDemand=TRUE|FALSE,
      computedVars={{ format='string', formattedLength=integer, label='string', name='variable-name', nfd=integer, nfl=integer}, {...}},
      computedVarsProgram='string',
      dataSourceOptions={'key-1'='any-list-or-data-type-1' , 'key-2'='any-list-or-data-type-2', ...},
      importOptions={fileType='ANY'|'AUDIO'|'AUTO'|'BASESAS'|'CSV'|'DELIMITED'|'DOCUMENT'|'DTA'|'ESP'|'EXCEL'|'FMT'|'HDAT'|'IMAGE'|'JMP'|'LASR'|'PARQUET'|'SOUND'|'SPSS'|'VIDEO'|'XLS', fileType-specific-parameters},
      name='table-name',
      singlePass=TRUE|FALSE,
      vars={{ format='string', formattedLength=integer, label='string', name='variable-name', nfd=integer, nfl=integer}, {...}},
      where='where-expression',
      whereTable={casLib='string',
         dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
         importOptions={fileType='ANY'|'AUDIO'|'AUTO'|'BASESAS'|'CSV'|'DELIMITED'|'DOCUMENT'|'DTA'|'ESP'|'EXCEL'|'FMT'|'HDAT'|'IMAGE'|'JMP'|'LASR'|'PARQUET'|'SOUND'|'SPSS'|'VIDEO'|'XLS', fileType-specific-parameters},
         name='table-name',
         vars={{ format='string', formattedLength=integer, label='string', name='variable-name', nfd=integer, nfl=integer}, {...}},
         where='where-expression'}}
};

Settings

Parameter	Description
casout	Specifies the output table to store the frequent sequences and their support.
eventId	Specifies the event or time column of the input table, indicating when an item was part of a transaction.
itemId	Specifies the item column of the input table.
maxGap	Specifies the maximum time difference between consecutive elements in a sequence.
maxLen	Specifies the maximum number of elements in a sequence. Default is 10.
maxSize	Specifies the maximum number of items within a single element (transaction) of a sequence. Default is 10.
minGap	Specifies the minimum time difference between consecutive elements in a sequence. Default is 1.
sequenceId	Specifies the column that identifies the sequence or customer.
support	Specifies the minimum support level for a sequence to be considered frequent, as a proportion (0 to 1).
supportCnt	Specifies the minimum number of transactions (count) a sequence must appear in to be considered frequent.
table	Specifies the input table containing the sequence data for analysis.

Data Preparation

Data Creation

This example creates a sample dataset of customer transactions over time. Each row represents an item purchased by a customer in a specific event (transaction).


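/* Assumes a CAS server is available: start a session and bind */
/* the libref mycas to the session's active caslib              */
cas mysess;
libname mycas cas;

/* eventId is read as numeric (no $) so that minGap and maxGap  */
/* can be evaluated as time differences between events          */
DATA mycas.transactions;
INPUT customerId eventId item $;
CARDS;
1 10 A
1 10 B
1 20 C
1 30 D
2 15 A
2 25 C
2 30 D
3 10 B
3 20 C
3 40 E
;
RUN;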
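Grouped by customerId and ordered by eventId, the sample rows form three sequences. Items that share an eventId form one element (itemset), and the subscripts below are the eventId values. This notation is only a reading aid, not output produced by the action.

```latex
\[
\begin{aligned}
S_1 &= \langle \{A, B\}_{10},\ \{C\}_{20},\ \{D\}_{30} \rangle \\
S_2 &= \langle \{A\}_{15},\ \{C\}_{25},\ \{D\}_{30} \rangle \\
S_3 &= \langle \{B\}_{10},\ \{C\}_{20},\ \{E\}_{40} \rangle
\end{aligned}
\]
```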

Examples

Basic Sequence Mining

This example performs a basic sequence mining analysis to find frequent sequences with a minimum support count of 2.


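PROC CAS;
   /* Load the sequence action set if it is not already loaded */
   loadactionset "sequence";

   /* Keep sequences whose support count is at least 2 */
   sequence.cspade /
      table={name='transactions'},
      sequenceId='customerId',   /* sequence (customer) column */
      eventId='eventId',         /* event (time) column        */
      itemId='item',             /* item column                */
      supportCnt=2,
      casout={name='frequent_sequences', replace=true};
RUN;
QUIT;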
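As a hand check of what supportCnt=2 means on the sample data, assume (as in the original SPADE formulation, and consistent with the "50% of sequences" wording of the next example) that support counts the customer sequences that contain a pattern, with N = 3 customers. Assume also that the default gap settings admit the event spacings involved. The pattern ⟨{A}, {C}, {D}⟩ occurs for customer 1 (events 10, 20, 30) and customer 2 (events 15, 25, 30), but not for customer 3, who never buys A or D:

```latex
\[
\operatorname{supp}\bigl(\langle \{A\}, \{C\}, \{D\} \rangle\bigr)
  = \frac{\lvert \{1, 2\} \rvert}{N} = \frac{2}{3} \approx 0.67
\]
```

Its count of 2 meets supportCnt=2, whereas the itemset {A, B} appears only for customer 1 (count 1) and would fall below the threshold. This is a worked illustration under the stated assumptions, not captured output from the action.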

Detailed Sequence Mining with Time Gaps and Length Constraints

This example demonstrates a more advanced use of the cspade action. It requires a minimum support of 50% of sequences, limits sequences to at most 3 elements, and requires consecutive elements to be between 5 and 20 time units apart (measured in eventId units).


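PROC CAS;
   /* Load the sequence action set if it is not already loaded */
   loadactionset "sequence";

   sequence.cspade /
      table={name='transactions'},
      sequenceId='customerId',
      eventId='eventId',
      itemId='item',
      support=0.5,   /* at least 50% of sequences                */
      maxLen=3,      /* at most 3 elements per sequence          */
      minGap=5,      /* consecutive elements at least 5 ...      */
      maxGap=20,     /* ... and at most 20 eventId units apart   */
      /* MyCasLib must be an existing caslib that you can write to */
      casout={name='detailed_sequences', replace=true, caslib='MyCasLib'};
RUN;
QUIT;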


Associated Scenarios

Use Case

E-commerce Customer Purchase Path Analysis

An online retailer wants to identify the most common navigation paths customers take before making a purchase. By understanding these frequent sequences (e.g., 'Landing Page' ->...


Use Case

High-Volume Sensor Error Burst Detection

A manufacturing plant collects high-frequency logs from industrial machines. The engineering team needs to detect specific patterns of error codes that occur in rapid succession...


Use Case

Credit Card Transaction Analysis with Filtering

A bank is investigating potential credit card fraud patterns. They want to analyze transaction sequences only for 'High Risk' flagged accounts. The goal is to find transaction t...

