sequence

cspade

Description

The cspade action performs sequence mining using the cSpade algorithm. It identifies frequent sequences, which are ordered lists of itemsets, in a dataset. This is useful for analyzing transactional data over time, such as customer purchase histories or web navigation paths, to discover common patterns.

sequence.cspade {
  casout={
    caslib='string',
    compress=TRUE|FALSE,
    indexVars={'variable-name-1', 'variable-name-2', ...},
    label='string',
    lifetime=64-bit-integer,
    maxMemSize=64-bit-integer,
    memoryFormat='DVR'|'INHERIT'|'STANDARD',
    name='table-name',
    promote=TRUE|FALSE,
    replace=TRUE|FALSE,
    replication=integer,
    tableRedistUpPolicy='DEFER'|'NOREDIST'|'REBALANCE',
    threadBlockSize=64-bit-integer,
    timeStamp='string',
    where={'string-1', 'string-2', ...}
  },
  eventId='variable-name',
  itemId='variable-name',
  maxGap=integer,
  maxLen=integer,
  maxSize=integer,
  minGap=integer,
  sequenceId='variable-name',
  support=double,
  supportCnt=64-bit-integer,
  table={
    caslib='string',
    computedOnDemand=TRUE|FALSE,
    computedVars={{format='string', formattedLength=integer, label='string', name='variable-name', nfd=integer, nfl=integer}, {...}},
    computedVarsProgram='string',
    dataSourceOptions={'key-1'='any-list-or-data-type-1', 'key-2'='any-list-or-data-type-2', ...},
    importOptions={fileType='ANY'|'AUDIO'|'AUTO'|'BASESAS'|'CSV'|'DELIMITED'|'DOCUMENT'|'DTA'|'ESP'|'EXCEL'|'FMT'|'HDAT'|'IMAGE'|'JMP'|'LASR'|'PARQUET'|'SOUND'|'SPSS'|'VIDEO'|'XLS', fileType-specific-parameters},
    name='table-name',
    singlePass=TRUE|FALSE,
    vars={{format='string', formattedLength=integer, label='string', name='variable-name', nfd=integer, nfl=integer}, {...}},
    where='where-expression',
    whereTable={
      casLib='string',
      dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
      importOptions={fileType='ANY'|'AUDIO'|'AUTO'|'BASESAS'|'CSV'|'DELIMITED'|'DOCUMENT'|'DTA'|'ESP'|'EXCEL'|'FMT'|'HDAT'|'IMAGE'|'JMP'|'LASR'|'PARQUET'|'SOUND'|'SPSS'|'VIDEO'|'XLS', fileType-specific-parameters},
      name='table-name',
      vars={{format='string', formattedLength=integer, label='string', name='variable-name', nfd=integer, nfl=integer}, {...}},
      where='where-expression'
    }
  }
};

Settings
Parameter Description
casout Specifies the output table to store the frequent sequences and their support.
eventId Specifies the event or time column of the input table, indicating when an item was part of a transaction.
itemId Specifies the item column of the input table.
maxGap Specifies the maximum time difference between consecutive elements in a sequence.
maxLen Specifies the maximum number of elements in a sequence. Default is 10.
maxSize Specifies the maximum number of items within a single element (transaction) of a sequence. Default is 10.
minGap Specifies the minimum time difference between consecutive elements in a sequence. Default is 1.
sequenceId Specifies the column that identifies the sequence or customer.
support Specifies the minimum support level for a sequence to be considered frequent, as a proportion (0 to 1).
supportCnt Specifies the minimum number of transactions (count) a sequence must appear in to be considered frequent.
table Specifies the input table containing the sequence data for analysis.
Data Preparation
Data Creation

This example creates a sample dataset of customer transactions over time. Each row represents an item purchased by a customer in a specific event (transaction).

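/* Assumes the mycas libref is assigned to a caslib in an active CAS session. */
DATA mycas.transactions;
  /* eventId is read as numeric so that minGap and maxGap can compare event times. */
  INPUT customerId eventId item $;
  CARDS;
1 10 A
1 10 B
1 20 C
1 30 D
2 15 A
2 25 C
2 30 D
3 10 B
3 20 C
3 40 E
;
RUN;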
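
Because support can be given as a proportion of sequences, it can help to know how many distinct sequences and events the table holds before choosing a threshold. The following is a minimal sketch, assuming the mycas libref from the DATA step above is assigned to an active CAS session. The aliases n_sequences and n_events are illustrative names, not part of the action.

```sas
/* Sketch: count the distinct customers (sequences) and the distinct
   customer/event pairs (transactions) in the sample table. */
PROC SQL;
  SELECT COUNT(DISTINCT customerId) AS n_sequences
    FROM mycas.transactions;
  SELECT COUNT(*) AS n_events
    FROM (SELECT DISTINCT customerId, eventId
            FROM mycas.transactions);
QUIT;
```

For the ten sample rows above, this should report 3 sequences and 9 distinct events, since customer 1 buys two items in the same event.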

Examples

This example performs a basic sequence mining analysis to find frequent sequences with a minimum support count of 2.

SAS® / CAS Code (awaiting community validation)
PROC CAS;
  /* Keep sequences that meet the minimum support count of 2. */
  sequence.cspade /
    table={name='transactions'},
    sequenceId='customerId',
    eventId='eventId',
    itemId='item',
    supportCnt=2,
    casout={name='frequent_sequences', replace=true};
RUN;
QUIT;
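
This page does not list the columns of the output table, so a cautious first step after running the action is to inspect the table's structure and a few rows. The following is a minimal sketch, assuming the frequent_sequences table from the example above was written to the session's active caslib. The table.columnInfo and table.fetch actions are general-purpose CAS table actions, not part of the sequence action set.

```sas
PROC CAS;
  /* List the column names and types of the output table. */
  table.columnInfo / table={name='frequent_sequences'};
  /* Print up to 20 rows of the output table. */
  table.fetch / table={name='frequent_sequences'}, to=20;
RUN;
QUIT;
```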

This example demonstrates a more advanced use of the cspade action. It specifies a minimum support of 50% of sequences, limits sequences to at most 3 elements, and restricts the time gap between consecutive elements to between 5 and 20 time units.

PROC CAS;
  /* The output goes to a caslib named MyCasLib, which is assumed to exist. */
  sequence.cspade /
    table={name='transactions'},
    sequenceId='customerId',
    eventId='eventId',
    itemId='item',
    support=0.5,
    maxLen=3,
    minGap=5,
    maxGap=20,
    casout={name='detailed_sequences', replace=true, caslib='MyCasLib'};
RUN;
QUIT;
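
The syntax above shows that the table parameter accepts a where expression, and the settings list describes maxSize as a cap on the number of items per element. The following sketch combines the two, assuming the same transactions table. The filter on customerId and the output table name filtered_sequences are illustrative choices, not values from the documentation.

```sas
PROC CAS;
  /* Mine only the rows that satisfy the where expression, and allow
     at most one item per element of a sequence (maxSize=1). */
  sequence.cspade /
    table={name='transactions', where='customerId in (1, 2)'},
    sequenceId='customerId',
    eventId='eventId',
    itemId='item',
    supportCnt=2,
    maxSize=1,
    casout={name='filtered_sequences', replace=true};
RUN;
QUIT;
```

Filtering in the table parameter avoids creating a separate subset table before the analysis.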

FAQ

What is the purpose of the cspade action?
Which parameters are required to run the cspade action?
How can I limit the length of the sequences found?
How do I specify the minimum support for a sequence to be considered frequent?
What does the maxGap parameter control?
How can I filter the input data before analysis?
What is the effect of setting the singlePass parameter to True?
How do I limit the number of items within a single element of a sequence?
What output does the cspade action produce?

Associated Scenarios

Use Case
E-commerce Customer Purchase Path Analysis

An online retailer wants to identify the most common navigation paths customers take before making a purchase. By understanding these frequent sequences (e.g., 'Landing Page' ->...

Use Case
High-Volume Sensor Error Burst Detection

A manufacturing plant collects high-frequency logs from industrial machines. The engineering team needs to detect specific patterns of error codes that occur in rapid succession...

Use Case
Credit Card Transaction Analysis with Filtering

A bank is investigating potential credit card fraud patterns. They want to analyze transaction sequences only for 'High Risk' flagged accounts. The goal is to find transaction t...