sequence

cspade

Description

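The cspade action performs sequential pattern mining with the cSPADE algorithm (constrained SPADE, Sequential PAttern Discovery using Equivalence classes). It finds frequent sequences, which are ordered lists of itemsets (elements) that occur in at least a minimum number, or proportion, of the input sequences. Typical uses include analyzing transactional data over time, such as customer purchase histories or web navigation paths, to discover common patterns.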
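The following Python sketch shows what "frequent sequence" means. It is not the CAS implementation, and the names `contains`, `support_count`, and `s` are hypothetical. Each input sequence is a time-ordered list of itemsets. A pattern is contained in a sequence when each pattern element is a subset of a distinct, later element of that sequence. Support is the number of input sequences that contain the pattern. This is the standard SPADE definition; treating it as exactly how cspade counts support is an assumption. The toy database reuses the itemsets from the Data Creation example.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def contains(sequence: List[Itemset], pattern: List[Itemset]) -> bool:
    """True if every pattern element is a subset of a strictly later sequence element."""
    pos = 0
    for element in pattern:
        # Take the earliest sequence element that includes this pattern element.
        # Matching early never prevents the remaining pattern elements from matching.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later element
    return True


def support_count(database: List[List[Itemset]], pattern: List[Itemset]) -> int:
    """Number of input sequences (not transactions) that contain the pattern."""
    return sum(1 for sequence in database if contains(sequence, pattern))


def s(*items: str) -> Itemset:
    return frozenset(items)


# One time-ordered list of itemsets per customer (same items as the sample data).
database = [
    [s("A", "B"), s("C"), s("D")],
    [s("A"), s("C"), s("D")],
    [s("B"), s("C"), s("E")],
]

print(support_count(database, [s("A"), s("C")]))  # 2
print(support_count(database, [s("A", "B")]))     # 1
print(support_count(database, [s("C"), s("A")]))  # 0
```

Order matters. The pattern A followed by C has support 2, but C followed by A has support 0.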

sequence.cspade {
   casout={caslib='string',
      compress=TRUE|FALSE,
      indexVars={'variable-name-1', 'variable-name-2', ...},
      label='string',
      lifetime=64-bit-integer,
      maxMemSize=64-bit-integer,
      memoryFormat='DVR'|'INHERIT'|'STANDARD',
      name='table-name',
      promote=TRUE|FALSE,
      replace=TRUE|FALSE,
      replication=integer,
      tableRedistUpPolicy='DEFER'|'NOREDIST'|'REBALANCE',
      threadBlockSize=64-bit-integer,
      timeStamp='string',
      where={'string-1', 'string-2', ...}},
   eventId='variable-name',
   itemId='variable-name',
   maxGap=integer,
   maxLen=integer,
   maxSize=integer,
   minGap=integer,
   sequenceId='variable-name',
   support=double,
   supportCnt=64-bit-integer,
   table={caslib='string',
      computedOnDemand=TRUE|FALSE,
      computedVars={{format='string', formattedLength=integer, label='string', name='variable-name', nfd=integer, nfl=integer}, {...}},
      computedVarsProgram='string',
      dataSourceOptions={'key-1'='any-list-or-data-type-1', 'key-2'='any-list-or-data-type-2', ...},
      importOptions={fileType='ANY'|'AUDIO'|'AUTO'|'BASESAS'|'CSV'|'DELIMITED'|'DOCUMENT'|'DTA'|'ESP'|'EXCEL'|'FMT'|'HDAT'|'IMAGE'|'JMP'|'LASR'|'PARQUET'|'SOUND'|'SPSS'|'VIDEO'|'XLS', fileType-specific-parameters},
      name='table-name',
      singlePass=TRUE|FALSE,
      vars={{format='string', formattedLength=integer, label='string', name='variable-name', nfd=integer, nfl=integer}, {...}},
      where='where-expression',
      whereTable={casLib='string',
         dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
         importOptions={fileType='ANY'|'AUDIO'|'AUTO'|'BASESAS'|'CSV'|'DELIMITED'|'DOCUMENT'|'DTA'|'ESP'|'EXCEL'|'FMT'|'HDAT'|'IMAGE'|'JMP'|'LASR'|'PARQUET'|'SOUND'|'SPSS'|'VIDEO'|'XLS', fileType-specific-parameters},
         name='table-name',
         vars={{format='string', formattedLength=integer, label='string', name='variable-name', nfd=integer, nfl=integer}, {...}},
         where='where-expression'}}
};
Settings

Parameter | Description
casout | Specifies the output table that stores the frequent sequences and their support.
eventId | Specifies the event (time) column of the input table, which indicates when an item was part of a transaction.
itemId | Specifies the item column of the input table.
maxGap | Specifies the maximum time difference between consecutive elements in a sequence.
maxLen | Specifies the maximum number of elements in a sequence. Default is 10.
maxSize | Specifies the maximum number of items within a single element (transaction) of a sequence. Default is 10.
minGap | Specifies the minimum time difference between consecutive elements in a sequence. Default is 1.
sequenceId | Specifies the column that identifies each sequence, such as a customer.
support | Specifies the minimum support for a sequence to be considered frequent, as a proportion (0 to 1) of all input sequences.
supportCnt | Specifies the minimum number of input sequences (distinct sequenceId values) in which a sequence must appear to be considered frequent.
table | Specifies the input table that contains the sequence data to analyze.
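The sketch below shows how three of these settings relate to each other. It converts a `support` proportion into a minimum sequence count, and it checks a candidate pattern against `maxLen` (number of elements) and `maxSize` (items per element). The helper names are hypothetical. Rounding the proportion up to the next whole count is an assumption, not documented CAS behavior.

```python
import math
from typing import FrozenSet, List


def min_support_count(support: float, n_sequences: int) -> int:
    """Smallest number of sequences that satisfies a support proportion.

    Rounding up is an assumption about how a proportion maps to a count.
    """
    if not 0.0 <= support <= 1.0:
        raise ValueError("support must be a proportion between 0 and 1")
    # The small tolerance keeps floating-point error from pushing a product
    # that should be an exact integer up to the next integer.
    return math.ceil(support * n_sequences - 1e-9)


def within_limits(pattern: List[FrozenSet[str]], max_len: int = 10, max_size: int = 10) -> bool:
    """maxLen caps the number of elements; maxSize caps the items in any one element."""
    return len(pattern) <= max_len and all(len(element) <= max_size for element in pattern)


print(min_support_count(0.5, 3))  # 2
print(within_limits([frozenset({"A", "B"}), frozenset({"C"})], max_len=3, max_size=1))  # False
```

With three sequences, `support=0.5` and `supportCnt=2` would select the same patterns under this rounding assumption.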
Data Preparation
Data Creation

This example creates a sample dataset of customer transactions over time. Each row represents an item purchased by a customer in a specific event (transaction).

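/* Assumes a CAS server is available. Start a session and assign a CAS engine */
/* libref named mycas that points to the session's active caslib.            */
cas mysess;
libname mycas cas;

DATA mycas.transactions;
   /* eventId is read as numeric so that gap constraints can be evaluated */
   INPUT customerId eventId item $;
   CARDS;
1 10 A
1 10 B
1 20 C
1 30 D
2 15 A
2 25 C
2 30 D
3 10 B
3 20 C
3 40 E
;
RUN;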
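The table stores one row per item. Sequence miners such as cSPADE treat all items that share a sequenceId and an eventId as one element (itemset), and they order elements by eventId. The Python sketch below performs that grouping on the same ten rows to show the resulting sequences. `build_sequences` is a hypothetical helper and is not part of CAS. Ordering by eventId and taking gap differences both assume a numeric event column.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, FrozenSet, List, Set, Tuple

Row = Tuple[int, int, str]  # (customerId, eventId, item)

# The ten rows of the sample DATA step.
rows: List[Row] = [
    (1, 10, "A"), (1, 10, "B"), (1, 20, "C"), (1, 30, "D"),
    (2, 15, "A"), (2, 25, "C"), (2, 30, "D"),
    (3, 10, "B"), (3, 20, "C"), (3, 40, "E"),
]


def build_sequences(rows: List[Row]) -> Dict[int, List[Tuple[int, FrozenSet[str]]]]:
    """Group rows by sequence ID, then by event ID, into eventId-ordered itemsets."""
    grouped: DefaultDict[int, DefaultDict[int, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for sequence_id, event_id, item in rows:
        grouped[sequence_id][event_id].add(item)
    return {
        sequence_id: [(event_id, frozenset(items)) for event_id, items in sorted(events.items())]
        for sequence_id, events in sorted(grouped.items())
    }


sequences = build_sequences(rows)
for sequence_id, elements in sequences.items():
    print(sequence_id, [(event_id, sorted(items)) for event_id, items in elements])
# 1 [(10, ['A', 'B']), (20, ['C']), (30, ['D'])]
# 2 [(15, ['A']), (25, ['C']), (30, ['D'])]
# 3 [(10, ['B']), (20, ['C']), (40, ['E'])]
```

Customer 1 bought A and B in the same event, so they form one two-item element rather than two consecutive elements.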

Examples

This example performs a basic sequence mining analysis to find frequent sequences with a minimum support count of 2.

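PROC CAS;
   /* Load the sequence action set if it is not already loaded in the session */
   loadactionset "sequence";
   /* Keep sequences that occur in at least 2 customers (sequenceId values) */
   sequence.cspade /
      TABLE={name='transactions'},
      sequenceId='customerId',
      eventId='eventId',
      itemId='item',
      supportCnt=2,
      casout={name='frequent_sequences', replace=true};
RUN;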
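SPADE-family algorithms such as cSPADE use a vertical layout. Each item maps to an id-list of (sequence ID, event ID) pairs, and a longer pattern's id-list is built by joining shorter ones. The Python sketch below applies a temporal join to the sample rows to find frequent two-element, single-item sequences with a minimum count of 2. It is a simplified illustration, not the CAS implementation. It omits itemset (equality) joins and longer patterns, and its brute-force pairing stands in for the sorted id-list merge a real implementation would use. All function names are hypothetical. The strict `eid1 < eid2` test matches the default minGap of 1 for integer event IDs.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

IdList = List[Tuple[int, int]]  # (sequence ID, event ID) pairs

# The ten rows of the sample DATA step: (customerId, eventId, item).
rows = [
    (1, 10, "A"), (1, 10, "B"), (1, 20, "C"), (1, 30, "D"),
    (2, 15, "A"), (2, 25, "C"), (2, 30, "D"),
    (3, 10, "B"), (3, 20, "C"), (3, 40, "E"),
]


def item_idlists(rows: List[Tuple[int, int, str]]) -> Dict[str, IdList]:
    """Vertical layout: for each item, the (sid, eid) pairs where it occurs."""
    idlists: Dict[str, IdList] = defaultdict(list)
    for sid, eid, item in rows:
        idlists[item].append((sid, eid))
    return dict(idlists)


def support(idlist: IdList) -> int:
    """Support counts distinct sequence IDs, not rows or transactions."""
    return len({sid for sid, _ in idlist})


def temporal_join(first: IdList, second: IdList) -> IdList:
    """Id-list of the 2-sequence <first, second>: occurrences of `second` that come
    strictly after an occurrence of `first` in the same sequence."""
    return sorted({
        (sid2, eid2)
        for (sid1, eid1), (sid2, eid2) in product(first, second)
        if sid1 == sid2 and eid1 < eid2
    })


def frequent_two_sequences(rows, min_count: int) -> Dict[Tuple[str, str], int]:
    idlists = item_idlists(rows)
    # Only frequent items can appear in a frequent sequence (the Apriori property).
    frequent_items = sorted(i for i, ids in idlists.items() if support(ids) >= min_count)
    result: Dict[Tuple[str, str], int] = {}
    for x, y in product(frequent_items, repeat=2):
        count = support(temporal_join(idlists[x], idlists[y]))
        if count >= min_count:
            result[(x, y)] = count
    return result


print(frequent_two_sequences(rows, min_count=2))
# {('A', 'C'): 2, ('A', 'D'): 2, ('B', 'C'): 2, ('C', 'D'): 2}
```

Item E occurs for only one customer, so it is pruned before any join. B followed by D is dropped because only customer 1 has it.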

This example demonstrates a more advanced use of the cspade action. It specifies a minimum support of 50% of sequences, a maximum sequence length of 3 elements, and a time gap between consecutive elements from 5 to 20 time units.

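PROC CAS;
   loadactionset "sequence";
   /* support=0.5: a pattern must occur in at least half of the sequences */
   sequence.cspade /
      TABLE={name='transactions'},
      sequenceId='customerId',
      eventId='eventId',
      itemId='item',
      support=0.5,
      maxLen=3,
      minGap=5,
      maxGap=20,
      /* MyCasLib is an example name; replace it with an existing caslib */
      casout={name='detailed_sequences', replace=true, caslib='MyCasLib'};
RUN;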
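The Python sketch below shows how minGap and maxGap restrict the occurrences that count toward support, using the sample rows and the settings from this example. It assumes that the gap is the difference between the eventId values of consecutive matched elements, that both bounds are inclusive, and that `support=0.5` rounds up to a count of 2. These are assumptions, not documented CAS behavior. The helper names are hypothetical. The search backtracks because, with a maximum gap, the earliest match for one element is not always the one that lets the rest of the pattern match.

```python
import math
from typing import FrozenSet, List, Tuple

Element = Tuple[int, FrozenSet[str]]  # (eventId, itemset), sorted by eventId


def contains_with_gaps(sequence: List[Element], pattern: List[FrozenSet[str]],
                       min_gap: int, max_gap: int) -> bool:
    """True if the pattern matches elements whose consecutive eventId
    differences all lie in [min_gap, max_gap]."""
    def match(k: int, prev_index: int) -> bool:
        if k == len(pattern):
            return True
        prev_eid = sequence[prev_index][0] if prev_index >= 0 else None
        for i in range(prev_index + 1, len(sequence)):
            eid, items = sequence[i]
            if prev_eid is not None:
                gap = eid - prev_eid
                if gap < min_gap:
                    continue
                if gap > max_gap:
                    break  # elements are sorted by eventId, so later gaps are larger
            if pattern[k] <= items and match(k + 1, i):
                return True
        return False

    return match(0, -1)


def s(*items: str) -> FrozenSet[str]:
    return frozenset(items)


# Sample data grouped by customer, as (eventId, itemset) elements.
sequences = [
    [(10, s("A", "B")), (20, s("C")), (30, s("D"))],
    [(15, s("A")), (25, s("C")), (30, s("D"))],
    [(10, s("B")), (20, s("C")), (40, s("E"))],
]

min_count = math.ceil(0.5 * len(sequences))  # 2, assuming the proportion is rounded up
for pattern in ([s("A"), s("C"), s("D")], [s("B"), s("E")]):
    count = sum(contains_with_gaps(seq, pattern, min_gap=5, max_gap=20) for seq in sequences)
    print([sorted(e) for e in pattern], count, count >= min_count)
# [['A'], ['C'], ['D']] 2 True
# [['B'], ['E']] 0 False
```

Without maxGap, customer 3 would support B followed by E. The 30-unit gap between events 10 and 40 exceeds maxGap=20, so the pattern drops out.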

FAQ

What is the purpose of the cspade action?
Which parameters are required to run the cspade action?
How can I limit the length of the sequences found?
How do I specify the minimum support for a sequence to be considered frequent?
What does the maxGap parameter control?
How can I filter the input data before analysis?
What is the effect of setting the singlePass parameter to True?
How do I limit the number of items within a single element of a sequence?
What output does the cspade action produce?

Associated Scenarios

Use Case
E-commerce Customer Purchase Path Analysis

An online retailer wants to identify the most common navigation paths customers take before making a purchase. By understanding these frequent sequences (e.g., 'Landing Page' ->...

Use Case
High-Volume Sensor Error Burst Detection

A manufacturing plant collects high-frequency logs from industrial machines. The engineering team needs to detect specific patterns of error codes that occur in rapid succession...

Use Case
Credit Card Transaction Analysis with Filtering

A bank is investigating potential credit card fraud patterns. They want to analyze transaction sequences only for 'High Risk' flagged accounts. The goal is to find transaction t...