The cspade action performs sequence mining using the cSpade algorithm. It identifies frequent sequences in a dataset, where each sequence is an ordered list of itemsets. This is useful for analyzing transactional data over time, such as customer purchase histories or web navigation paths, to discover common patterns.
| Parameter | Description |
|---|---|
| casout | Specifies the output table to store the frequent sequences and their support. |
| eventId | Specifies the event or time column of the input table, indicating when an item was part of a transaction. |
| itemId | Specifies the item column of the input table. |
| maxGap | Specifies the maximum time difference between consecutive elements in a sequence. |
| maxLen | Specifies the maximum number of elements in a sequence. Default is 10. |
| maxSize | Specifies the maximum number of items within a single element (transaction) of a sequence. Default is 10. |
| minGap | Specifies the minimum time difference between consecutive elements in a sequence. Default is 1. |
| sequenceId | Specifies the column that identifies the sequence or customer. |
| support | Specifies the minimum support level for a sequence to be considered frequent, as a proportion (0 to 1). |
| supportCnt | Specifies the minimum number of transactions (count) a sequence must appear in to be considered frequent. |
| table | Specifies the input table containing the sequence data for analysis. |
This example creates a sample dataset of customer transactions over time. Each row represents an item purchased by a customer in a specific event (transaction).
```sas
/* Each row is one item bought by one customer at one event time. */
DATA mycas.transactions;
   /* eventId is read as numeric (no $) so that time gaps can be computed */
   INPUT customerId eventId item $;
   CARDS;
1 10 A
1 10 B
1 20 C
1 30 D
2 15 A
2 25 C
2 30 D
3 10 B
3 20 C
3 40 E
;
RUN;
```
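As a quick check of how the rows group into sequence elements, the following sketch counts the items recorded for each customer and event. It is only an illustration written with PROC SQL and assumes the `mycas` libref and the `transactions` table created by the data step above.

```sas
PROC SQL;
   /* One output row per customer and event, with the number of items in that element */
   SELECT customerId,
          eventId,
          COUNT(*) AS items_in_element
   FROM mycas.transactions
   GROUP BY customerId, eventId
   ORDER BY customerId, eventId;
QUIT;
```

On the sample rows, only customer 1 at event 10 has more than one item (A and B). The number of items in an element is what the maxSize parameter limits.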
This example performs a basic sequence mining analysis to find frequent sequences with a minimum support count of 2.
```sas
PROC CAS;
   /* Keep sequences whose support count is at least 2 */
   sequence.cspade /
      TABLE={name='transactions'},
      sequenceId='customerId',
      eventId='eventId',
      itemId='item',
      supportCnt=2,
      casout={name='frequent_sequences', replace=true};
RUN;
```
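To make the support count concrete, the following sketch counts by hand how many customers bought item A and bought item C at a later event. It is an illustrative cross-check written with PROC SQL, not a description of how cspade computes support. It assumes that support is counted once per `customerId`, and that the `mycas` libref and `transactions` table from the sample data are available.

```sas
PROC SQL;
   /* Count customers that have an A event followed by a later C event */
   SELECT COUNT(DISTINCT a.customerId) AS support_count
   FROM mycas.transactions AS a,
        mycas.transactions AS b
   WHERE a.customerId = b.customerId
     AND a.item = 'A'
     AND b.item = 'C'
     AND b.eventId > a.eventId;
QUIT;
```

On the sample rows this query returns 2 (customers 1 and 2), so under the stated assumption the sequence A followed by C would meet a minimum support count of 2.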
This example demonstrates a more advanced use of the cspade action. It specifies a minimum support of 50% of sequences, a maximum sequence length of 3 elements, and a time gap between consecutive elements from 5 to 20 time units.
```sas
PROC CAS;
   /* Require 50% support, at most 3 elements, and gaps of 5 to 20 time units */
   sequence.cspade /
      TABLE={name='transactions'},
      sequenceId='customerId',
      eventId='eventId',
      itemId='item',
      support=0.5,
      maxLen=3,
      minGap=5,
      maxGap=20,
      casout={name='detailed_sequences', replace=true, caslib='MyCasLib'};
RUN;
```
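The effect of the gap limits can be checked by hand on the sample data. The following sketch counts the customers who bought item A and then item D between 5 and 20 time units later. It is a PROC SQL illustration only. It assumes that `eventId` is stored as a numeric column, that both gap bounds are inclusive, and that support is counted once per `customerId`. None of these assumptions is confirmed here for the cspade action.

```sas
PROC SQL;
   /* Customers with an A event followed by a D event 5 to 20 time units later */
   SELECT COUNT(DISTINCT a.customerId) AS support_count
   FROM mycas.transactions AS a,
        mycas.transactions AS b
   WHERE a.customerId = b.customerId
     AND a.item = 'A'
     AND b.item = 'D'
     AND (b.eventId - a.eventId) BETWEEN 5 AND 20;
QUIT;
```

Under these assumptions the query returns 2 on the sample rows: customer 1 has a gap of 20 and customer 2 has a gap of 15. That is 2 of 3 customers, which is above the 0.5 support threshold. By the same reasoning, B followed by E for customer 3 has a gap of 30 and would fall outside the maxGap limit.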
An online retailer wants to identify the most common navigation paths customers take before making a purchase. By understanding these frequent sequences (e.g., 'Landing Page' ->...
A manufacturing plant collects high-frequency logs from industrial machines. The engineering team needs to detect specific patterns of error codes that occur in rapid succession...
A bank is investigating potential credit card fraud patterns. They want to analyze transaction sequences only for 'High Risk' flagged accounts. The goal is to find transaction t...