ruleMining

mbanalysis

Description

The mbanalysis action performs market basket analysis, a common technique in data mining used to discover co-occurrence relationships among items. It identifies association rules from a transactional dataset, which is useful for understanding customer purchasing patterns, such as which products are frequently bought together. This information can be leveraged for store layout design, cross-marketing promotions, and catalog design.

ruleMining.mbanalysis <result=results> <status=rc> / antecedentList={"string-1" <, "string-2", ...>}, conf=double, consequentList={"string-1" <, "string-2", ...>}, hierarchy={{castable-1} <, {castable-2}, ...>}, idVariable="variable-name", items=integer, lift=double, maxItems=integer, minItems=integer, nLHS_range={nLHSRHSOpts}, norm=TRUE | FALSE, nRHS_range={nLHSRHSOpts}, out={casouttable}, outfreq={casouttable}, outrule={casouttable}, saveState={casouttable}, separator="string", sup_lift=double, supmin=double, suppct=double, table={castable}, tgtVariable="variable-name";
Settings
Parameter Description
antecedentList Specifies the regular expression strings to match in the antecedent (left-hand side) of a rule.
conf Specifies the minimum confidence for the rules, as a percentage. Default: 50.
consequentList Specifies the regular expression strings to match in the consequent (right-hand side) of a rule.
hierarchy Specifies one or more hierarchy tables. If you omit this parameter, the action performs simple association analysis without a hierarchy. You can specify up to five tables, each specifying one level of the hierarchy.
idVariable Specifies the variable used to group the target variable into baskets (transactions).
items Specifies the number of items in a rule. Default is 2 when 'out' or 'outrule' is specified, otherwise 1.
lift Specifies the minimum lift value necessary to generate a rule. Default: 1.
maxItems Specifies a maximum basket size; baskets larger than this value are rejected. Default: 1000.
minItems Specifies a minimum basket size; baskets smaller than this value are rejected. Default: 1.
nLHS_range Specifies the range for the number of items in the left-hand side (LHS) of a rule, including 'lower' and 'upper' bounds.
norm When set to TRUE, normalizes the values of the target variable and the items in the output tables. Default: FALSE.
nRHS_range Specifies the range for the number of items in the right-hand side (RHS) of a rule, including 'lower' and 'upper' bounds.
out Specifies the output table to contain frequent item sets used to generate rules, including transaction counts and support.
outfreq Specifies the output table to contain the unique frequent items along with their transaction counts and support.
outrule Specifies the output table to contain the generated rules, including LHS, RHS, support, confidence, and lift.
saveState Specifies the table in which to save the mining model for future scoring.
separator Specifies the separator character used in the rule's antecedent and consequent strings. Default: '&'.
sup_lift Specifies the minimum support lift necessary to generate a rule. Default: 0.
supmin Specifies the minimum absolute support count for a rule. This overrides the 'suppct' parameter.
suppct Specifies the minimum support for a rule as a percentage of the total number of baskets.
table Specifies the input data table containing transaction data.
tgtVariable Specifies the nominal variable to be used as the target item in the transactions.
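The thresholds 'suppct', 'conf', and 'lift' refer to the usual association-rule measures. As a rough illustration, the Python sketch below computes the textbook versions of those measures for one rule on three small baskets. The function names and the basket data are illustrative assumptions, and the action's internal computation may differ in detail.

```python
# Three small baskets (illustrative data, not produced by the action).
baskets = [
    {"apple", "banana", "milk"},
    {"bread", "butter"},
    {"apple", "bread", "cheese"},
]

def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b)

def rule_metrics(lhs, rhs, baskets):
    """Textbook support (%), confidence (%), and lift for the rule lhs -> rhs."""
    n = len(baskets)
    both = support_count(lhs | rhs, baskets)
    lhs_count = support_count(lhs, baskets)
    rhs_count = support_count(rhs, baskets)
    support_pct = 100.0 * both / n
    confidence_pct = 100.0 * both / lhs_count if lhs_count else 0.0
    # Confidence expected if lhs and rhs were independent.
    expected_conf_pct = 100.0 * rhs_count / n
    lift = confidence_pct / expected_conf_pct if expected_conf_pct else 0.0
    return support_pct, confidence_pct, lift

support_pct, confidence_pct, lift = rule_metrics({"apple"}, {"bread"}, baskets)
```

Under these definitions the rule apple -> bread has a confidence of 50% and a lift of about 0.75. It would pass a minimum confidence of 50 but not a minimum lift of 1.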
Data Preparation
Create Sample Transaction Data

This example code creates a simple CAS table named 'sample_transactions' containing transaction data. Each row represents an item within a transaction, identified by 'transaction_id'. This format is typical for market basket analysis.

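/* One row per item; transaction_id groups the rows into baskets. */
/* Assumes an active CAS session and a 'casuser' libref bound to it. */
DATA casuser.sample_transactions;
   INFILE DATALINES;
   INPUT transaction_id item $;
   DATALINES;
1 apple
1 banana
1 milk
2 bread
2 butter
3 apple
3 bread
3 cheese
;
RUN;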
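To make the role of 'idVariable' and 'tgtVariable' concrete, the Python sketch below groups the same long-format rows into baskets and applies size limits in the spirit of 'minItems' and 'maxItems'. It is only an illustration: the function name `build_baskets`, the inclusive bounds, and the collapsing of duplicate items within a transaction are assumptions, not documented behavior of the action.

```python
from collections import defaultdict

# (transaction_id, item) rows, mirroring the sample data above.
rows = [
    (1, "apple"), (1, "banana"), (1, "milk"),
    (2, "bread"), (2, "butter"),
    (3, "apple"), (3, "bread"), (3, "cheese"),
]

def build_baskets(rows, min_items=1, max_items=1000):
    """Group rows by transaction ID and drop baskets outside the size limits."""
    grouped = defaultdict(set)
    for transaction_id, item in rows:
        grouped[transaction_id].add(item)  # a set collapses duplicate items
    return {
        tid: items
        for tid, items in grouped.items()
        if min_items <= len(items) <= max_items
    }

baskets = build_baskets(rows)                  # all three baskets are kept
large_only = build_baskets(rows, min_items=3)  # basket 2 has 2 items and is dropped
```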

Examples

This example performs a basic market basket analysis on the 'sample_transactions' table. It identifies rules with a minimum support of 1% and a minimum confidence of 50%. The resulting rules are saved to the 'basic_rules' CAS table.

SAS® / CAS Code (awaiting community validation)
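PROC CAS;
   /* Mine rules from the long-format transaction table. */
   ruleMining.mbanalysis /
      TABLE={name='sample_transactions'},
      idVariable='transaction_id',   /* groups rows into baskets */
      tgtVariable='item',            /* item within each basket */
      suppct=1,                      /* minimum support: 1% of baskets */
      conf=50,                       /* minimum confidence: 50% */
      outrule={name='basic_rules', replace=true};
RUN;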
Result:
An output table named 'basic_rules' is created in the active caslib ('casuser' in this example). This table contains the association rules that meet the specified support and confidence thresholds, showing which items are often purchased together.

This example demonstrates a more targeted analysis. It searches for rules containing exactly 3 items, keeps only rules whose antecedent (LHS) matches 'apple', and requires a minimum lift of 1.2. Because 'conf' is not set, the default minimum confidence of 50 still applies. This helps identify strong, specific associations involving a particular product.

SAS® / CAS Code (awaiting community validation)
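PROC CAS;
   /* Targeted analysis: 3-item rules with 'apple' on the left-hand side. */
   ruleMining.mbanalysis /
      TABLE={name='sample_transactions'},
      idVariable='transaction_id',
      tgtVariable='item',
      items=3,                       /* number of items in a rule */
      lift=1.2,                      /* minimum lift */
      antecedentList={'apple'},      /* regular expression matched against the LHS */
      outrule={name='detailed_apple_rules', replace=true};
RUN;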
Result:
The action produces a CAS table named 'detailed_apple_rules'. This table will contain only the 3-item rules where 'apple' is part of the antecedent and the rule's lift is at least 1.2, indicating a stronger-than-random co-occurrence with the consequent items.
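The filtering described above can be pictured with a small Python sketch that enumerates every 3-item rule on three sample baskets and keeps those whose antecedent matches a pattern and whose lift reaches the threshold. This is a brute-force illustration under stated assumptions, not the algorithm the action uses: the function `three_item_rules`, the way the antecedent string is assembled with the separator, and the textbook lift formula are all hypothetical stand-ins.

```python
import re
from itertools import combinations

# Three small baskets (illustrative data).
baskets = [
    {"apple", "banana", "milk"},
    {"bread", "butter"},
    {"apple", "bread", "cheese"},
]

def count(itemset):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if set(itemset) <= b)

def three_item_rules(min_lift=1.2, lhs_pattern="apple", separator="&"):
    """Enumerate rules with 3 items in total, then apply both filters."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for itemset in combinations(items, 3):
        both = count(itemset)
        if both == 0:
            continue  # the itemset never occurs, so no rule can be formed
        for k in (1, 2):  # antecedent size; the remaining items form the consequent
            for lhs in combinations(itemset, k):
                rhs = tuple(i for i in itemset if i not in lhs)
                # Textbook lift: confidence divided by the consequent's support.
                lift = (both / count(lhs)) / (count(rhs) / n)
                lhs_text = separator.join(lhs)
                if lift >= min_lift and re.search(lhs_pattern, lhs_text):
                    rules.append((lhs_text, separator.join(rhs), lift))
    return rules

rules = three_item_rules()
```

On these three baskets the sketch keeps six rules, for example apple -> banana&milk with a lift of about 1.5. Because the pattern is a regular expression, a plain `apple` would also match an item named `pineapple`; anchoring the pattern avoids that.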

FAQ

What does the mbanalysis action do?
What are the required parameters for the mbanalysis action?
What is the purpose of the 'conf' parameter?
How can I filter rules based on lift?
Is it possible to perform analysis with a product hierarchy?
How can I save the results of the market basket analysis?