textRuleDevelop

compileCategory

Description

The compileCategory action builds a categories model from a set of category rules. It takes a CAS table that contains the rules as input and produces an output CAS table that holds the compiled categories model (MCO). The compiled model can then be used to classify documents into predefined categories.

textRuleDevelop.compileCategory /
   casOut={<casouttable>}
   config="string"
   table={<castable>}
   [, concept={<castable>}]
   [, language="string"]
   [, ruleId="string"]
   [, tokenizer="BASIC" | "STANDARD"];

Settings
Parameter: Description
casOut: Specifies the output CAS table that contains the categories model (MCO).
concept: Specifies an input CAS table that contains the LI binary. The LI binary is optional and can be used to compile the categories model (MCO).
config: Specifies the name of the variable in the input table that contains the configuration (the category rules).
language: Specifies the language that is used in the setting linguistic binaries. The default value is 'en'.
ruleId: Specifies the name of the variable in the input table that contains the rule IDs.
table: Specifies the input CAS table that contains the configuration.
tokenizer: Specifies which tokenizer to use in the category model. 'STANDARD' (default) uses a language-specific tokenizer. 'BASIC' uses a tokenizer that separates words by white space and punctuation, and is available for Chinese, Japanese, and Korean to enhance rule matching.
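
The parameter list above can be checked against what the server itself reports. The following is a minimal sketch, assuming an active CAS session and that the textRuleDevelop action set is licensed and available on the server; it loads the action set and asks the built-in help action to list the parameters of compileCategory.

```sas
proc cas;
   /* Load the action set into the current CAS session (assumes a session is already active) */
   loadactionset "textRuleDevelop";

   /* Print the parameters of compileCategory as reported by the server */
   builtins.help / actionSet="textRuleDevelop", action="compileCategory";
run;
```

If the help output differs from the table above, the server output is the better guide for the installed release.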
Data Preparation
Creating the Input Rule Table

This example creates a CAS table named 'category_rules_table'. This table contains the category rules that will be compiled by the compileCategory action. The rules are defined in the 'config' variable.

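/* Assumes mycas is a libref that uses the CAS engine */
DATA mycas.category_rules_table;
   LENGTH config $32767;
   INFILE DATALINES TRUNCOVER;
   /* Read each line whole so that commas inside a rule are not treated as delimiters */
   INPUT;
   config = STRIP(_INFILE_);
   DATALINES;
CATEGORY:myCategory1,C_CONCEPT:myConcept1
CATEGORY:myCategory2,(OR,C_CONCEPT:conceptA,C_CONCEPT:conceptB)
;
RUN;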
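
Before compiling, it can help to confirm that each rule arrived in the 'config' variable as one complete string. The sketch below assumes the table was loaded into the active caslib of the current session; it uses the standard table.fetch and table.tableInfo actions to display the rows and the table's row count.

```sas
proc cas;
   /* Show the first rows so each rule can be checked for truncation */
   table.fetch / table={name='category_rules_table'}, to=10;

   /* Report the number of rows and columns in the rule table */
   table.tableInfo / name='category_rules_table';
run;
```

With the two rules defined above, the table is expected to hold two rows, each ending in its full rule text.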

Examples

This example demonstrates a basic compilation of category rules from the 'category_rules_table' into a model named 'category_model_table'.

SAS® / CAS Code
PROC CAS;
   textRuleDevelop.compileCategory /
      table={name='category_rules_table'},
      config='config',
      casOut={name='category_model_table', replace=true};
RUN;
Result:
The action generates a CAS table named 'category_model_table' in the active caslib. This table contains the compiled binary model (MCO) for the specified category rules.
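
The compiled model exists only in memory until it is saved. As a hedged sketch, the following confirms that the output table was created and then writes it to the caslib's data source with the standard table.save action. The file name 'category_model_table.sashdat' is a hypothetical choice, and the step assumes the active caslib is writable.

```sas
proc cas;
   /* Confirm that the compiled model table exists in the active caslib */
   table.tableInfo / name='category_model_table';

   /* Persist the in-memory model; the target file name is an assumption */
   table.save /
      table={name='category_model_table'},
      name='category_model_table.sashdat',
      replace=true;
run;
```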

This example shows how to compile category rules that depend on a pre-compiled concept model ('concept_model_table'). It also specifies 'Japanese' as the language and uses the 'BASIC' tokenizer for better rule matching with CJK characters.

SAS® / CAS Code
PROC CAS;
   textRuleDevelop.compileCategory /
      table={name='category_rules_jp'},
      config='config',
      concept={name='concept_model_table'},
      language='Japanese',
      tokenizer='BASIC',
      casOut={name='category_model_jp', replace=true, caslib='MyCaslib'};
RUN;
Result:
An output table named 'category_model_jp' is created in the 'MyCaslib' caslib. This table contains the compiled category model, which leverages the provided concept model and is optimized for Japanese text using the basic tokenizer.
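
Because this example depends on 'concept_model_table' already being loaded, a check like the one below could be run first. It is a sketch only: it assumes the concept model lives in the active caslib and uses the standard table.tableExists action, whose result contains an 'exists' value that is 0 when the table is not found.

```sas
proc cas;
   /* Ask the server whether the pre-compiled concept model is in memory */
   table.tableExists result=r / name='concept_model_table';

   /* exists is 0 when the table is absent from the caslib */
   if (r.exists == 0) then
      print "concept_model_table was not found; compile or load the concept model first.";
   else
      print "concept_model_table is available.";
run;
```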

FAQ

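What is the primary function of the compileCategory action?
It builds a categories model from a set of category rules and writes the compiled model (MCO) to an output CAS table.

What are the required parameters for the compileCategory action?
casOut, config, and table. The concept, language, ruleId, and tokenizer parameters are optional.

What does the 'casOut' parameter represent?
The output CAS table that contains the compiled categories model (MCO).

Can I use an existing concept model (LI binary) when compiling a category model?
Yes. The optional concept parameter takes an input CAS table that contains the LI binary.

What options are available for the 'tokenizer' parameter and what is the default?
'STANDARD' (the default) uses a language-specific tokenizer. 'BASIC' separates words by white space and punctuation and is available for Chinese, Japanese, and Korean.

What is the purpose of the 'language' parameter?
It specifies the language used for the linguistic binaries. The default value is 'en'.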