The compileCategory action compiles a set of category rules into a categories model. It takes an input CAS table that contains the rules and writes an output CAS table that holds the compiled categories model (the MCO binary). The compiled model can then be used in text analytics to classify documents into predefined categories.
| Parameter | Description |
|---|---|
| casOut | Specifies the output CAS table that contains the categories model (MCO). |
| concept | Specifies an input CAS table that contains the LI binary. The LI binary is optional and can be used to compile the categories model (MCO). |
| config | Specifies the name of the variable (column) in the input table that contains the configuration, that is, the category rules. |
| language | Specifies the language that is used to set the linguistic binaries. The default value is 'en'. |
| ruleId | Specifies the CAS table variable name that contains the rule IDs. |
| table | Specifies the input CAS table that contains the configuration. |
| tokenizer | Specifies which tokenizer to use in the category model. 'STANDARD' (the default) uses a language-specific tokenizer. 'BASIC' separates words at whitespace and punctuation; it is available for Chinese, Japanese, and Korean to enhance rule matching. |
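
The examples on this page presuppose an active CAS session, a libref named mycas that points at a caslib, and a loaded textRuleDevelop action set. A minimal setup sketch might look like the following. The session name mysess and the choice of the casuser caslib are assumptions made for illustration; substitute whatever your environment uses.

```sas
/* Start a CAS session (the name mysess is hypothetical). */
CAS mysess;

/* Bind the libref mycas to a caslib so that DATA steps can write CAS tables.
   casuser is assumed here; replace it with the caslib you actually use. */
LIBNAME mycas CAS CASLIB=casuser;

/* Load the action set that provides compileCategory. */
PROC CAS;
   builtins.loadActionSet / actionSet='textRuleDevelop';
RUN;
```
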
This example creates a CAS table named 'category_rules_table'. This table contains the category rules that will be compiled by the compileCategory action. The rules are defined in the 'config' variable.
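```sas
/* Assumes the libref mycas is already assigned to a caslib with the CAS engine. */
DATA mycas.category_rules_table;
   LENGTH config $32767;
   /* No DSD option here: DSD makes the comma the delimiter, which would cut
      each rule off at its first comma. TRUNCOVER reads short lines whole. */
   INFILE DATALINES TRUNCOVER;
   INPUT config $32767.;
   DATALINES;
CATEGORY:myCategory1,C_CONCEPT:myConcept1
CATEGORY:myCategory2,(OR,C_CONCEPT:conceptA,C_CONCEPT:conceptB)
;
RUN;
```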
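
Because each rule contains commas and parentheses, it can be worth confirming that the rules arrived in the CAS table intact before compiling them. One way to do that, sketched here on the assumption that the table lives in the session's active caslib, is to print the first few rows with the table.fetch action:

```sas
PROC CAS;
   /* Print up to 10 rows so that each rule can be inspected in full. */
   table.fetch /
      table={name='category_rules_table'},
      to=10;
RUN;
```

If a row shows only the text up to the first comma, the rule was split on input and the DATA step that loads the rules needs adjusting.
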
This example demonstrates a basic compilation of category rules from the 'category_rules_table' into a model named 'category_model_table'.
```sas
PROC CAS;
   textRuleDevelop.compileCategory /
      TABLE={name='category_rules_table'},
      config='config',
      casOut={name='category_model_table', replace=true};
RUN;
```
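
When the rules table also carries an identifier for each rule, the ruleId parameter from the parameter table can name that column. The sketch below assumes a hypothetical rules table, category_rules_with_ids, with a hypothetical column rule_id next to config. Neither name appears elsewhere on this page, and the output table name is likewise illustrative.

```sas
PROC CAS;
   textRuleDevelop.compileCategory /
      /* Hypothetical input table that holds one rule per row. */
      table={name='category_rules_with_ids'},
      /* Column that contains the rule text. */
      config='config',
      /* Hypothetical column that contains the rule IDs. */
      ruleId='rule_id',
      casOut={name='category_model_with_ids', replace=true};
RUN;
```
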
This example shows how to compile category rules that depend on a pre-compiled concept model ('concept_model_table'). It also specifies 'Japanese' as the language and uses the 'BASIC' tokenizer for better rule matching with CJK characters.
```sas
PROC CAS;
   textRuleDevelop.compileCategory /
      TABLE={name='category_rules_jp'},
      config='config',
      concept={name='concept_model_table'},
      language='Japanese',
      tokenizer='BASIC',
      casOut={name='category_model_jp', replace=true, caslib='MyCaslib'};
RUN;
```
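
After a compilation such as this one, a quick check that the model table was written to the expected caslib can catch a misspelled caslib or table name early. The following sketch uses the table.tableInfo action and assumes that the caslib MyCaslib from the example exists in the current session:

```sas
PROC CAS;
   /* Report metadata (row count, column count, and so on) for the compiled model table. */
   table.tableInfo /
      caslib='MyCaslib',
      name='category_model_jp';
RUN;
```
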