compileCategory - WeAreCAS

Q: What are the required parameters for the compileCategory action?

The required parameters are 'casOut' to specify the output table for the model, 'config' to specify the variable name in the input table that contains the configuration, and 'table' to specify the input CAS table itself.

Q: What does the 'casOut' parameter represent?

The 'casOut' parameter specifies the output CAS table that will contain the compiled categories model (MCO).

Q: Can I use an existing concept model (LI binary) when compiling a category model?

Yes, the optional 'concept' parameter allows you to specify an input CAS table that contains an LI binary, which can be used to compile the categories model.
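The sketch below shows that two-step flow. It assumes that compileConcept accepts the same table, config, and casOut parameters as compileCategory. That is an assumption, so check the compileConcept reference before relying on it. The table names 'concept_rules_table' and 'concept_model_table' are hypothetical.

```sas
PROC CAS;
   /* Step 1 (parameters assumed): compile concept rules into an LI binary */
   textRuleDevelop.compileConcept /
      table={name='concept_rules_table'},   /* hypothetical concept rules table */
      config='config',
      casOut={name='concept_model_table', replace=true};

   /* Step 2: pass the LI binary to compileCategory through the concept parameter */
   textRuleDevelop.compileCategory /
      table={name='category_rules_table'},
      config='config',
      concept={name='concept_model_table'},
      casOut={name='category_model_table', replace=true};
RUN;
QUIT;
```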

Q: What options are available for the 'tokenizer' parameter and what is the default?

The 'tokenizer' parameter specifies which tokenizer to use. The default is 'STANDARD', which applies a language-specific tokenizer. The alternative, 'BASIC', separates words by whitespace and punctuation. It is available for Chinese, Japanese, and Korean to enhance rule matching.

Q: What is the purpose of the 'language' parameter?

The 'language' parameter specifies the language to be used for the linguistic binaries. The default value is 'en' for English.

Description

The compileCategory action builds a categories model from a set of category rules. It takes a CAS table containing the rules as input and produces an output CAS table that holds the compiled categories model (MCO). In text analytics, this is the step that turns hand-written category rules into a model that can classify documents into predefined categories.

textRuleDevelop.compileCategory / casOut={<casouttable>} config="string" table={<castable>} [, concept={<castable>}] [, language="string"] [, ruleId="string"] [, tokenizer="BASIC" | "STANDARD"];

Settings

Parameter	Description
casOut	Specifies the output CAS table that contains the categories model (MCO).
concept	Specifies an input CAS table that contains the LI binary. The LI binary is optional and can be used to compile the categories model (MCO).
config	Specifies the name of the variable in the input table that contains the configuration (the category rules).
language	Specifies the language of the linguistic binaries used to compile the model. The default value is 'en'.
ruleId	Specifies the name of the variable in the input table that contains the rule IDs.
table	Specifies the input CAS table that contains the configuration.
tokenizer	Specifies which tokenizer to use in the category model. 'STANDARD' (default) uses a language-specific tokenizer. 'BASIC' uses a tokenizer that separates words by whitespace and punctuation. It is available for Chinese, Japanese, and Korean to enhance rule matching.
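None of the examples on this page sets ruleId. The sketch below shows one way it might be used, with each row carrying its own identifier. Several parts are assumptions or hypothetical: the mycas libref comes from the data-preparation step, the table names 'category_rules_ids' and 'category_model_ids' and the ID values R1/R2 are invented for illustration, and the rule text is copied from the data-preparation example. The page does not state whether rule IDs must be character or numeric. This sketch assumes character.

```sas
/* Hypothetical rules table: one rule per row, each with its own ID (character assumed) */
DATA mycas.category_rules_ids;
   LENGTH ruleId $32 config $1000;
   /* '|' separates the ID from the rule, so commas inside a rule are kept */
   INFILE DATALINES DLM='|' TRUNCOVER;
   INPUT ruleId $ config $;
   DATALINES;
R1|CATEGORY:myCategory1,C_CONCEPT:myConcept1
R2|CATEGORY:myCategory2,(OR,C_CONCEPT:conceptA,C_CONCEPT:conceptB)
;
RUN;

PROC CAS;
   textRuleDevelop.compileCategory /
      table={name='category_rules_ids'},
      config='config',
      ruleId='ruleId',      /* variable that holds the rule IDs */
      casOut={name='category_model_ids', replace=true};
RUN;
QUIT;
```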

Data Preparation

Creating the Input Rule Table

This example creates a CAS table named 'category_rules_table' that holds the category rules to be compiled by the compileCategory action. Each row stores one rule in the 'config' variable.

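/* Assumes an active CAS session and a CAS libref named mycas, for example:
      CAS mySession;
      LIBNAME mycas CAS;                                                   */
DATA mycas.category_rules_table;
   LENGTH config $32767;
   /* Use a delimiter that never appears in the rules. With DSD, the comma
      would act as a delimiter and each rule would be cut at its first comma. */
   INFILE DATALINES DLM='|' TRUNCOVER;
   INPUT config $;
   DATALINES;
CATEGORY:myCategory1,C_CONCEPT:myConcept1
CATEGORY:myCategory2,(OR,C_CONCEPT:conceptA,C_CONCEPT:conceptB)
;
RUN;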
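Before compiling, it helps to print the table and confirm that each rule was stored in full. The following minimal check assumes that the mycas libref points at the session's active caslib, so the table can be referenced by name alone.

```sas
PROC CAS;
   /* Show the first rows of the rules table; each config value should hold a complete rule */
   table.fetch / table={name='category_rules_table'}, to=10;
RUN;
QUIT;
```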

Examples

Basic Category Compilation

This example compiles the category rules in 'category_rules_table' into a categories model stored in 'category_model_table'.

SAS® / CAS code (awaiting community validation)

PROC CAS;
   textRuleDevelop.compileCategory /
      table={name='category_rules_table'},
      config='config',
      casOut={name='category_model_table', replace=true};
RUN;
QUIT;

Result:
The action creates a CAS table named 'category_model_table' in the active caslib. This table contains the compiled binary model (MCO) for the specified category rules.
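Classifying documents with the compiled model requires a separate scoring step. The following sketch assumes the textRuleScore.applyCategory action with model, table, docId, text, and casOut parameters. Verify these names against the textRuleScore reference. The names 'customer_reviews', 'review_id', 'review_text', and 'review_categories' are hypothetical.

```sas
PROC CAS;
   /* Score documents against the compiled categories model (parameter names assumed) */
   textRuleScore.applyCategory /
      model={name='category_model_table'},   /* MCO produced by compileCategory */
      table={name='customer_reviews'},       /* hypothetical documents table */
      docId='review_id',                     /* hypothetical document ID column */
      text='review_text',                    /* hypothetical text column */
      casOut={name='review_categories', replace=true};
RUN;
QUIT;
```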

Category Compilation with a Concept Model and Specific Language

This example compiles category rules that depend on a pre-compiled concept model ('concept_model_table'). It also sets the language to Japanese ('ja') and uses the 'BASIC' tokenizer, which improves rule matching for CJK text.

SAS® / CAS code (awaiting community validation)

PROC CAS;
   textRuleDevelop.compileCategory /
      table={name='category_rules_jp'},
      config='config',
      concept={name='concept_model_table'},
      language='ja',       /* two-letter code, consistent with the default 'en' */
      tokenizer='BASIC',
      casOut={name='category_model_jp', replace=true, caslib='MyCaslib'};
RUN;
QUIT;

Result:
An output table named 'category_model_jp' is created in the 'MyCaslib' caslib. It contains the compiled categories model, built with the supplied concept model and configured for Japanese text with the BASIC tokenizer.
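To confirm that the model table was created where expected, you can run the standard table.tableInfo and table.columnInfo actions against it. The following minimal sketch assumes that 'MyCaslib' is an existing caslib you can read.

```sas
PROC CAS;
   /* Table-level metadata such as row count and creation time */
   table.tableInfo / caslib='MyCaslib', name='category_model_jp';
   /* Column names and types of the compiled model table */
   table.columnInfo / table={name='category_model_jp', caslib='MyCaslib'};
RUN;
QUIT;
```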


Associated Scenarios

Use Case

Standard Customer Feedback Categorization

A retail company receives thousands of customer reviews daily. They want to automatically classify these reviews into broad topics such as 'Shipping', 'Product Quality', and 'Bi...


Use Case

Multi-Language Support Ticket Classification (Japanese)

A global software vendor needs to classify support tickets originating from Japan. The text contains CJK characters which require specific tokenization strategies. This scenario...


Use Case

Advanced Pharma Adverse Event Coding with Concepts

A pharmaceutical company analyzes clinical trial notes. They use a two-step process: first, extract specific drug entities (Concepts), then categorize the notes based on the pre...


Related Actions

textRuleDevelop

compileConcept

Builds a concept model from linguistic rules defined in a CAS table. This act...

textRuleDevelop

exportTextModel

The exportTextModel action builds an analytic store (astore) model from a cat...
