textRuleDevelop

compileConcept

Description

Compiles concept rules stored in a CAS table into a binary concept model. The rules are written in LITI (Language Interpretation for Textual Information) syntax, and the resulting binary can be used by other text analytics actions for tasks such as concept extraction, categorization, and sentiment analysis. Predefined entities and the predefined sentiment model can be included to enrich the custom model.

textRuleDevelop.compileConcept result=<results> status=<rc> /
    casOut={caslib="string", name="table-name", replace=TRUE|FALSE, ...},
    config="string",
    enablePredefined=TRUE|FALSE,
    language="string",
    predefinedSentiment=TRUE|FALSE,
    ruleId="string",
    table={caslib="string", name="table-name", ...},
    tokenizer="BASIC"|"STANDARD";

Settings
| Parameter | Description |
| --- | --- |
| casOut | Specifies the output CAS table to store the compiled concept model binary. This table is used as input by other actions like `tpParse` and `tmMine`. |
| config | Specifies the name of the variable in the input table that contains the concept rule definitions (LITI rules). |
| enablePredefined | When set to TRUE, includes predefined entities (like nlpPerson, nlpLocation) from the specified language's linguistic binaries in the compiled model. |
| language | Specifies the language of the linguistic binaries to use for compiling the rules. Default is 'ENGLISH'. |
| predefinedSentiment | When set to TRUE, the action extends the predefined sentiment model for the specified language with the custom rules. |
| ruleId | Specifies the name of the variable in the input table that contains the unique identifier for each rule. |
| table | Specifies the input CAS table that contains the concept rule definitions to be compiled. |
| tokenizer | Specifies the tokenizer to use. 'STANDARD' uses a language-specific tokenizer. 'BASIC' uses a simple tokenizer based on whitespace and punctuation, which is useful for Chinese, Japanese, and Korean. |

Data Preparation
Creating the Input Concept Rules Table

To use the `compileConcept` action, you first need a CAS table containing your concept rules. This table must have at least two columns: one for the rule ID and one for the rule definition (LITI syntax). The following code creates a simple example of such a table.

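/* Assumes the libref mycas is already assigned to a caslib through the CAS LIBNAME engine. */
DATA mycas.concept_rules;
   LENGTH ruleid $ 50 config $ 32767;
   INFILE DATALINES DELIMITER='|';
   INPUT ruleid $ config $;
   /* Each rule is written as RULETYPE:ConceptName:definition; ENABLE rows switch a concept on. */
   DATALINES;
enable_company|ENABLE:Company
company_sas|CLASSIFIER:Company:SAS
enable_product|ENABLE:Product
product_viya|CLASSIFIER:Product:Viya
;
RUN;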
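
A malformed rule can make compilation fail, so it may help to check the rules table before compiling it. The following is a minimal sketch, assuming that the `textRuleDevelop.validateConcept` action is available in your deployment and that it accepts the same `table`, `config`, `ruleId`, and `language` settings as `compileConcept`. The output table name `rule_validation` is hypothetical. Confirm the exact parameter names against the action's reference page before relying on this.

```sas
PROC CAS;
   /* Assumption: validateConcept mirrors the table/config/ruleId settings of compileConcept. */
   textRuleDevelop.validateConcept /
      table={caslib="mycas", name="concept_rules"},
      ruleId="ruleid",
      config="config",
      language="ENGLISH",
      casOut={caslib="mycas", name="rule_validation", replace=true};

   /* Display whatever the validation step wrote to its output table. */
   table.fetch / table={caslib="mycas", name="rule_validation"};
RUN;
QUIT;
```

If the validation output reports problems, correct the corresponding rows in the rules table and rerun the check before compiling.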

Examples

This example demonstrates the simplest use case: compiling a set of rules from an input table into a binary model stored in an output table.

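SAS® / CAS Code (awaiting community validation)
PROC CAS;
   /* Assumes the caslib is named "mycas", matching the libref used to create the rules table. */
   textRuleDevelop.compileConcept /
      table={caslib="mycas", name="concept_rules"},   /* input table of rules */
      ruleId="ruleid",                                /* column holding the rule identifier */
      config="config",                                /* column holding the LITI rule text */
      casOut={caslib="mycas", name="my_concept_model", replace=true};
RUN;
QUIT;
Result: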
The action creates a new CAS table named 'my_concept_model' in the 'mycas' caslib. This table contains the compiled binary model of the rules defined in 'concept_rules'. A success status and log messages confirming the compilation will be displayed.
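
Once the model is compiled, it is typically applied to documents in a separate scoring step. The sketch below assumes the `textRuleScore.applyConcept` action with its `model`, `table`, `docId`, `text`, `casOut`, and `factOut` settings. The input table `documents` and its columns `doc_id` and `text`, as well as the output table names, are hypothetical placeholders, so adapt them to your data and check the action's reference page for the exact options.

```sas
PROC CAS;
   /* Load the scoring action set (assumed to be installed alongside textRuleDevelop). */
   loadactionset "textRuleScore";

   textRuleScore.applyConcept /
      model={caslib="mycas", name="my_concept_model"},     /* binary produced by compileConcept */
      table={caslib="mycas", name="documents"},            /* hypothetical table of documents */
      docId="doc_id",                                      /* hypothetical document ID column */
      text="text",                                         /* hypothetical text column */
      casOut={caslib="mycas", name="concept_matches", replace=true},
      factOut={caslib="mycas", name="fact_matches", replace=true};

   /* Inspect the first rows of the concept matches. */
   table.fetch / table={caslib="mycas", name="concept_matches"};
RUN;
QUIT;
```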

This example shows how to compile a concept model for Japanese text. It enables predefined entities to leverage SAS-provided concepts and uses the 'BASIC' tokenizer, which is often more effective for languages like Japanese, Chinese, and Korean.

SAS® / CAS Code (awaiting community validation)
PROC CAS;
   textRuleDevelop.compileConcept /
      table={caslib="mycas", name="japanese_rules"},   /* input table of Japanese rules */
      ruleId="rule_id_jp",                             /* column holding the rule identifier */
      config="rule_def_jp",                            /* column holding the LITI rule text */
      language="JAPANESE",                             /* use the Japanese linguistic binaries */
      enablePredefined=true,                           /* include predefined entities such as nlpPerson */
      tokenizer="BASIC",
      casOut={caslib="mycas", name="japanese_concept_model", replace=true};
RUN;
QUIT;
Result:
A CAS table named 'japanese_concept_model' is created. It contains a binary model that combines the custom Japanese rules from the 'japanese_rules' table with SAS's predefined Japanese entities. Because the model is compiled with the 'BASIC' tokenizer, the rules are matched against text segmented by that tokenizer rather than by the language-specific 'STANDARD' one.
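
The `predefinedSentiment` setting listed under Settings can be used in the same way to extend the predefined sentiment model with custom rules. The following is a minimal sketch that uses only parameters from the syntax above. The input table `sentiment_rules`, its columns `ruleid` and `config`, and the output table name are hypothetical, and the rules in that table are assumed to be written in a form the sentiment model accepts.

```sas
PROC CAS;
   textRuleDevelop.compileConcept /
      table={caslib="mycas", name="sentiment_rules"},   /* hypothetical table of custom rules */
      ruleId="ruleid",
      config="config",
      language="ENGLISH",
      predefinedSentiment=true,                         /* extend the predefined sentiment model */
      casOut={caslib="mycas", name="my_sentiment_model", replace=true};
RUN;
QUIT;
```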

FAQ

What is the primary function of the compileConcept action?
What is the purpose of the 'casOut' parameter in this action?
How can I include predefined entities in my concept model compilation?
What languages are supported by the compileConcept action?
What is the 'tokenizer' parameter and when should I use the 'BASIC' option?
How can I extend a predefined sentiment model?