compileConcept - WeAreCAS

Q: What is the primary function of the compileConcept action?

The compileConcept action builds a concept model using concept configuration data and predefined entities. The resulting model is stored as a binary object in a CAS table.

Q: What is the purpose of the 'casOut' parameter in this action?

The 'casOut' parameter is required and specifies the output CAS table that will contain the compiled concept model binary. This binary can then be used by other actions like 'tpParse' and 'tmMine'.

Q: How can I include predefined entities in my concept model compilation?

You can include predefined entities by setting the 'enablePredefined' parameter to TRUE. By default, this parameter is set to FALSE.

Q: What languages are supported by the compileConcept action?

The language of the linguistic binaries is specified using the 'language' parameter. The default language is English. Other languages are supported if licensed.

Q: What is the 'tokenizer' parameter and when should I use the 'BASIC' option?

The 'tokenizer' parameter specifies the tokenizer to use. The default is 'STANDARD', which applies a language-specific tokenizer. The 'BASIC' option uses a tokenizer that separates words by white spaces, punctuation, and CJKT characters. The 'BASIC' tokenizer is only available for Chinese, Japanese, and Korean and can enhance rule matching for specific texts in these languages.

Q: How can I extend a predefined sentiment model?

To extend a predefined sentiment model for a specific language, you need to set the 'predefinedSentiment' parameter to TRUE. The action will then use the sentiment model corresponding to the language specified in the 'language' parameter.
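As a minimal sketch of the answer above, the call below sets 'predefinedSentiment' alongside 'language'. The table name `sentiment_rules`, its column names, and the output name `my_sentiment_model` are hypothetical placeholders, not names taken from this page; only the parameter names come from the action syntax documented here.

```sas
proc cas;
   loadactionset "textRuleDevelop";

   /* Assumption: mycas.sentiment_rules is a hypothetical rules table with
      a rule ID column (ruleid) and a rule definition column (config). */
   textRuleDevelop.compileConcept /
      table={caslib="mycas", name="sentiment_rules"},
      ruleId="ruleid",
      config="config",
      language="ENGLISH",          /* selects which predefined sentiment model is extended */
      predefinedSentiment=true,    /* FALSE would compile the custom rules on their own */
      casOut={caslib="mycas", name="my_sentiment_model", replace=true};
run;
quit;
```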

Description

Builds a concept model from linguistic rules defined in a CAS table. This action compiles LITI (Language Interpretation for Textual Information) rules into a binary format, which other text analytics actions can then use for tasks such as concept extraction, categorization, and sentiment analysis. Predefined entities and sentiment models can be included to enrich the custom model.

textRuleDevelop.compileConcept result=<results> status=<rc> /
    casOut={caslib="string", name="table-name", replace=TRUE|FALSE, ...},
    config="string",
    enablePredefined=TRUE|FALSE,
    language="string",
    predefinedSentiment=TRUE|FALSE,
    ruleId="string",
    table={caslib="string", name="table-name", ...},
    tokenizer="BASIC"|"STANDARD";

Settings

Parameter	Description
casOut	Specifies the output CAS table to store the compiled concept model binary. This table is used as input by other actions like `tpParse` and `tmMine`.
config	Specifies the name of the variable in the input table that contains the concept rule definitions (LITI rules).
enablePredefined	When set to TRUE, includes predefined entities (like nlpPerson, nlpLocation) from the specified language's linguistic binaries in the compiled model.
language	Specifies the language of the linguistic binaries to use for compiling the rules. Default is 'ENGLISH'.
predefinedSentiment	When set to TRUE, the action extends the predefined sentiment model for the specified language with the custom rules.
ruleId	Specifies the name of the variable in the input table that contains the unique identifier for each rule.
table	Specifies the input CAS table that contains the concept rule definitions to be compiled.
tokenizer	Specifies the tokenizer to use. 'STANDARD' (the default) uses a language-specific tokenizer. 'BASIC' uses a tokenizer that separates words by white space, punctuation, and CJKT characters; it is available only for Chinese, Japanese, and Korean.

Data Preparation

Creating the Input Concept Rules Table

To use the `compileConcept` action, you first need a CAS table containing your concept rules. This table must have at least two columns: one for the rule ID and one for the rule definition (LITI syntax). The following code creates a simple example of such a table.

/* Assumes mycas is a libref that points to a CAS caslib. */
DATA mycas.concept_rules;
    LENGTH ruleid $ 50 config $ 32767;
    /* Fields are separated by a pipe character. */
    INFILE DATALINES delimiter='|';
    INPUT ruleid $ config $;
    DATALINES;
my_company_concept|CONCEPT_RULE:(C_CONCEPT){SAS}
my_product_concept|CONCEPT_RULE:(C_CONCEPT){Viya}
;
RUN;
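The DATA step above writes to `mycas.concept_rules`, so it needs a libref named `mycas` bound to a CAS caslib. The sketch below shows one way that setup and a quick check might look. It assumes a reachable CAS server and an existing caslib named "mycas"; the session name `mysess` is a hypothetical placeholder.

```sas
/* Run before the DATA step. */
cas mysess;                           /* start a CAS session (hypothetical name) */
libname mycas cas caslib="mycas";     /* bind the libref to the assumed caslib */

/* Run after the DATA step: list the rows to confirm that both the
   ruleid and config columns were read as intended. */
proc print data=mycas.concept_rules;
run;
```

Printing the table is a cheap way to catch a delimiter problem before compiling, since a mis-split row would show the rule text in the wrong column.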

Examples

Basic Concept Model Compilation

This example demonstrates the simplest use case: compiling a set of rules from an input table into a binary model stored in an output table.

SAS® / CAS Code (awaiting community validation)

PROC CAS;
    textRuleDevelop.compileConcept /
        table={caslib="mycas", name="concept_rules"},
        ruleId="ruleid",
        config="config",
        casOut={caslib="mycas", name="my_concept_model", replace=true};
RUN;

Result:
The action creates a new CAS table named 'my_concept_model' in the 'mycas' caslib. This table contains the compiled binary model of the rules defined in 'concept_rules'. A success status and log messages confirming the compilation will be displayed.
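To check the outcome programmatically rather than by reading the log, the `result=` and `status=` options shown in the syntax can be captured in CASL variables. The following is a sketch under the same assumptions as the basic example (a "mycas" caslib and the `concept_rules` table); the variable names `r` and `rc` are arbitrary.

```sas
proc cas;
   loadactionset "textRuleDevelop";

   textRuleDevelop.compileConcept result=r status=rc /
      table={caslib="mycas", name="concept_rules"},
      ruleId="ruleid",
      config="config",
      casOut={caslib="mycas", name="my_concept_model", replace=true};

   /* A severity of 0 means the action completed without warnings or errors. */
   if rc.severity == 0 then do;
      print "compileConcept succeeded";
      /* Confirm that the output table exists and show its basic attributes. */
      table.tableInfo / caslib="mycas", name="my_concept_model";
   end;
   else do;
      print "compileConcept returned severity " rc.severity ": " rc.reason;
   end;
run;
quit;
```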

Compiling a Model with Predefined Entities and a Specific Tokenizer

This example shows how to compile a concept model for Japanese text. It enables predefined entities to leverage SAS-provided concepts and uses the 'BASIC' tokenizer, which can improve rule matching for some Japanese, Chinese, and Korean text.

SAS® / CAS Code (awaiting community validation)

PROC CAS;
    textRuleDevelop.compileConcept /
        table={caslib="mycas", name="japanese_rules"},
        ruleId="rule_id_jp",
        config="rule_def_jp",
        language="JAPANESE",
        enablePredefined=true,
        tokenizer="BASIC",
        casOut={caslib="mycas", name="japanese_concept_model", replace=true};
RUN;

Result:
A CAS table named 'japanese_concept_model' is created in the 'mycas' caslib. It contains a binary model that combines the custom Japanese rules from the 'japanese_rules' table with SAS's predefined Japanese entities. The 'BASIC' tokenizer separates words by white space, punctuation, and CJKT characters, which can improve rule matching for some Japanese text.
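Once a model has been compiled, it is typically applied to documents by a scoring action. This page does not describe that step, so the sketch below is an assumption rather than part of the documented workflow: it uses the `applyConcept` action from the `textRuleScore` action set with the model from the basic example. The `documents` table, its `docid` and `text` columns, and the output table names are hypothetical.

```sas
proc cas;
   loadactionset "textRuleScore";

   /* Assumption: mycas.documents is a hypothetical table with a document
      ID column (docid) and a text column (text). */
   textRuleScore.applyConcept /
      table={caslib="mycas", name="documents"},
      docId="docid",
      text="text",
      model={caslib="mycas", name="my_concept_model"},   /* binary from compileConcept */
      casOut={caslib="mycas", name="concept_matches", replace=true},
      factOut={caslib="mycas", name="fact_matches", replace=true};
run;
quit;
```

Check the parameter list against the `applyConcept` reference for your release before relying on it.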

Associated Scenarios

Use Case

Adverse Event Detection Model Compilation

A pharmaceutical company analyzes patient feedback forms to automatically detect adverse events. They require a custom LITI model that combines specific internal drug names (cus...

Use Case

High-Volume Product Sentiment Model Extension

A global e-commerce giant monitors reviews for thousands of products. They need to extend the standard English sentiment model with thousands of product-specific slang terms and...

Use Case

Japanese Log Analysis with Basic Tokenizer

An IT support center in Tokyo analyzes server logs written in Japanese. The standard tokenizer struggles with their specific technical jargon and log formats. They require the '...

Related Actions

textRuleDevelop

compileCategory

The compileCategory action builds a categories model from a set of category r...

textRuleDevelop

exportTextModel

The exportTextModel action builds an analytic store (astore) model from a cat...
