applyConcept - WeAreCAS

Q: What is the purpose of the `applyConcept` action in SAS Viya?

The `applyConcept` action performs concept extraction on input text using a concept extraction model defined in a LITI (Language Interpretation for Textual Information) file.

Q: What are the primary input and output parameters for the `applyConcept` action?

The primary input is a CAS table specified by the `table` parameter, containing the documents to process. The `docId` and `text` parameters specify the columns for the document ID and the text content, respectively. The main output is the `casOut` table, which contains the concept match results. Other outputs include `factOut` for fact matches and `ruleMatchOut` for rule matches.

Q: How can I use a custom concept model with the `applyConcept` action?

You can specify a user-defined LITI model using the `model` parameter. This parameter points to a CAS table that contains your custom model. If this parameter is not specified, the base model is used.

Q: What does the `matchType` parameter control?

The `matchType` parameter specifies the matching strategy for concepts. It can be set to 'ALL' (default) to return all matches, 'BEST' to return only the best-scoring matches, or 'LONGEST' to return the longest matches.
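As a rough sketch of how the setting could be compared in practice, the following runs the action twice, once with 'ALL' and once with 'BEST', and then counts the rows in each result. It assumes an active CAS session and the 'my_documents' table (columns 'doc_id' and 'text') created in the Data Creation section; the output table names 'matches_all' and 'matches_best' are illustrative only.

```sas
proc cas;
    /* Return every match found by the model */
    textRuleScore.applyConcept /
        table={name='my_documents'},
        docId='doc_id',
        text='text',
        matchType='ALL',
        casOut={name='matches_all', replace=true};

    /* Return only the best-scoring matches */
    textRuleScore.applyConcept /
        table={name='my_documents'},
        docId='doc_id',
        text='text',
        matchType='BEST',
        casOut={name='matches_best', replace=true};

    /* Compare how many rows each strategy produced */
    table.recordCount / table={name='matches_all'};
    table.recordCount / table={name='matches_best'};
run;
quit;
```

Whether the two counts differ depends on how many overlapping matches the model produces for your documents.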

Q: How can I optimize the performance of the `applyConcept` action, especially with large documents or complex models?

To improve performance, especially when using the CLAUS_n operator, you can use a two-step process. First, run `applyConcept` with the `parseTableOut` parameter to save the pre-parsed documents to a CAS table. Then, in a subsequent run, use this table as input via the `parseTableIn` parameter to avoid re-parsing the text.

Q: Is it possible to exclude certain concepts from the output results?

Yes, you can use the `dropConcepts` parameter to provide a list of concept names that you want to exclude from the output tables. This is useful for filtering out predefined or intermediate concepts from the final results without removing them from the model itself.
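A minimal sketch of this filter, assuming an active CAS session and the 'my_documents' table from the Data Creation section. The concept names nlpDate and nlpMoney are placeholders chosen for illustration; substitute names that your model actually defines.

```sas
proc cas;
    textRuleScore.applyConcept /
        table={name='my_documents'},
        docId='doc_id',
        text='text',
        /* Placeholder concept names: replace with concepts from your model */
        dropConcepts={"nlpDate", "nlpMoney"},
        casOut={name='concept_matches_filtered', replace=true};
run;
quit;
```

The dropped concepts stay in the model and can still be referenced by other rules; they are only omitted from the output tables.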

Q: What does the `litiChunkSize` parameter do?

The `litiChunkSize` parameter specifies the size of the data chunks used for processing a document. The default is '32K'. For very large documents, keeping the chunk size small (for example, the default '32K' or '64K') can improve performance and reduce memory consumption. Setting it to 'ALL' processes the entire document at once, which can be memory-intensive.
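As an illustration, the call below sets the chunk size explicitly. It assumes an active CAS session and the 'my_documents' table from the Data Creation section; '64K' is just one of the values mentioned above, not a tuned recommendation.

```sas
proc cas;
    textRuleScore.applyConcept /
        table={name='my_documents'},
        docId='doc_id',
        text='text',
        /* Process each document in 64K chunks instead of the 32K default */
        litiChunkSize='64K',
        casOut={name='concept_matches_chunked', replace=true};
run;
quit;
```

A reasonable approach is to start from the default and change the value only after measuring run time and memory use on representative documents.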

At a glance

Data scientists and SAS developers use the applyConcept action to run information extraction across distributed CAS tables. Unlike simple keyword matching, it applies compiled linguistic models, stored as LITI files, to parse unstructured text and isolate meaningful patterns or entities. These rule-based results can then feed downstream text analytics pipelines. This page also answers common questions about implementation and syntax.

Description

The applyConcept action performs concept extraction using a predefined or custom concept extraction model (a LITI file). It is part of the Text Analytics Rule Score action set, which provides tools for linguistic rule scoring for categorization, concept extraction, and sentiment analysis. This action processes an input text document or a table of documents and identifies occurrences of concepts defined in the model, outputting detailed match information.

textRuleScore.applyConcept {
    casOut={...},
    docId="string",
    dropConcepts={"string-1", ...},
    factOut={...},
    language="string",
    litiChunkSize="string",
    matchType="ALL"|"BEST"|"LONGEST",
    model={...},
    parseTableIn={...},
    parseTableOut={...},
    ruleMatchOut={...},
    table={...},
    text="string"
};

Settings

Parameter	Description
casOut	Specifies the output CAS table to store the concept match results.
docId	Specifies the name of the variable in the input table that contains the document IDs.
dropConcepts	Specifies a list of concept names to exclude from the output tables. This is useful for filtering out predefined concepts without modifying the model.
factOut	Specifies the output CAS table for storing fact match results.
language	Specifies the language of the input text. Default is 'ENGLISH'.
litiChunkSize	Specifies the chunk size for document processing (e.g., '32K', '1M', 'ALL'). Smaller sizes can help manage memory for large documents. Default is '32K'.
matchType	Specifies the matching strategy: 'ALL' for all matches, 'BEST' for the best match, or 'LONGEST' for the longest match. Default is 'ALL'.
model	Specifies the input CAS table containing the user-defined LITI (Language Interpretation for Textual Information) model for concept extraction.
parseTableIn	Specifies a CAS table containing pre-parsed documents from a previous run, which can improve performance, especially when using the CLAUS_n operator.
parseTableOut	Specifies a CAS table to save pre-parsed documents, which can be used as input for future runs to improve performance.
ruleMatchOut	Specifies the output CAS table to store detailed rule match information, which can be used as input for the ruleGen action.
table	Specifies the input CAS table that contains the documents to be processed.
text	Specifies the name of the variable in the input table that contains the document text.

Data Preparation

Data Creation

This example creates a sample CAS table named 'my_documents' with two columns: 'doc_id' for the document identifier and 'text' for the document content. This table will be used as input for the concept extraction.

/* Assumes an active CAS session and a libref 'mycas' assigned to a caslib */
DATA mycas.my_documents;
    INFILE DATALINES delimiter='|';
    LENGTH doc_id $ 10 text $ 300;
    INPUT doc_id $ text $;
    DATALINES;
doc1|The new SAS Viya platform is a powerful analytics tool.
doc2|SAS Cloud Analytic Services (CAS) is the engine behind Viya.
doc3|You can use LITI models for concept extraction.
;
RUN;

Examples

Basic Concept Extraction

This example applies the default concept extraction model to the 'my_documents' table. It identifies concepts in the 'text' column, using 'doc_id' as the document identifier. The results are stored in a CAS table named 'concept_matches'.

PROC CAS;
    /* Apply the default concept model to the 'text' column */
    textRuleScore.applyConcept /
        table={name='my_documents'},
        docId='doc_id',
        text='text',
        casOut={name='concept_matches', replace=true};
RUN;
QUIT;

Result :
The action will produce an output table 'concept_matches' in the current caslib. This table will contain the concepts found in each document, such as 'SAS' or 'platform', along with their start and end positions.
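To check what the action actually returned, one option is to list the output columns and fetch the first few rows. This sketch assumes the 'concept_matches' table from the example above exists in the active caslib; the row limit of 10 is arbitrary.

```sas
proc cas;
    /* List the columns of the output table */
    table.columnInfo / table={name='concept_matches'};

    /* Display the first 10 rows of match results */
    table.fetch / table={name='concept_matches'}, to=10;
run;
quit;
```

Inspecting the column list first avoids relying on assumed column names when writing follow-up queries.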

Concept Extraction with a Custom LITI Model and Multiple Outputs

This example demonstrates a more advanced use case. It applies a custom LITI model, stored in a CAS table named 'my_liti_model' that must already be loaded, to the 'my_documents' table. It sets the match type to 'LONGEST' so that only the longest matching string is returned for overlapping concepts. It generates three output tables: 'concept_matches' for the main results, 'fact_matches' for extracted facts, and 'rulematch_details' for detailed rule matching information used for debugging or further analysis.

PROC CAS;
    /* Apply the custom model and write concept, fact, and rule-match tables */
    textRuleScore.applyConcept /
        table={name='my_documents'},
        docId='doc_id',
        text='text',
        model={name='my_liti_model'},
        matchType='LONGEST',
        casOut={name='concept_matches', replace=true},
        factOut={name='fact_matches', replace=true},
        ruleMatchOut={name='rulematch_details', replace=true};
RUN;
QUIT;

Result :
Three tables will be created in the current caslib: 'concept_matches' with the longest concept matches, 'fact_matches' containing any facts extracted based on the LITI rules, and 'rulematch_details' with granular data about which rules were triggered for each match.

Using Pre-Parsed Data for Efficiency

This example shows a two-step process to improve performance. First, `applyConcept` is called with the `parseTableOut` parameter to create a table of pre-parsed documents named 'parsed_docs'. In the second call, this 'parsed_docs' table is used as input via the `parseTableIn` parameter, which can speed up processing, especially with complex models or large documents.

PROC CAS;
    /* Step 1: Parse documents and save the intermediate table */
    textRuleScore.applyConcept /
        table={name='my_documents'},
        docId='doc_id',
        text='text',
        parseTableOut={name='parsed_docs', replace=true};

    /* Step 2: Use the pre-parsed table for faster concept extraction */
    textRuleScore.applyConcept /
        table={name='my_documents'},
        docId='doc_id',
        text='text',
        parseTableIn={name='parsed_docs'},
        casOut={name='concept_matches_fast', replace=true};
RUN;
QUIT;

Result :
The first step creates the 'parsed_docs' table. The second step uses this intermediate table to create 'concept_matches_fast', which will contain the same concept matches as a single run but may complete more quickly.

Associated Scenarios

Use Case

Standard Case: Customer Feedback Analysis for Product and Sentiment Extraction

A marketing department wants to analyze customer support emails to automatically identify which products are mentioned and the associated sentiment (positive or negative). This ...

Use Case

Performance Case: High-Volume Server Log Analysis with Pre-Parsing

An IT operations team needs to process a massive stream of server logs to detect critical error patterns. To meet real-time monitoring needs, the process must be highly performa...

Use Case

Edge Case: Financial Compliance Screening with Overlapping Concepts and Filtering

A financial compliance team must screen internal communications for mentions of specific, restricted projects ('Project Chimera') to prevent information leaks. They need to dist...

Related Actions

textRuleScore

applyCategory

The applyCategory action categorizes text documents based on a pre-built cate...

textRuleScore

loadTableFromDisk

Loads a binary model file, such as a sentiment analysis model (SAM), a catego...
