The `buildTermIndex` action creates a term index table from a table of significant terms. This index is essential for features like autocomplete and search joins, as it pre-processes terms to optimize search performance. It can operate on specific fields and supports multiple languages for tokenization.
| Parameter | Description |
|---|---|
| casOut | Specifies the output table to store the term index. This table will contain the indexed terms and their associated data, ready for use by other search analytics actions. This parameter is required. |
| fields | Specifies a list of columns from the input table that contain the terms to be indexed. If not specified, the action might use a default column, typically `_Term_`. |
| language | Specifies the language for tokenization, which breaks down text into individual terms. This affects how terms are processed and indexed. The default is 'UNIVERSAL' for language-independent tokenization. |
| table | Specifies the input table containing the significant terms to be indexed. This table is typically the output of the `significantTerms` action. This parameter is required. Alias: `index`. |
| tokenize | When set to TRUE, the action tokenizes the content of the specified `fields`. If FALSE, it assumes the fields already contain single, ready-to-index terms. Default: FALSE. |
This SAS code creates a sample CAS table named `significant_terms` in the `casuser` caslib. The table simulates the output of a `significantTerms` action and contains the terms that the `buildTermIndex` action will index.

    /* Assumes an active CAS session and a libref named casuser bound to the Casuser caslib. */
    DATA casuser.significant_terms;
       LENGTH _Term_ $ 50;
       INFILE DATALINES dsd;   /* no commas in the data, so each line is read as one value */
       INPUT _Term_ $;
       DATALINES;
    SAS Viya
    Cloud Analytic Services
    Text Analytics
    Machine Learning
    Data Science
    Search Index
    Natural Language Processing
    ;
    RUN;
This example demonstrates a basic use of the `buildTermIndex` action. It reads the `significant_terms` table, tokenizes the `_Term_` column, and writes the indexed terms to a new output table named `term_index`.

    PROC CAS;
       /* Load the action set first; this is harmless if it is already loaded. */
       loadactionset 'searchAnalytics';
       searchAnalytics.buildTermIndex /
          table={name='significant_terms'},
          casOut={name='term_index', replace=true},
          fields={'_Term_'},
          tokenize=true;
    RUN;
    QUIT;
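To check what the action produced, the output table can be inspected with the general-purpose `table.columnInfo` and `table.fetch` actions. The following is a minimal sketch, assuming the previous step ran successfully and that `term_index` sits in the active caslib. It makes no assumption about the column names in the output table, which is why it lists them first.

```sas
PROC CAS;
   /* List the columns that buildTermIndex wrote; their names are not assumed here. */
   table.columnInfo / table={name='term_index'};

   /* Display the first 20 rows of the term index. */
   table.fetch / table={name='term_index'}, to=20;
RUN;
QUIT;
```

Running the same two actions against `significant_terms` gives a side-by-side view of the input terms and the indexed result.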
This example first creates a sample table of French terms, then builds a term index for a specific language. Setting `language='FRENCH'` tells the action to tokenize and index the text with French rules instead of the language-independent default. This matters for language-specific handling such as stop words, stemming, and character normalization.

    DATA casuser.significant_terms_french;
       LENGTH _Term_ $ 50;
       INFILE DATALINES dsd;
       INPUT _Term_ $;
       DATALINES;
    Science des données
    Apprentissage automatique
    Traitement du langage naturel
    Index de recherche
    ;
    RUN;

    PROC CAS;
       searchAnalytics.buildTermIndex /
          table={name='significant_terms_french'},
          casOut={name='term_index_french', replace=true},
          fields={'_Term_'},
          tokenize=true,
          language='FRENCH';
    RUN;
    QUIT;
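One way to see whether the `language` setting changes anything for a given data set is to build a second index with the parameter omitted, so that the documented default (`UNIVERSAL`) applies, and compare the two outputs. The sketch below assumes the French example above has already run; the output table name `term_index_universal` is made up for illustration. Whether the two indexes differ depends on the data, so no particular result is implied.

```sas
PROC CAS;
   /* Same input and fields, but language is omitted so the default tokenization is used. */
   searchAnalytics.buildTermIndex /
      table={name='significant_terms_french'},
      casOut={name='term_index_universal', replace=true},   /* hypothetical table name */
      fields={'_Term_'},
      tokenize=true;

   /* Count the rows in each index and print the raw results for comparison. */
   simple.numRows result=r_french / table={name='term_index_french'};
   simple.numRows result=r_universal / table={name='term_index_universal'};
   print r_french;
   print r_universal;
RUN;
QUIT;
```

If the row counts match, fetching both tables with `table.fetch` shows whether the individual terms were split or normalized differently.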
An online bookstore wants to optimize their search engine. They need to create a term index from a list of popular book titles to enable an efficient autocomplete feature. The d...
The IT Operations team needs to index a large volume of system logs to identify recurring error patterns. The test aims to validate the performance and stability of the action w...
A global customer support platform handles tickets in Spanish. The raw data includes special characters, empty fields, and some fields that are already pre-processed tags. The t...