The `buildTermIndex` action creates a term index table from a table of significant terms. This index is essential for features like autocomplete and search joins, as it pre-processes terms to optimize search performance. It can operate on specific fields and supports multiple languages for tokenization.
| Parameter | Description |
|---|---|
| casOut | Specifies the output table to store the term index. This table will contain the indexed terms and their associated data, ready for use by other search analytics actions. This parameter is required. |
| fields | Specifies a list of columns from the input table that contain the terms to be indexed. If not specified, the action might use a default column, typically `_Term_`. |
| language | Specifies the language for tokenization, which breaks down text into individual terms. This affects how terms are processed and indexed. The default is 'UNIVERSAL' for language-independent tokenization. |
| table | Specifies the input table containing the significant terms to be indexed. This table is typically the output of the `significantTerms` action. This parameter is required. Alias: `index`. |
| tokenize | When set to TRUE, the action tokenizes the content of the specified `fields`. If FALSE, it assumes the fields already contain single, ready-to-index terms. Default: FALSE. |
This SAS code creates a sample CAS table named 'significant_terms' in the 'casuser' caslib. This table simulates the output of a `significantTerms` action and contains a list of terms that will be indexed by the `buildTermIndex` action.
```sas
DATA casuser.significant_terms;
   LENGTH _Term_ $ 50;
   INFILE DATALINES dsd;
   INPUT _Term_ $;
   DATALINES;
SAS Viya
Cloud Analytic Services
Text Analytics
Machine Learning
DATA Science
Search Index
Natural Language Processing
;
RUN;
```
This example demonstrates a basic use of the `buildTermIndex` action. It takes the `significant_terms` table, tokenizes the `_Term_` column, and creates a new output table named `term_index` containing the indexed terms.
```sas
PROC CAS;
   searchAnalytics.buildTermIndex /
      table={name='significant_terms', caslib='casuser'},
      casOut={name='term_index', caslib='casuser', replace=true},
      fields={'_Term_'},
      tokenize=true;
RUN;
QUIT;
```
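To confirm what the action produced, you can read back a few rows of the output table with the general-purpose `table.fetch` action. This is a minimal sketch assuming the `term_index` output table created by the example above:

```sas
PROC CAS;
   /* Fetch the first rows of the generated term index for inspection */
   table.fetch / table={name='term_index', caslib='casuser'}, to=10;
RUN;
QUIT;
```

The exact columns in the fetched result depend on how `buildTermIndex` structures its index table; the fetch simply displays whatever the action wrote to `casOut`.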
This example first creates a sample table with French terms, then shows how to create a term index for a specific language. By setting `language='FRENCH'`, the tokenization and indexing process is optimized for French text. This is crucial for handling language-specific features like stop words, stemming, and character normalization.
```sas
DATA casuser.significant_terms_french;
   LENGTH _Term_ $ 50;
   INFILE DATALINES dsd;
   INPUT _Term_ $;
   DATALINES;
Science des données
Apprentissage automatique
Traitement du langage naturel
Index de recherche
;
RUN;

PROC CAS;
   searchAnalytics.buildTermIndex /
      table={name='significant_terms_french', caslib='casuser'},
      casOut={name='term_index_french', caslib='casuser', replace=true},
      fields={'_Term_'},
      tokenize=true,
      language='FRENCH';
RUN;
QUIT;
```