searchAnalytics buildTermIndex

E-commerce Product Search Autocomplete

Test Scenario & Use Case

Business Context

An online bookstore wants to optimize its search engine. To support an efficient autocomplete feature, it needs to build a term index from a list of popular book titles. The titles are in English, so the system must tokenize them according to English language rules.

About the Set: searchAnalytics

Data indexing and search functionalities.

Data Preparation

Creation of a dataset containing popular book titles.

/* Create a CAS table of popular book titles (requires a casuser CAS libref). */
DATA casuser.books;
    LENGTH title $ 100;
    INFILE DATALINES DSD;
    INPUT title $;
    DATALINES;
The Great Gatsby
Introduction to Algorithms
Clean Code
Data Science for Business
The Pragmatic Programmer
;
RUN;
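
The DATA step above writes through a 'casuser' libref, so it presumes an active CAS session with that libref assigned. Where the environment does not set this up automatically, a minimal sketch to run beforehand might look like the following. The session name 'mysess' is a hypothetical placeholder, and the sketch assumes the personal caslib is named CASUSER.

```sas
/* Assumption: no CAS session exists yet; 'mysess' is a placeholder name. */
cas mysess;

/* Bind the libref 'casuser' to the CASUSER caslib so that
   DATA casuser.books writes an in-memory CAS table. */
libname casuser cas caslib=casuser;
```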

Implementation Steps

1
Load the book titles into a CAS table.
/* Data already loaded in the Data Preparation step */
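
Before indexing, it can be worth confirming that the table really is in memory. A minimal check, assuming the table was written to the CASUSER caslib as in the Data Preparation step:

```sas
proc cas;
    /* List metadata (row and column counts) for the books table. */
    table.tableInfo / caslib='casuser' name='books';
quit;
```

With the five titles from the Data Preparation step, the output should report five rows.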
2
Build the term index for the 'title' column using English language settings and forced tokenization.
PROC CAS;
    /* Build a term index over the 'title' column of the books table. */
    searchAnalytics.buildTermIndex /
        table={name='books'}
        casOut={name='book_index', replace=true}
        fields={'title'}
        tokenize=true
        language='ENGLISH';
RUN;
QUIT;

Expected Result


The 'book_index' table is successfully created. It contains individual terms from the book titles (e.g., 'Gatsby', 'Algorithms', 'Science'), tokenized according to English linguistic rules.
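
To inspect what the action produced, the output table can be queried directly. The sketch below assumes 'book_index' was written to the active caslib. The column layout of the index is not described here, so columnInfo is called first rather than assuming any column names.

```sas
proc cas;
    /* Show the columns of the generated index table. */
    table.columnInfo / table={name='book_index'};

    /* Print the first 10 rows of the index. */
    table.fetch / table={name='book_index'} to=10;
quit;
```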