searchAnalytics buildTermIndex

High Volume Server Log Indexing

Test Scenario & Use Case

Business Context

The IT Operations team needs to index a large volume of system logs to identify recurring error patterns. This test validates the performance and stability of the buildTermIndex action on a larger dataset, using the default Universal tokenizer and the 'index' alias of the table parameter.

About the Set: searchAnalytics

Data indexing and search functionalities.

Data Preparation

Generate a larger dataset of simulated server logs (10,000 rows) to test volume handling.

/* Build 10,000 simulated log rows, cycling through three message types */
DATA casuser.large_logs;
    LENGTH message $ 200;
    DO i = 1 TO 10000;
        IF mod(i, 3) = 0 THEN message = 'Error 500: Internal Server Error';
        ELSE IF mod(i, 3) = 1 THEN message = 'Warning: High Memory Usage detected';
        ELSE message = 'Info: User login successful';
        OUTPUT;
    END;
    DROP i;  /* keep only the message column */
RUN;
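
Before indexing, it can help to confirm that the generated data looks as intended. The following is a minimal sketch, not part of the original test, and it assumes the casuser libref is already assigned to an active CAS session. By construction of the loop (i from 1 to 10,000, split by mod(i, 3)), the frequency table should show 3,333 'Error 500' rows, 3,334 'Warning' rows, and 3,333 'Info' rows.

```sas
/* Illustrative sanity check: count rows per message type.        */
/* Assumption: the casuser libref points to an active CAS session. */
PROC FREQ DATA=casuser.large_logs;
    TABLES message / NOCUM NOPERCENT;  /* frequencies only */
RUN;
```
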

Implementation Steps

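1. Generate the 10,000-row log dataset.

/* Data generation handled in data_prep */

2. Execute buildTermIndex using the 'index' alias for the input table and default language settings.

PROC CAS;
    /* Load the action set in case the session has not loaded it yet */
    loadactionset 'searchAnalytics';
    searchAnalytics.buildTermIndex /
        index={name='large_logs'}                    /* 'index' alias of the table parameter */
        casOut={name='log_index_vol', replace=true}  /* overwrite any earlier output */
        fields={'message'}                           /* column to index */
        tokenize=true;                               /* no language set, so the default applies */
RUN;
QUIT;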

Expected Result


The action processes the 10,000 rows without errors using the 'index' alias parameter. The output table 'log_index_vol' is populated with terms extracted from the log messages.
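
One way to check this result is sketched below. It is an illustration rather than part of the original test, and it assumes both tables live in the session's active caslib. It uses the standard table.tableInfo and table.fetch actions; no column names are assumed for 'log_index_vol', since the output layout is not described here.

```sas
PROC CAS;
    /* Input table: the Rows value should report 10,000 */
    table.tableInfo / name='large_logs';

    /* Output table: confirms it exists and shows its row and column counts */
    table.tableInfo / name='log_index_vol';

    /* Peek at the first 10 rows of the term index */
    table.fetch / table={name='log_index_vol'} to=10;
RUN;
QUIT;
```

A nonzero row count for 'log_index_vol' together with a clean SAS log (no ERROR lines) would match the expected result stated above.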