textRuleScore applyConcept

Performance Case: High-Volume Server Log Analysis with Pre-Parsing

Test Scenario & Use Case

Business Context

An IT operations team needs to process a massive stream of server logs to detect critical error patterns. To meet real-time monitoring needs, the process must be highly performant and memory-efficient, even with very long log entries.
About the action set: textRuleScore

Rule-based scoring of text documents.

Data Preparation

Create a large table of server logs ('server_logs_large') containing long, complex messages, plus a simple LITI model table ('error_codes_liti') that identifies a specific error code.

/* Generate 100,000 log entries; every 100th row is a critical error */
DATA mycas.server_logs_large;
   LENGTH log_id 8 log_message $ 2048;
   DO log_id = 1 TO 100000;
      IF MOD(log_id, 100) = 0 THEN
         log_message = 'CRITICAL FAILURE: Core dump initiated for process ID ' ||
                       PUT(log_id, 8.) ||
                       '. System halt imminent. Error code: SYS-001-FATAL. ' ||
                       'Check memory allocation and disk space immediately. ' ||
                       'Full trace written to /var/log/dump.log';
      ELSE
         log_message = 'INFO: User session ' ||
                       PUT(RAND('integer', 1000, 9999), 4.) ||
                       ' completed successfully. Execution time: ' ||
                       PUT(RAND('uniform')*10, 4.2) || 's.';
      OUTPUT;
   END;
RUN;

/* Minimal LITI model table matching the fatal error code.
   The record is pipe-delimited, so an INFILE statement with DLM='|'
   is required; default list input would fail on this line. */
DATA mycas.error_codes_liti;
   LENGTH model_id $ 10 model_txt $ 200;
   INFILE DATALINES DLM='|' TRUNCOVER;
   INPUT model_id $ model_txt $;
   DATALINES;
errors|CONCEPT:ERROR_CODE@SYS-001-FATAL
;
RUN;

Implementation Steps

1
First run: Process the logs and write out a pre-parsed table for later reuse. A small 'litiChunkSize' limits memory consumption when parsing very long documents.
PROC CAS;
   textRuleScore.applyConcept /
      table={name='server_logs_large'},
      docId='log_id',
      text='log_message',
      model={name='error_codes_liti'},
      litiChunkSize='16K',
      parseTableOut={name='parsed_logs', replace=true};
RUN;
QUIT;
2
Second run: Execute the concept extraction again, this time supplying the pre-parsed table as input so the parsing stage is skipped and scoring completes significantly faster.
PROC CAS;
   textRuleScore.applyConcept /
      table={name='server_logs_large'},
      docId='log_id',
      text='log_message',
      model={name='error_codes_liti'},
      parseTableIn={name='parsed_logs'},
      casOut={name='log_errors_fast', replace=true};
RUN;
QUIT;

Expected Result


The first step creates the 'parsed_logs' table. The second step runs much faster than the first and creates the 'log_errors_fast' table. That table contains 1,000 rows (one per 100 of the 100,000 log entries), each identifying the 'SYS-001-FATAL' error code from a critical log message. The test validates the performance optimization workflow.
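The expected row count can be checked directly in the CAS session. The sketch below uses the standard table.recordCount action; the table name matches the casOut table created above, and it assumes the same session and caslib defaults as the earlier steps:

```sas
PROC CAS;
   /* Verification sketch: count rows in the scored output table.
      With the data generated above, 100,000 logs / 1 critical per 100
      should yield 1,000 matches. */
   table.recordCount result=r / table={name='log_errors_fast'};
   print r;
RUN;
QUIT;
```

If the count differs from 1,000, inspect a few rows with table.fetch to see which documents matched.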