textRuleScore applyConcept

Performance Case: High-Volume Server Log Analysis with Pre-Parsing

Test Scenario & Use Case

Business Context

An IT operations team needs to process a massive stream of server logs to detect critical error patterns. To meet real-time monitoring needs, the process must be highly performant and memory-efficient, even with very long log entries.
About the action set: textRuleScore

Rule-based scoring of text documents.

Data Preparation

Create a large table of server logs ('server_logs_large') containing long, complex messages, plus a simple LITI model table ('error_codes_liti') that identifies a specific error code.

/* Generate 100,000 log entries; every 100th row is a critical error */
DATA mycas.server_logs_large;
   LENGTH log_id 8 log_message $ 2048;
   DO log_id = 1 TO 100000;
      IF MOD(log_id, 100) = 0 THEN
         log_message = 'CRITICAL FAILURE: Core dump initiated for process ID ' ||
                       PUT(log_id, 8.) ||
                       '. System halt imminent. Error code: SYS-001-FATAL. ' ||
                       'Check memory allocation and disk space immediately. ' ||
                       'Full trace written to /var/log/dump.log';
      ELSE
         log_message = 'INFO: User session ' ||
                       PUT(RAND('integer', 1000, 9999), 4.) ||
                       ' completed successfully. Execution time: ' ||
                       PUT(RAND('uniform')*10, 4.2) || 's.';
      OUTPUT;
   END;
RUN;

/* Minimal LITI model table matching the fatal error code.
   The record is pipe-delimited, so an INFILE statement with DLM='|'
   is required; default list input would fail on this line. */
DATA mycas.error_codes_liti;
   LENGTH model_id $ 10 model_txt $ 200;
   INFILE DATALINES DLM='|' TRUNCOVER;
   INPUT model_id $ model_txt $;
   DATALINES;
errors|CONCEPT:ERROR_CODE@SYS-001-FATAL
;
RUN;

Implementation Steps

1
First run: Process the logs and write out a pre-parsed table for later reuse. A small 'litiChunkSize' limits memory consumption when parsing very long documents.
PROC CAS;
   textRuleScore.applyConcept /
      table={name='server_logs_large'},
      docId='log_id',
      text='log_message',
      model={name='error_codes_liti'},
      litiChunkSize='16K',
      parseTableOut={name='parsed_logs', replace=true};
RUN;
QUIT;
2
Second run: Execute the concept extraction again, this time supplying the pre-parsed table as input so the parsing stage is skipped and scoring completes significantly faster.
PROC CAS;
   textRuleScore.applyConcept /
      table={name='server_logs_large'},
      docId='log_id',
      text='log_message',
      model={name='error_codes_liti'},
      parseTableIn={name='parsed_logs'},
      casOut={name='log_errors_fast', replace=true};
RUN;
QUIT;

Expected Result


The first step creates the 'parsed_logs' table. The second step runs much faster than the first and creates the 'log_errors_fast' table. That table contains 1,000 rows (one per 100 of the 100,000 log entries), each identifying the 'SYS-001-FATAL' error code from a critical log message. The test validates the performance optimization workflow.
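The expected row count can be checked directly in the CAS session. The sketch below uses the standard table.recordCount action; the table name matches the casOut table created above, and it assumes the same session and caslib defaults as the earlier steps:

```sas
PROC CAS;
   /* Verification sketch: count rows in the scored output table.
      With the data generated above, 100,000 logs / 1 critical per 100
      should yield 1,000 matches. */
   table.recordCount result=r / table={name='log_errors_fast'};
   print r;
RUN;
QUIT;
```

If the count differs from 1,000, inspect a few rows with table.fetch to see which documents matched.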