textRuleDevelop.compileConcept

Japanese Log Analysis with Basic Tokenizer

Test Scenario & Use Case

Business Context

An IT support center in Tokyo analyzes server logs written in Japanese. The standard tokenizer struggles with their technical jargon and log formats, so they use the 'BASIC' tokenizer, which splits text on simple whitespace/punctuation delimiters, for better accuracy on error codes mixed with Japanese characters.
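To see why delimiter-based splitting helps here, consider a rough Python sketch of the idea. This is not SAS's internal BASIC tokenizer, only an illustration under the assumption that whitespace and bracket/comma punctuation act as delimiters, so a mixed token such as `503-FAIL` survives intact:

```python
import re

# Hypothetical illustration of delimiter-based ("basic") tokenization.
# NOT SAS's actual BASIC tokenizer -- just a sketch of the principle:
# split on whitespace and bracket/comma punctuation, so a mixed
# ASCII/Japanese token like '503-FAIL' is kept as one unit.
def basic_tokenize(text: str) -> list[str]:
    return [tok for tok in re.split(r"[\s\[\](),]+", text) if tok]

line = "2024-05-01 12:03:44 [server1] 503-FAIL 致命的なエラー が発生"
print(basic_tokenize(line))
# -> ['2024-05-01', '12:03:44', 'server1', '503-FAIL', '致命的なエラー', 'が発生']
```

A dictionary-based Japanese tokenizer, by contrast, might segment `503-FAIL` or the surrounding Japanese text differently, which is exactly what the support center wants to avoid for its error codes.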
Data Preparation

Creating Japanese rules for error detection, including mixed ASCII and Japanese characters.

/* Rules table: one rule per row, rule code and rule logic delimited by '|' */
DATA casuser.jp_log_rules;
   LENGTH rule_code $32 rule_logic $200;
   INFILE DATALINES DELIMITER='|';
   INPUT rule_code $ rule_logic $;
   DATALINES;
ERR_CRIT|CONCEPT:致命的なエラー
ERR_WARN|CONCEPT:警告
ERR_CODE|REGEX:[0-9]{3}-FAIL
;
RUN;

Implementation Steps

1
Compile the model, selecting the BASIC tokenizer for this Japanese (CJK) use case.
PROC CAS;
   textRuleDevelop.compileConcept /
      TABLE={caslib="casuser", name="jp_log_rules"},
      ruleId="rule_code",
      config="rule_logic",
      language="JAPANESE",
      tokenizer="BASIC",
      casOut={caslib="casuser", name="jp_error_model", replace=TRUE};
RUN;

Expected Result

The model compiles successfully with the 'BASIC' tokenizer, and the system acknowledges the 'JAPANESE' language setting. When the compiled model is later applied, the text '致命的なエラー' (fatal error) is correctly identified even when surrounded by log timestamps.
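As a standalone sanity check, the regular expression from the ERR_CODE rule can be exercised outside of SAS. The log line below is a hypothetical example, not output from the scenario:

```python
import re

# Check the ERR_CODE pattern from the rules table against a
# hypothetical mixed ASCII/Japanese log line (illustration only).
pattern = re.compile(r"[0-9]{3}-FAIL")
log_line = "2024-05-01 12:03:44 [server1] 503-FAIL 致命的なエラー"
print(pattern.search(log_line).group())
# -> 503-FAIL
```

The match confirms that three digits followed by `-FAIL` are picked out even with timestamps and Japanese text on the same line.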