textRuleDevelop compileCategory

Multi-Language Support Ticket Classification (Japanese)

Test Scenario & Use Case

Business Context

A global software vendor needs to classify support tickets originating from Japan. Japanese text is written in CJK characters without spaces between words, so it requires a tokenization strategy suited to that script. This scenario validates the action's ability to handle a non-English language together with an explicit tokenizer setting.

Data Preparation

Create a rule table containing Japanese terms for the Network and Hardware categories.

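DATA mycas.jp_support_rules;
   LENGTH config $200;
   /* DSD keeps the commas inside each double-quoted rule as data. */
   INFILE DATALINES DSD;
   INPUT config $;
   /* One rule per line; the terminating semicolon sits on its own line. */
   DATALINES;
"CATEGORY:Network,(OR, 'ネットワーク', '接続', '遅延')"
"CATEGORY:Hardware,(OR, 'ハードウェア', '故障', '画面')"
;
RUN;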
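
The DATA step above writes through a libref named mycas, which this scenario does not define. A minimal setup sketch to run beforehand, assuming a reachable CAS server and the casuser caslib that the later steps use; the session name mysess is hypothetical:

```sas
/* Assumption: the SAS client is already configured to reach a CAS server. */
CAS mysess;                          /* start a CAS session; the name is hypothetical */
LIBNAME mycas CAS CASLIB=casuser;    /* bind the libref used by the DATA step to casuser */
```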

Implementation Steps

1. Load the Japanese rule definitions.

PROC CAS;
   /* Load the saved rule table from the casuser caslib into memory. */
   table.loadTable /
      path='jp_support_rules.sashdat' caslib='casuser'
      casout={name='jp_support_rules', replace=true};
RUN;
QUIT;

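
Step 1 reads jp_support_rules.sashdat from the casuser caslib, whereas the data preparation only creates an in-memory table. One way to reconcile the two, sketched here on the assumption that the in-memory table still exists in casuser, is to save it to the caslib's data source before loading it:

```sas
PROC CAS;
   /* Assumption: jp_support_rules is currently in memory in casuser. */
   table.save /
      table={name='jp_support_rules', caslib='casuser'}
      name='jp_support_rules.sashdat'
      caslib='casuser'
      replace=true;
RUN;
QUIT;
```
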
2. Compile the category model, specifying the Japanese language and the BASIC tokenizer.

PROC CAS;
   /* Compile the rules stored in the config column into a category model. */
   textRuleDevelop.compileCategory /
      table={name='jp_support_rules'} config='config'
      language='Japanese' tokenizer='BASIC'
      casOut={name='jp_support_model', replace=true};
RUN;
QUIT;

Expected Result


The action must compile the Japanese rules without encoding errors and write the compiled model to the CAS table 'jp_support_model'. Because the model is compiled with language='Japanese' and tokenizer='BASIC', it should be suited to CJK text processing.
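
A quick way to check part of this result is to confirm that the output table exists and inspect its structure. The sketch below assumes the same CAS session and active caslib as step 2. It only shows that a table named jp_support_model was produced; it does not validate the rules themselves.

```sas
PROC CAS;
   /* Report whether the model table exists, with its row and column counts. */
   table.tableInfo / name='jp_support_model';
   /* List the columns of the compiled model table. */
   table.columnInfo / table={name='jp_support_model'};
RUN;
QUIT;
```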