textRuleScore applyCategory

Standard Case: Categorizing Customer Product Reviews

Test Scenario & Use Case

Business Context

A retail company wants to automatically classify incoming customer reviews from its website into 'Positive', 'Negative', and 'Inquiry' categories, so that feedback can be routed quickly to the appropriate department (Customer Service, Product Development, etc.).
About the Action Set: textRuleScore

Rule-based scoring of text documents.

Data Preparation

Create a small dataset of customer reviews and a mock categorization model (a placeholder table standing in for a real compiled model). The reviews contain typical positive, negative, and inquiry-style language.

/* Customer reviews: one pipe-delimited record per line (ID|text). */
/* Written to WORK so that the PROC CASUTIL step can load it into CAS. */
DATA work.customer_reviews;
  LENGTH review_id $ 20 review_text $ 500;
  INFILE DATALINES truncover dsd dlm='|';
  INPUT review_id $ review_text $;
  DATALINES;
PROD_001|The battery life on this new phone is amazing! I highly recommend it.
PROD_002|I'm very disappointed. The item arrived broken and the packaging was damaged.
PROD_003|Can you tell me if this product is compatible with model X? I can't find the information.
PROD_004|Excellent service and fast delivery. Five stars!
;
RUN;

/* Mock model table: a placeholder, not a compiled category model. */
DATA work.feedback_model;
  LENGTH _mco_ 8; /* numeric variable; 'long' is not a valid LENGTH value */
  _mco_ = 112233;
RUN;

Implementation Steps

1. Load the review data and the model into CAS.
PROC CASUTIL;
  /* Copy both WORK data sets into the active caslib as in-memory tables */
  load DATA=WORK.customer_reviews casout='customer_reviews' replace;
  load DATA=WORK.feedback_model casout='feedback_model' replace;
QUIT;
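
Before scoring, it can help to confirm that both tables actually landed in CAS. The following is a minimal sketch, assuming a CAS session is already open and that the tables were loaded into the session's active caslib (no incaslib= option is given, so the active caslib is used):

```sas
PROC CASUTIL;
  /* List the in-memory tables in the active caslib; */
  /* customer_reviews and feedback_model should both appear. */
  list tables;
  /* Show column names, types, and lengths for the review table. */
  contents casdata='customer_reviews';
QUIT;
```

If either table is missing from the listing, the load step above did not complete and the scoring step will fail on a missing input table.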
2. Run applyCategory using the default 'FREQUENCY' algorithm and generate both the main output (casOut) and the detailed match output (matchOut).
PROC CAS;
  /* Make sure the action set is available in this session */
  loadactionset 'textRuleScore';
  textRuleScore.applyCategory /
    table={name='customer_reviews'},
    docId='review_id',
    text='review_text',
    model={name='feedback_model'},
    casOut={name='review_categories', replace=true},
    matchOut={name='review_matches', replace=true};
RUN;
QUIT;

Expected Result


Two tables are created in CAS. 'review_categories' contains the original data with new columns for each category (e.g., 'CAT_Positive', 'CAT_Negative'), each holding a score that indicates the number of rule matches. 'review_matches' provides a detailed log of which specific terms in each review led to a category assignment, which allows fine-grained analysis of the model's performance.
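
To check the result against this description, the two output tables can be inspected directly. This is a minimal sketch, assuming the action completed and wrote 'review_categories' and 'review_matches' to the active caslib; the row limits are arbitrary choices for a four-review dataset:

```sas
PROC CAS;
  /* Column names and types of the main output, */
  /* to see which score columns were actually produced. */
  table.columnInfo / table={name='review_categories'};
  /* First rows of the main output table. */
  table.fetch / table={name='review_categories'}, to=10;
  /* First rows of the detailed match table. */
  table.fetch / table={name='review_matches'}, to=20;
RUN;
QUIT;
```

The columnInfo output is the quickest way to confirm the actual column names before referencing them in downstream code.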