Standard Case: Categorizing Customer Product Reviews

Business Context

A retail company wants to automatically classify incoming customer reviews from their website into 'Positive', 'Negative', and 'Inquiry' categories. This helps them quickly route feedback to the appropriate department (Customer Service, Product Development, etc.).

About the Action Set: textRuleScore

Rule-based scoring of text documents.

Discover all actions of textRuleScore

Data Preparation

Create a small dataset of customer reviews and a mock categorization model. The reviews contain typical positive, negative, and questioning language.

/* Build the reviews in WORK; they are loaded into CAS in the next step */
DATA work.customer_reviews;
    /* Declare lengths first so the review text is not truncated */
    LENGTH review_id $ 20 review_text $ 500;
    INFILE DATALINES truncover dsd dlm='|';
    INPUT review_id $ review_text $;
    DATALINES;
PROD_001|The battery life on this new phone is amazing! I highly recommend it.
PROD_002|I'm very disappointed. The item arrived broken and the packaging was damaged.
PROD_003|Can you tell me if this product is compatible with model X? I can't find the information.
PROD_004|Excellent service and fast delivery. Five stars!
;
RUN;

/* Placeholder only: a usable category model is a compiled model table, */
/* typically produced by the textRuleDevelop.compileCategory action.   */
DATA work.feedback_model;
    LENGTH _mco_ 8;
    _mco_ = 112233;
RUN;

Implementation Steps

Load the review data and the model into CAS.

PROC CASUTIL;
    LOAD DATA=WORK.customer_reviews CASOUT='customer_reviews' REPLACE;
    LOAD DATA=WORK.feedback_model CASOUT='feedback_model' REPLACE;
QUIT;
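
Before scoring, it can help to confirm that both tables landed in CAS. The following is a minimal sketch, assuming a CAS session is already active and that the tables were loaded into the session's active caslib (the default when no output caslib is named on the LOAD statement):

```sas
/* List the in-memory tables of the active caslib.           */
/* customer_reviews and feedback_model should both appear.   */
PROC CASUTIL;
    LIST TABLES;
QUIT;
```

If either table is missing from the listing, the LOAD step most likely read from a different libref than the one the DATA step wrote to.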

Run the applyCategory action with the default 'FREQUENCY' algorithm, writing both the main output table (casOut) and the detailed match table (matchOut).

PROC CAS;
    textRuleScore.applyCategory /
        table={name='customer_reviews'},
        docId='review_id',
        text='review_text',
        model={name='feedback_model'},
        casOut={name='review_categories', replace=true},
        matchOut={name='review_matches', replace=true};
RUN;
QUIT;

Expected Result

Two tables are created in CAS. 'review_categories' contains the original data with new columns for each category (e.g., 'CAT_Positive', 'CAT_Negative'), with a score indicating the number of rule matches. 'review_matches' provides a detailed log of which specific terms in each review led to a category assignment, allowing for fine-grained analysis of the model's performance.
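
One way to check this result is to look at the output tables directly. The sketch below uses the standard table.columnInfo and table.fetch actions and assumes the two output tables exist in the active caslib under the names given above. The row limits are arbitrary, and the exact column names may differ from the examples in the description, which is why the column listing comes first.

```sas
PROC CAS;
    /* Show the columns actually present in the category output */
    table.columnInfo / table={name='review_categories'};

    /* Print the first rows of the main output table */
    table.fetch / table={name='review_categories'}, to=10;

    /* Print the first rows of the detailed match table */
    table.fetch / table={name='review_matches'}, to=20;
RUN;
QUIT;
```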

See the technical documentation for applyCategory