conditionalRandomFields crfScore

High Volume Scoring for E-commerce Reviews

Test Scenario & Use Case

Business Context

An e-commerce platform needs to tag named entities (Product Names, Brands) in thousands of daily customer reviews to analyze trends. Performance and stability on larger datasets are key.
Data Preparation

Generate a dataset of 10,000 synthetic reviews, then create minimal model tables (label, attribute, feature, attribute-feature, template) that reuse the generic model structure.

/* Generate 10,000 identical synthetic reviews for the volume test */
DATA mycas.reviews_large;
  LENGTH review_id $10 content $300;
  DO i = 1 TO 10000;
    review_id = cats('REV', i);
    content = 'This product from AcmeCorp is amazing and durable.';
    OUTPUT;
  END;
RUN;

/* Reusing model structure from previous examples or creating dummy model tables for volume test */
/* Label table */
DATA mycas.v_label;
  LENGTH _label_ $20 _type_ $20;
  INPUT _label_ $ _type_ $;
  DATALINES;
B-BRAND ORGANIZATION
O OTHER
;
RUN;

/* Attribute table */
DATA mycas.v_attr;
  LENGTH _attr_ $50 _value_ $50;
  INPUT _attr_ $ _value_ $;
  DATALINES;
WORD[0] AcmeCorp
;
RUN;

/* Feature table */
DATA mycas.v_feat;
  LENGTH _feature_ $50;
  INPUT _feature_ $;
  DATALINES;
U00:AcmeCorp
;
RUN;

/* Attribute-feature weight table */
DATA mycas.v_attr_feat;
  INPUT _attrid_ _featureid_ _weight_;
  DATALINES;
1 1 2.0
;
RUN;

/* Template table */
DATA mycas.v_temp;
  LENGTH _template_ $100;
  INPUT _template_ $;
  DATALINES;
U00:%w[0]
;
RUN;

Implementation Steps

1. Run scoring on the volume dataset.
PROC CAS;
  /* Model table names must match the tables created in Data Preparation */
  /* (v_feat and v_temp, not v_feature and v_template) */
  conditionalRandomFields.crfScore /
    table={name='reviews_large'},
    model={
      attr={name='v_attr'},
      attrfeature={name='v_attr_feat'},
      feature={name='v_feat'},
      label={name='v_label'},
      template={name='v_temp'}
    },
    casOut={name='reviews_tagged', replace=true},
    target='entities';
RUN;

Expected Result


The action completes successfully without timing out. The output table 'reviews_tagged' contains 10,000 rows, and the 'entities' column is populated.
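
The expected result can be checked directly rather than by inspection. The following is a minimal sketch, not part of the original scenario. It assumes that an active CAS session is available, that 'reviews_tagged' was written to the session's active caslib, and that the output table does contain a column named 'entities' as described above. It uses the `numRows` and `distinct` actions from the `simple` action set; the result variable name `r` is arbitrary.

```sas
PROC CAS;
  /* Row count of the scored table; the scenario expects 10,000 */
  simple.numRows result=r / table={name='reviews_tagged'};
  print r;

  /* Distinct and missing value counts for the target column */
  /* (assumes the column is named 'entities', as in the scoring call) */
  simple.distinct /
    table={name='reviews_tagged'},
    inputs={'entities'};
RUN;
```

A missing-value count equal to the row count would indicate that scoring ran but did not populate the column.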