conditionalRandomFields crfTrain

Standard Named Entity Recognition for E-commerce

Test Scenario & Use Case

Business Context

An online retailer wants to automatically extract product names and brand entities from customer reviews to improve their recommendation engine.

Data Preparation

Create a dataset of tokenized customer reviews, one token per row, with associated features (POS tag, capitalization) and a target label in BIO format (B-PROD, I-PROD, B-BRAND, I-BRAND, O). The sample rows below use every label except I-PROD.

DATA casuser.retail_reviews;
    /* The data lines are comma-separated, so the delimiter must be declared. */
    INFILE DATALINES DLM=',';
    LENGTH _start_ $8 _end_ $8 _token_ $20 feature_pos $5 feature_cap $5 label $10;
    INPUT _start_ $ _end_ $ _token_ $ feature_pos $ feature_cap $ label $;
    DATALINES;
BEGIN,WORD,Great,ADJ,Cap,O
WORD,WORD,running,VERB,Low,O
WORD,END,shoes,NOUN,Low,B-PROD
BEGIN,WORD,I,PRON,Cap,O
WORD,WORD,love,VERB,Low,O
WORD,WORD,my,PRON,Low,O
WORD,WORD,Nike,NOUN,Cap,B-BRAND
WORD,END,Air,NOUN,Cap,I-BRAND
;
RUN;
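
Before training, it can help to confirm that the label column contains the values the scenario expects. The following is a minimal sketch, assuming a CAS session is active and the libref `casuser` is assigned to the CASUSER caslib (neither is shown in the original); it tabulates how often each label occurs.

```sas
/* Assumption: an active CAS session and a libref named casuser
   that points to the CASUSER caslib. */
PROC FREQ DATA=casuser.retail_reviews;
    /* One row per distinct label, with counts and percentages only. */
    TABLES label / NOCUM;
RUN;
```

A label that appears in the description but not in this table (or the reverse) is a sign that the data and the intended tag set have drifted apart.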

Implementation Steps

1. Train the CRF model using a standard unigram template to identify brands and products.
PROC CAS;
    /* Train the CRF; each entry in model= names one output table. */
    conditionalRandomFields.crfTrain /
        table={name='retail_reviews', caslib='casuser'}
        target='label'
        template='U00:%x[0,0]'
        model={label={name='retail_labels'}, attr={name='retail_attrs'}, feature={name='retail_feats'},
               attrfeature={name='retail_attrfeats'}, template={name='retail_tpl'}};
RUN;

Expected Result


The action should train the model successfully and generate the five output tables named in the model parameter (labels, attributes, features, attribute features, and template). The trained model is intended to capture the relationship between capitalized tokens and Brand labels.
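
One way to check this result is to list the tables in the caslib and look at the first rows of one model table. This is a sketch rather than part of the original scenario: it assumes the output tables were written to the `casuser` caslib, which holds only if that was the active caslib during training, since the training call does not name a caslib for them.

```sas
PROC CAS;
    /* List tables in the caslib; the five retail_* model tables should appear.
       Assumption: casuser was the active caslib when crfTrain ran. */
    table.tableInfo / caslib='casuser';

    /* Show the first rows of the label table to see which labels were learned. */
    table.fetch / table={name='retail_labels', caslib='casuser'} to=10;
RUN;
QUIT;
```

The same `table.fetch` call can be repeated for `retail_attrs`, `retail_feats`, `retail_attrfeats`, and `retail_tpl` to confirm that none of them is empty.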