Standard Named Entity Recognition for E-commerce

Business Context

An online retailer wants to automatically extract product and brand names from customer reviews to improve its recommendation engine.

Data Preparation

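Create a dataset of tokenized customer reviews, one token per row. The _start_ and _end_ columns hold position markers (BEGIN, WORD, END) that show where each review begins and ends. Each token also carries two features, a part-of-speech tag (feature_pos) and a capitalization flag (feature_cap), and a target label in BIO format: B- marks the first token of an entity, I- marks a continuation token, and O marks a token outside any entity. The labels used here are B-PROD, B-BRAND, I-BRAND, and O.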


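/* Assumes an active CAS session with the CASUSER caslib assigned to the
   libref casuser (for example, via: caslib _all_ assign;) */
DATA casuser.retail_reviews;
  /* The data lines are comma-separated, so the delimiter must be set explicitly */
  INFILE DATALINES DLM=',';
  LENGTH _start_ _end_ $8 _token_ $20 feature_pos $5 feature_cap $5 label $10;
  INPUT _start_ $ _end_ $ _token_ $ feature_pos $ feature_cap $ label $;
  DATALINES;
BEGIN,WORD,Great,ADJ,Cap,O
WORD,WORD,running,VERB,Low,O
WORD,END,shoes,NOUN,Low,B-PROD
BEGIN,WORD,I,PRON,Cap,O
WORD,WORD,love,VERB,Low,O
WORD,WORD,my,PRON,Low,O
WORD,WORD,Nike,NOUN,Cap,B-BRAND
WORD,END,Air,NOUN,Cap,I-BRAND
;
RUN;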
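The BIO labels encode entity boundaries: a B- tag starts an entity, an I- tag of the same type continues it, and O marks tokens outside any entity. The DATA step below is a minimal sketch of how those labels group back into entity spans, so that "Nike" (B-BRAND) and "Air" (I-BRAND) become the single brand "Nike Air". It uses its own small, hypothetical tables (work.bio_tokens and work.entity_spans) rather than the CAS table, and it ignores review boundaries, which is safe here only because no entity crosses one.

```sas
/* Hypothetical token/label table: one token per row, in review order */
data work.bio_tokens;
  infile datalines dlm=',';
  length _token_ $20 label $10;
  input _token_ $ label $;
  datalines;
Great,O
running,O
shoes,B-PROD
I,O
love,O
my,O
Nike,B-BRAND
Air,I-BRAND
;
run;

/* Group B-/I- runs into entity spans. A stray I- tag that does not
   continue an open span of the same type is dropped in this sketch. */
data work.entity_spans(keep=entity_type entity_text);
  length prefix $2 entity_type $10 entity_text $100;
  retain entity_type entity_text;
  set work.bio_tokens end=last;
  prefix = substr(label, 1, 2);
  if prefix = 'I-' and entity_type ne '' and entity_type = substr(label, 3) then
    /* I- of the same type: extend the open span */
    entity_text = catx(' ', entity_text, _token_);
  else do;
    /* B-, O, or a stray I-: emit the open span (if any) before resetting */
    if entity_type ne '' then output;
    if prefix = 'B-' then do;
      entity_type = substr(label, 3);
      entity_text = _token_;
    end;
    else call missing(entity_type, entity_text);
  end;
  /* Emit a span that is still open at the end of the data */
  if last and entity_type ne '' then output;
run;
```

For this data, work.entity_spans holds two rows: PROD "shoes" and BRAND "Nike Air".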

Implementation Steps

Train the CRF model on the labeled tokens with a single unigram template, U00:%x[0,0], to identify brand and product entities.


PROC CAS;
  loadactionset "conditionalRandomFields";  /* harmless if already loaded */
  /* CASL action calls need a slash between the action name and its parameters */
  conditionalRandomFields.crfTrain /
    table={name='retail_reviews', caslib='casuser'}
    target='label' template='U00:%x[0,0]'
    model={label={name='retail_labels'}, attr={name='retail_attrs'},
           feature={name='retail_feats'}, attrfeature={name='retail_attrfeats'},
           template={name='retail_tpl'}};
RUN;
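The template string appears to follow the CRF++ convention, in which %x[row,col] refers to input column col of the token at relative position row, and each expanded string is paired with every candidate label to form a feature function. The DATA step below is a minimal sketch, not a description of how crfTrain works internally, that previews the strings such templates would produce, assuming column 0 is _token_. It also adds a hypothetical U01:%x[-1,0] (previous token) to show what the row offset does, using the CRF++ placeholder _B-1 at the start of a review. The table name work.template_preview is hypothetical.

```sas
data work.template_preview;
  infile datalines dlm=',';
  length _start_ $8 _token_ $20 prev_token $20 u00 u01 $30;
  input _start_ $ _token_ $;
  /* U00:%x[0,0]: current token (column 0 assumed to be _token_) */
  u00 = cats('U00:', _token_);
  /* Hypothetical U01:%x[-1,0]: previous token. LAG must run on every row,
     so call it unconditionally and overwrite at the start of a review. */
  prev_token = lag(_token_);
  if _start_ = 'BEGIN' then prev_token = '_B-1';
  u01 = cats('U01:', prev_token);
  drop prev_token;
  datalines;
BEGIN,Great
WORD,running
WORD,shoes
BEGIN,I
WORD,love
WORD,my
WORD,Nike
WORD,Air
;
run;
```

With only U00, a pairing such as U00:Nike with B-BRAND can be learned, but "Air" receives no signal from the brand token before it; a previous-token template like U01 is one common way to add that context.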

Expected Result

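The action should train the model and create the five output tables named in the model parameter (labels, attributes, features, attribute-features, and template). Because the template U00:%x[0,0] references only one input column of the current token, the model learns label associations from that column alone. Relating capitalized tokens to Brand labels through feature_cap would require an additional template entry that references that column.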

See the technical documentation for crfTrain.