conditionalRandomFields.crfTrain

Chatbot Intent Detection with Sparse Features and SGD

Test Scenario & Use Case

Business Context

A tech support chatbot must infer user intent from short, informal messages. Because the messages contain slang and typos, most token-based features occur only rarely (the feature space is sparse), so this scenario trains the model with Stochastic Gradient Descent (SGD) and L2 regularization.
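To make the role of SGD and L2 regularization concrete, here is a generic sketch of L2-regularized CRF training. It is not necessarily the exact objective SAS implements: λ stands in for what `regL2` is assumed to control, N is the number of training sequences, η is a learning rate, and F(x, y) is the vector of feature counts for a sequence x labeled y.

$$
L(\mathbf{w}) = -\sum_{n=1}^{N} \log p\left(\mathbf{y}^{(n)} \mid \mathbf{x}^{(n)}; \mathbf{w}\right) + \frac{\lambda}{2}\lVert \mathbf{w} \rVert_2^2
$$

SGD replaces the full gradient with the contribution of one sequence n per step:

$$
\mathbf{w} \leftarrow \mathbf{w} + \eta\left(\mathbf{F}\left(\mathbf{x}^{(n)}, \mathbf{y}^{(n)}\right) - \mathbb{E}_{\mathbf{y}' \sim p(\cdot \mid \mathbf{x}^{(n)}; \mathbf{w})}\left[\mathbf{F}\left(\mathbf{x}^{(n)}, \mathbf{y}'\right)\right] - \frac{\lambda}{N}\mathbf{w}\right)
$$

The observed-minus-expected term is non-zero only for features whose observation part fires in sequence n (plus label-transition features), so with a sparse, typo-heavy vocabulary each step changes few weights apart from the uniform shrinkage from the L2 term.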
Data Preparation

Create a dataset of short, informal command sequences in which individual features occur only sparsely.

```sas
/* Assumes the libref CASUSER points to the CAS caslib CASUSER */
DATA casuser.chat_logs;
   LENGTH _token_ $20 feature_type $10 label $10;
   INFILE DATALINES DLM=',';   /* the data lines are comma-separated */
   INPUT _start_ $ _end_ $ _token_ $ feature_type $ label $;
   DATALINES;
BEGIN,WORD,reset,COMMAND,B-ACT
WORD,END,pwd,OBJECT,I-OBJ
BEGIN,WORD,wifi,OBJECT,B-OBJ
WORD,END,broken,STATUS,O
BEGIN,WORD,help,COMMAND,B-ACT
WORD,END,me,PRON,O
;
RUN;
```
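To see how the six rows form training sequences, here is a small Python illustration (not SAS code). It assumes, as the data suggests but the scenario does not state, that a row whose `_start_` value is `BEGIN` opens a new sequence and a row whose `_end_` value is `END` closes it; `group_sequences` is a hypothetical helper.

```python
# Rows copied from the DATALINES: (_start_, _end_, _token_, feature_type, label)
ROWS = [
    ("BEGIN", "WORD", "reset", "COMMAND", "B-ACT"),
    ("WORD", "END", "pwd", "OBJECT", "I-OBJ"),
    ("BEGIN", "WORD", "wifi", "OBJECT", "B-OBJ"),
    ("WORD", "END", "broken", "STATUS", "O"),
    ("BEGIN", "WORD", "help", "COMMAND", "B-ACT"),
    ("WORD", "END", "me", "PRON", "O"),
]


def group_sequences(rows):
    """Split flat rows into sequences of (token, label) pairs.

    Assumption: BEGIN in _start_ opens a sequence, END in _end_ closes it.
    """
    sequences, current = [], []
    for start, end, token, _feature_type, label in rows:
        if start == "BEGIN":
            current = []  # a new message begins
        current.append((token, label))
        if end == "END":
            sequences.append(current)  # the message is complete
            current = []
    return sequences


for seq in group_sequences(ROWS):
    print(" ".join(tok for tok, _ in seq), "->", [lab for _, lab in seq])
```

Under this reading, each message is a two-token sequence such as `reset pwd`, `wifi broken` and `help me`.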

Implementation Steps

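1. Train the model with the SGD optimizer, using a template that looks at the surrounding context: the unigram template U00 (%x[0,0]) uses the value at the current position, and the bigram template B00 (%x[0,0]/%x[1,0]) pairs it with the value at the next position.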
```sas
PROC CAS;
   /* Template lines are separated by the line break inside the string */
   conditionalRandomFields.crfTrain /
      table={name='chat_logs', caslib='casuser'}
      target='label'
      template='U00:%x[0,0]
B00:%x[0,0]/%x[1,0]'
      nloOpts={algorithm='SGD', optmlOpt={maxIters=200, regL2=0.1}}
      model={label={name='chat_labels', replace=true},
             attr={name='chat_attrs', replace=true},
             feature={name='chat_features', replace=true},
             attrfeature={name='chat_attrfeats', replace=true},
             template={name='chat_template', replace=true}};
RUN;
```
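The template lines use the syntax of the CRF++ toolkit, where `%x[row,col]` refers to the value in column `col` at offset `row` from the current position, `U` lines define unigram features and `B` lines define bigram features. The Python sketch below (not SAS code) shows how such lines expand for one sequence, assuming crfTrain interprets them the way CRF++ does. It further assumes that column 0 holds the token and uses CRF++-style boundary markers such as `_B+1` for offsets outside the sequence; `expand` is a hypothetical helper, not part of the SAS API.

```python
import re

# Matches macros of the form %x[row,col]; row may be negative.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template_line, sequence, i):
    """Expand one template line at position i of a sequence.

    sequence is a list of rows; each row is a list of column values.
    Offsets outside the sequence become boundary markers (_B-1, _B+1, ...),
    in the style of CRF++ (an assumption about crfTrain's behavior).
    """

    def substitute(match):
        offset, col = int(match.group(1)), int(match.group(2))
        j = i + offset
        if j < 0:
            return f"_B{j}"  # e.g. _B-1 before the first token
        if j >= len(sequence):
            return f"_B+{j - len(sequence) + 1}"  # e.g. _B+1 after the last token
        return sequence[j][col]

    return MACRO.sub(substitute, template_line)


TEMPLATE = ["U00:%x[0,0]", "B00:%x[0,0]/%x[1,0]"]
sequence = [["reset"], ["pwd"]]  # one two-token message; column 0 = token
for i in range(len(sequence)):
    print([expand(line, sequence, i) for line in TEMPLATE])
```

In CRF++, each unigram string such as `U00:reset` is combined with every possible current label, and each bigram string such as `B00:reset/pwd` with every pair of previous and current labels. The number of weights therefore grows quickly while any one sequence activates only a few of them, which is why the feature space is large but sparse.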

Expected Result


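Training completes with the SGD optimizer. The template, including the bigram template B00, is parsed and applied to the short sequences without errors, and the model is written to the five output tables named in the model parameter (chat_labels, chat_attrs, chat_features, chat_attrfeats and chat_template).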