Chatbot Intent Detection with Sparse Features and SGD

Business Context

A tech support chatbot must infer user intent from short, informal messages. Slang and typos mean that most token features occur only rarely, so the feature space is sparse. The model is therefore trained with Stochastic Gradient Descent (SGD) and L2 regularization to help it generalize.

Data Preparation

Create a dataset of short, informal command sequences, one token per row, in which most tokens occur only once (sparse features).

/* One row per token; each data line holds five comma-separated fields. */
DATA casuser.chat_logs;
    LENGTH _token_ $20 feature_type $10 label $10;
    INFILE DATALINES DLM=',';  /* without DLM=',' list input would split on blanks */
    INPUT _start_ $ _end_ $ _token_ $ feature_type $ label $;
    DATALINES;
BEGIN,WORD,reset,COMMAND,B-ACT
WORD,END,pwd,OBJECT,I-OBJ
BEGIN,WORD,wifi,OBJECT,B-OBJ
WORD,END,broken,STATUS,O
BEGIN,WORD,help,COMMAND,B-ACT
WORD,END,me,PRON,O
;
RUN;
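To make the layout of the rows easier to see, here is a minimal Python sketch that groups them into sequences. It assumes, purely for illustration, that `BEGIN` in the first field marks the first token of a message and `END` in the second field marks the last; the helper `group_sequences` is hypothetical and is not part of SAS or of crfTrain.

```python
import csv
from io import StringIO

# The six data rows from the DATA step above, copied verbatim.
ROWS = """\
BEGIN,WORD,reset,COMMAND,B-ACT
WORD,END,pwd,OBJECT,I-OBJ
BEGIN,WORD,wifi,OBJECT,B-OBJ
WORD,END,broken,STATUS,O
BEGIN,WORD,help,COMMAND,B-ACT
WORD,END,me,PRON,O
"""


def group_sequences(text):
    """Group token rows into sequences.

    Assumption (not confirmed by the source): BEGIN opens a sequence
    and END closes it.
    """
    sequences, current = [], []
    for start, end, token, feature_type, label in csv.reader(StringIO(text)):
        if start == "BEGIN":
            current = []
        current.append((token, feature_type, label))
        if end == "END":
            sequences.append(current)
            current = []
    return sequences


sequences = group_sequences(ROWS)
for seq in sequences:
    print(" ".join(token for token, _, _ in seq))
```

Under that assumption the six rows form three two-token messages, and every token appears exactly once, which is the sparsity the scenario is meant to exercise.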

Implementation Steps

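Train the model with the SGD algorithm to cope with the variability in the messages. The template combines a unigram template on the current row (U00) with a bigram template (B00) that pairs the current row with the following one, so the model sees surrounding context.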

PROC CAS;
    conditionalRandomFields.crfTrain
        TABLE={name='chat_logs', caslib='casuser'} target='label'
        template='U00:%x[0,0]
B00:%x[0,0]/%x[1,0]'
        nloOpts={algorithm='SGD', optmlOpt={maxIters=200, regL2=0.1}}
        model={label={name='chat_labels', replace=true},
               attr={name='chat_attrs', replace=true},
               feature={name='chat_features', replace=true},
               attrfeature={name='chat_attrfeats', replace=true},
               template={name='chat_template', replace=true}};
RUN;
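The two template lines can be read as feature-string patterns. The Python sketch below expands them over one message, assuming CRF++-style macro semantics: `%x[row,col]` takes the value at a relative row offset and an absolute column index. It also assumes that column 0 is the token and that out-of-range rows are padded with `_B-1` / `_B+1` markers. Neither assumption is confirmed here for crfTrain, and `expand` is a hypothetical helper, not a SAS function.

```python
import re

# CRF++-style macro: %x[row,col], with row relative to the current position.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template, seq, pos):
    """Expand one template line at position `pos` of `seq`.

    `seq` is a list of attribute tuples; column 0 is assumed to be the token.
    """
    def lookup(match):
        row, col = int(match.group(1)), int(match.group(2))
        idx = pos + row
        if idx < 0:
            return f"_B{idx}"  # padding before the start, e.g. _B-1
        if idx >= len(seq):
            return f"_B+{idx - len(seq) + 1}"  # padding past the end
        return seq[idx][col]

    return MACRO.sub(lookup, template)


templates = ["U00:%x[0,0]", "B00:%x[0,0]/%x[1,0]"]
sequence = [("reset", "COMMAND"), ("pwd", "OBJECT")]

features = [[expand(t, sequence, i) for t in templates]
            for i in range(len(sequence))]
for position, feats in enumerate(features):
    print(position, feats)
```

In CRF++ a `B` template additionally conditions on the previous and current output labels; the sketch shows only the observation part of the feature string.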

Expected Result

Training completes with Stochastic Gradient Descent. The bigram template (B00) is parsed and applied to the short sequences without error, and the action writes a valid model to the five output tables named in the model parameter.
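To give an idea of what an SGD update with L2 regularization does on sparse features, here is a minimal Python sketch. It deliberately swaps in a binary logistic model ("is this token a command?") for the CRF, so it is not the crfTrain algorithm. The values 200 and 0.1 mirror `maxIters` and `regL2` from the call above; the learning rate, the seed, and the function names `sgd_train` and `predict` are hypothetical choices for illustration.

```python
import math
import random


def predict(weights, feats):
    """Probability of the positive class given the active feature names."""
    score = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-score))


def sgd_train(examples, epochs=200, lr=0.1, reg_l2=0.1, seed=0):
    """Train sparse logistic weights with per-example SGD updates.

    Only the weights of features active in the current example are touched,
    which is what keeps each update cheap when features are sparse.
    Simplification: the L2 penalty is applied to active features only.
    """
    rng = random.Random(seed)
    weights = {}
    for _ in range(epochs):
        order = examples[:]
        rng.shuffle(order)  # visit examples in random order each epoch
        for feats, y in order:
            p = predict(weights, feats)
            for f in feats:
                w = weights.get(f, 0.0)
                grad = (p - y) + reg_l2 * w  # log-loss gradient plus L2 term
                weights[f] = w - lr * grad
    return weights


# One unigram feature per token from the dataset; label 1 marks B-ACT tokens.
examples = [
    (["U00:reset"], 1), (["U00:pwd"], 0),
    (["U00:wifi"], 0), (["U00:broken"], 0),
    (["U00:help"], 1), (["U00:me"], 0),
]

weights = sgd_train(examples)
print(round(predict(weights, ["U00:reset"]), 3))
print(round(predict(weights, ["U00:wifi"]), 3))
```

A feature the model has never seen, such as a new typo, has no weight and contributes nothing to the score. This is why context templates and regularization matter for messages full of rare tokens.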

See the crfTrain technical documentation