Test Scenario & Use Cases
Create a dataset that contains edge cases: empty strings, null values, and words that are not in the attribute table.
/* Edge cases: TW01 has empty text, TW02 a missing value (.), TW03 an unknown word */
DATA mycas.dirty_tweets;
    LENGTH tweet_id $10 text $100;
    INFILE DATALINES delimiter='|' missover;
    INPUT tweet_id $ text $;
    DATALINES;
TW01|
TW02|.
TW03|Supercalifragilisticexpialidocious
TW04|Standard text
;
RUN;

/* Minimal model tables (data lines must start on the line after DATALINES;) */
DATA mycas.e_label; LENGTH _label_ $20 _type_ $20; INPUT _label_ $ _type_ $; DATALINES;
O OTHER
;
RUN;
DATA mycas.e_attr; LENGTH _attr_ $50 _value_ $50; INPUT _attr_ $ _value_ $; DATALINES;
WORD[0] Standard
;
RUN;
DATA mycas.e_feat; LENGTH _feature_ $50; INPUT _feature_ $; DATALINES;
U00:Standard
;
RUN;
DATA mycas.e_attr_feat; INPUT _attrid_ _featureid_ _weight_; DATALINES;
1 1 1.0
;
RUN;
DATA mycas.e_temp; LENGTH _template_ $100; INPUT _template_ $; DATALINES;
U00:%w[0]
;
RUN;

/* Score the edge-case table with the minimal model */
PROC CAS;
    conditionalRandomFields.crfScore /
        table={name='dirty_tweets'},
        model={
            attr={name='e_attr'},
            attrfeature={name='e_attr_feat'},
            feature={name='e_feat'},
            label={name='e_label'},
            template={name='e_temp'}
        },
        casOut={name='dirty_scored', replace=true},
        target='tags';
RUN;
The action completes without error. Empty or missing texts produce either an empty sequence or the 'O' (Other) label, depending on the model logic. Unknown words (such as 'Supercalifragilistic...') are handled gracefully: they are usually tagged 'O' or given the default label, and they do not crash the session.
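
To confirm this behavior, the scored table can be inspected directly. The following is a minimal sketch, not part of the original scenario. It assumes that `dirty_scored` was written to the active caslib (the one the `mycas` libref points to) and it makes no assumption about the output column names, which are listed by `table.columnInfo` instead.

```sas
/* Sketch only: inspect the output table written by crfScore. */
PROC CAS;
    /* Row count and basic metadata of the output table */
    table.tableInfo / name='dirty_scored';

    /* Column names and types; the output layout is not described in this section */
    table.columnInfo / table={name='dirty_scored'};

    /* Show the first rows; the input holds only four tweets, so 10 rows is enough */
    table.fetch / table={name='dirty_scored'}, to=10;
RUN;
QUIT;
```

If the session has survived the edge cases, all three actions return results, and the fetched rows show how the tweets with empty, missing, and unknown-word text were tagged.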