conditionalRandomFields

crfScore

Description

The crfScore action uses a trained Conditional Random Fields (CRF) model to score input data. It performs sequence labeling on documents, predicting a sequence of labels for each sequence of tokens; a common application is Named Entity Recognition (NER). The action requires a pre-trained CRF model, stored as five tables (attributes, attribute-features, features, labels, and templates), and an input table that contains the text to score.
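Conceptually, labeling a token sequence with a linear-chain CRF means finding the label sequence with the highest total score, which is usually done with Viterbi dynamic programming. The following Python sketch illustrates that decoding step in general terms. It is not the crfScore implementation, and the `emission` and `transition` score dictionaries are hypothetical stand-ins for the scores a trained model would supply.

```python
from typing import Dict, List, Tuple


def viterbi(tokens: List[str],
            labels: List[str],
            emission: Dict[Tuple[str, str], float],
            transition: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring label sequence for tokens.

    emission[(token, label)] and transition[(prev_label, label)] are additive
    log-space scores (hypothetical inputs); missing keys count as 0.0.
    """
    if not tokens:
        return []
    # best[i][y]: best score of any label path that ends in label y at position i
    best = [{y: emission.get((tokens[0], y), 0.0) for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        scores, pointers = {}, {}
        for y in labels:
            # Pick the predecessor label that maximizes path score + transition
            prev, s = max(
                ((p, best[i - 1][p] + transition.get((p, y), 0.0)) for p in labels),
                key=lambda t: t[1],
            )
            scores[y] = s + emission.get((tokens[i], y), 0.0)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label to recover the full path
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


labels = ["B-PER", "I-PER", "O"]
emission = {("John", "B-PER"): 2.0, ("Smith", "B-PER"): 1.0,
            ("Smith", "I-PER"): 1.0, ("lives", "O"): 2.0}
transition = {("B-PER", "I-PER"): 1.5, ("O", "I-PER"): -5.0}
print(viterbi(["John", "Smith", "lives"], labels, emission, transition))
# ['B-PER', 'I-PER', 'O']
```

The transition bonus from B-PER to I-PER is what lets "Smith" continue the person name instead of starting a new one; emissions alone score B-PER and I-PER equally for that token.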

conditionalRandomFields.crfScore / casOut={<casouttable>} model={<crfmodel>} table={<castable>} target="string";
Settings

Parameter | Description
casOut | Specifies the output CAS table in which to store the tagged data. This table contains the original data along with the predicted label for each token.
model | Specifies the input tables that make up the trained CRF model. This dictionary parameter must include the 'attr', 'attrfeature', 'feature', 'label', and 'template' tables.
table | Specifies the input CAS table that contains the documents to score.
target | Specifies the name of the variable in the output table that holds the predicted labels (the hidden sequence).
Data Preparation
Data Creation: Sample Text Data and Model Tables

This example first creates a sample input table 'score_data' with document IDs and text. Then, it simulates the creation of the five required model tables ('crf_attr', 'crf_attr_feature', 'crf_feature', 'crf_label', 'crf_template') that would typically be generated by the 'crfTrain' action. These tables are necessary for the 'crfScore' action to function.

/* 0. Start a CAS session and map the mycas libref to the casuser caslib */
cas mysess;
libname mycas cas caslib=casuser;

/* 1. Create sample data to score */
DATA mycas.score_data;
 INFILE DATALINES delimiter='|';
 LENGTH docid $ 10 text $ 300;
 INPUT docid $ text $;
 DATALINES;
1|John Smith lives in New York.
2|Mary works for SAS Institute.
;
RUN;

/* 2. Simulate pre-existing CRF model tables (usually created by crfTrain) */

/* Label Table */
DATA mycas.crf_label;
 INFILE DATALINES delimiter=',';
 LENGTH _label_ $20 _type_ $20;
 INPUT _label_ $ _type_ $;
 DATALINES;
B-PER,PERSON
I-PER,PERSON
B-ORG,ORGANIZATION
I-ORG,ORGANIZATION
B-LOC,LOCATION
I-LOC,LOCATION
O,OTHER
;
RUN;

/* Attribute Table */
DATA mycas.crf_attr;
 INFILE DATALINES delimiter=',';
 LENGTH _attr_ $50 _value_ $50;
 INPUT _attr_ $ _value_ $;
 DATALINES;
WORD[0],John
WORD[0],Smith
WORD[0],lives
WORD[0],in
WORD[0],New
WORD[0],York
WORD[0],Mary
WORD[0],works
WORD[0],for
WORD[0],SAS
WORD[0],Institute
;
RUN;

/* Feature Table */
DATA mycas.crf_feature;
 INFILE DATALINES delimiter=',';
 LENGTH _feature_ $50;
 INPUT _feature_ $;
 DATALINES;
U01:York
U02:New
L:B-PER
U00:John
U00:Smith
U00:Mary
U00:SAS
;
RUN;

/* Attribute-Feature Table: values are comma-separated to match DELIMITER=',' */
DATA mycas.crf_attr_feature;
 INFILE DATALINES delimiter=',';
 INPUT _attrid_ _featureid_ _weight_;
 DATALINES;
1,4,1.5
2,5,1.6
3,1,0.2
4,2,0.3
5,6,1.7
6,7,1.8
;
RUN;

/* Template Table */
DATA mycas.crf_template;
 INFILE DATALINES delimiter=',';
 LENGTH _template_ $100;
 INPUT _template_ $;
 DATALINES;
U00:%w[0]
U01:%w[1]
U02:%w[-1]
L
;
RUN;
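The template and feature tables above are consistent with a simple expansion rule: if %w[k] is read as "the word k positions from the current token", then U01:%w[1] applied to "New" yields U01:York, and U02:%w[-1] applied to "York" yields U02:New. The Python sketch below shows that expansion. The %w[k] interpretation, the choice to skip templates whose offset falls outside the sentence, and the helper name `expand_templates` are assumptions for illustration, not documented crfScore behavior; the bare L template is skipped because it contains no word macro.

```python
import re
from typing import List

# Assumed macro syntax: %w[k] = word at relative offset k from the current token.
MACRO = re.compile(r"%w\[(-?\d+)\]")


def expand_templates(tokens: List[str], templates: List[str], i: int) -> List[str]:
    """Instantiate each word-based template at token position i (hypothetical helper)."""
    features = []
    for template in templates:
        offsets = [int(k) for k in MACRO.findall(template)]
        if not offsets:
            continue  # e.g. the bare "L" template has no word macro to fill
        positions = [i + k for k in offsets]
        if any(not 0 <= j < len(tokens) for j in positions):
            continue  # assumption: drop features that reach past the sentence edge
        words = iter([tokens[j] for j in positions])
        # re.sub calls the lambda once per match, left to right
        features.append(MACRO.sub(lambda m: next(words), template))
    return features


tokens = ["John", "Smith", "lives", "in", "New", "York"]
templates = ["U00:%w[0]", "U01:%w[1]", "U02:%w[-1]", "L"]
print(expand_templates(tokens, templates, 4))  # ['U00:New', 'U01:York', 'U02:in']
print(expand_templates(tokens, templates, 5))  # ['U00:York', 'U02:New']
```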

Examples

This example demonstrates how to use the `crfScore` action to apply a trained Conditional Random Fields model to new data. It specifies the input data table ('score_data'), the set of model tables, and the output table ('crf_scored_output'). The `target` parameter names the new column that will hold the predicted entity labels.

SAS® / CAS Code (awaiting community validation)
PROC CAS;
 /* Load the action set in case it is not already available in the session */
 loadactionset "conditionalRandomFields";

 conditionalRandomFields.crfScore /
 table={name='score_data'},
 model={
 attr={name='crf_attr'},
 attrfeature={name='crf_attr_feature'},
 feature={name='crf_feature'},
 label={name='crf_label'},
 template={name='crf_template'}
 },
 casOut={name='crf_scored_output', replace=true},
 target='predicted_label';
RUN;
QUIT;

/* Display the scored results */
PROC PRINT DATA=mycas.crf_scored_output;
RUN;
Result:
The action creates an output table named 'crf_scored_output' in the active caslib, which PROC PRINT reads through the mycas libref. This table includes the original 'docid' and 'text' columns, along with a new column named 'predicted_label' that contains the predicted entity tags for the tokens in the text.
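Once the predicted tags for a document are available as a list with one tag per token, contiguous B-/I- tags can be merged into entity mentions using the _label_ to _type_ mapping from the crf_label table. The Python sketch below assumes BIO-style tags as in the sample label table and that tokens and tags have already been extracted from crf_scored_output; how they are laid out in that table is not specified here, and `bio_to_entities` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

# Mirrors the _label_ -> _type_ pairs in the sample crf_label table.
LABEL_TYPE: Dict[str, str] = {
    "B-PER": "PERSON", "I-PER": "PERSON",
    "B-ORG": "ORGANIZATION", "I-ORG": "ORGANIZATION",
    "B-LOC": "LOCATION", "I-LOC": "LOCATION",
    "O": "OTHER",
}


def bio_to_entities(tokens: List[str], tags: List[str],
                    label_type: Dict[str, str] = LABEL_TYPE) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []
    for tok, tag in zip(tokens, tags):
        etype = label_type.get(tag, "OTHER")
        # An I- tag of the same type continues the open entity
        if tag.startswith("I-") and etype == current_type:
            current_words.append(tok)
            continue
        # Anything else closes the open entity, if there is one
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        # B- (or a stray I-) starts a new entity; O starts nothing
        if tag.startswith(("B-", "I-")):
            current_type, current_words = etype, [tok]
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


tokens = ["John", "Smith", "lives", "in", "New", "York"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
print(bio_to_entities(tokens, tags))
# [('PERSON', 'John Smith'), ('LOCATION', 'New York')]
```

Treating a stray I- tag as the start of a new entity is a lenient design choice; a stricter decoder could discard it instead.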

Associated Scenarios

Use Case
Extraction of Medical Entities from Patient Notes

A hospital wants to automatically structure unstructured patient notes by identifying symptoms and medications using a pre-trained CRF model.

Use Case
High Volume Scoring for E-commerce Reviews

An e-commerce platform needs to tag named entities (Product Names, Brands) in thousands of daily customer reviews to analyze trends. Performance and stability on larger datasets...

Use Case
Robustness to Missing Values and Unknown Tokens

A social media monitoring tool processes tweets that may be empty, contain special characters, or include words not present in the training dictionary (out-of-vocabulary, or OOV, words).
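In a linear-chain CRF, a token's emission score for each label is the sum of the weights of its active features, so a feature that never occurred in training matches no weight and contributes nothing. The Python sketch below illustrates that general property, assuming a simple (feature, label) to weight lookup with hypothetical values; it does not document how crfScore itself treats empty or unknown input.

```python
from typing import Dict, List, Tuple


def emission_scores(features: List[str], labels: List[str],
                    weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Score each label for one token by summing the weights of its active features.

    Features never seen in training have no entry in `weights`, so they add 0.0.
    """
    return {y: sum((weights.get((f, y), 0.0) for f in features), 0.0) for y in labels}


# Hypothetical weights; real values would come from the trained model tables.
weights = {("U00:John", "B-PER"): 1.5, ("U00:John", "O"): -0.5}
labels = ["B-PER", "O"]
print(emission_scores(["U00:John"], labels, weights))     # {'B-PER': 1.5, 'O': -0.5}
print(emission_scores(["U00:#unseen"], labels, weights))  # {'B-PER': 0.0, 'O': 0.0}
```

Because every label scores 0.0 for an unseen token, the transition weights alone decide its label during decoding.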