crfScore - WeAreCAS

Q: What is the primary function of the crfScore action?

The crfScore action scores documents against a pre-existing Conditional Random Fields (CRF) model.

Q: Which parameters are required to run the crfScore action?

The required parameters are 'casOut' to specify the output table, 'model' to provide the input model tables, 'table' for the input data, and 'target' to name the predicted variable in the output.

Q: What does the 'model' parameter consist of?

The 'model' parameter specifies the input modeling tables, which include 'attr' (attributes), 'attrfeature' (attribute-feature mapping), 'feature' (features), 'label' (labels), and 'template' (templates).

Q: How is the output table specified?

The output table, which contains the tagged data, is specified using the 'casOut' parameter. You must provide a name for the table and can optionally specify a caslib.

Q: What is the 'target' parameter used for?

The 'target' parameter is used to specify the name of the column that will contain the predicted labels (the hidden variable) in the output table.

Description

The crfScore action uses a Conditional Random Fields (CRF) model to score input data. It performs sequence labeling on the documents, predicting a sequence of labels for a sequence of tokens. This is commonly used for tasks like Named Entity Recognition (NER). The action requires a pre-trained CRF model, which consists of several tables (attributes, features, labels, etc.), and an input table containing the text to be scored.
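To make "a sequence of labels for a sequence of tokens" concrete, the sketch below builds a small table that pairs each token of the first sample sentence with a B-/I-/O style tag taken from the label set used later on this page. The table name `work.bio_example`, the column names `token` and `tag`, and the tags themselves are hand-written assumptions for illustration; they are not output produced by crfScore.

```sas
/* Hypothetical illustration only: hand-written tags, not crfScore output */
data work.bio_example;
    infile datalines delimiter=',';
    length token $ 20 tag $ 10;
    input token $ tag $;
    datalines;
John,B-PER
Smith,I-PER
lives,O
in,O
New,B-LOC
York,I-LOC
;
run;

/* One row per token; the tag column is the label sequence */
proc print data=work.bio_example noobs;
run;
```

Read top to bottom, the `tag` column is the label sequence for the sentence: a B- tag opens an entity, an I- tag continues it, and O marks tokens outside any entity.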

conditionalRandomFields.crfScore / casOut={<casouttable>} model={<crfmodel>} table={<castable>} target="string";

Settings

Parameter	Description
casOut	Specifies the output CAS table to store the tagged data. This table will contain the original data along with the predicted labels for each token.
model	Specifies the input tables that constitute the trained CRF model. This is a dictionary parameter that must include the 'attr', 'attrfeature', 'feature', 'label', and 'template' tables.
table	Specifies the input CAS table that contains the documents to be scored.
target	Specifies the name of the variable in the output table that will contain the predicted labels (the hidden sequence).

Data Preparation

Data Creation: Sample Text Data and Model Tables

This example first creates a sample input table 'score_data' with document IDs and text. Then, it simulates the creation of the five required model tables ('crf_attr', 'crf_attr_feature', 'crf_feature', 'crf_label', 'crf_template') that would typically be generated by the 'crfTrain' action. These tables are necessary for the 'crfScore' action to function.

/* Assumption: a CAS session is active and the libref mycas points to the
   active caslib, for example: cas mysess; libname mycas cas caslib=casuser; */

/* 1. Create sample data to score */
DATA mycas.score_data;
    INFILE DATALINES delimiter='|';
    LENGTH docid $ 10 text $ 300;
    INPUT docid $ text $;
    DATALINES;
1|John Smith lives in New York.
2|Mary works for SAS Institute.
;
RUN;

/* 2. Simulate pre-existing CRF model tables (usually created by crfTrain) */

/* Label Table */
DATA mycas.crf_label;
    INFILE DATALINES delimiter=',';
    LENGTH _label_ $20 _type_ $20;
    INPUT _label_ $ _type_ $;
    DATALINES;
B-PER,PERSON
I-PER,PERSON
B-ORG,ORGANIZATION
I-ORG,ORGANIZATION
B-LOC,LOCATION
I-LOC,LOCATION
O,OTHER
;
RUN;

/* Attribute Table */
DATA mycas.crf_attr;
    INFILE DATALINES delimiter=',';
    LENGTH _attr_ $50 _value_ $50;
    INPUT _attr_ $ _value_ $;
    DATALINES;
WORD[0],John
WORD[0],Smith
WORD[0],lives
WORD[0],in
WORD[0],New
WORD[0],York
WORD[0],Mary
WORD[0],works
WORD[0],for
WORD[0],SAS
WORD[0],Institute
;
RUN;

/* Feature Table */
DATA mycas.crf_feature;
    INFILE DATALINES delimiter=',';
    LENGTH _feature_ $50;
    INPUT _feature_ $;
    DATALINES;
U01:York
U02:New
L:B-PER
U00:John
U00:Smith
U00:Mary
U00:SAS
;
RUN;

/* Attribute-Feature Table */
/* The data lines are comma-separated so that they match delimiter=',' */
DATA mycas.crf_attr_feature;
    INFILE DATALINES delimiter=',';
    INPUT _attrid_ _featureid_ _weight_;
    DATALINES;
1,4,1.5
2,5,1.6
3,1,0.2
4,2,0.3
5,6,1.7
6,7,1.8
;
RUN;

/* Template Table */
DATA mycas.crf_template;
    INFILE DATALINES delimiter=',';
    LENGTH _template_ $100;
    INPUT _template_ $;
    DATALINES;
U00:%w[0]
U01:%w[1]
U02:%w[-1]
L
;
RUN;
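Before scoring, it can help to confirm that the input table and the five model tables were actually loaded. A minimal sketch, assuming the `mycas` libref points to the session's active caslib, is to list the in-memory tables with the `table.tableInfo` action:

```sas
proc cas;
    /* With no parameters, tableInfo lists the in-memory tables
       in the active caslib */
    table.tableInfo;
run;
quit;
```

If 'score_data' or any of the 'crf_' tables is missing from the listing, the libref and the active caslib probably refer to different caslibs.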

Examples

Scoring Documents with a Trained CRF Model

This example applies a trained Conditional Random Fields model to new data with the `crfScore` action. It specifies the input data table ('score_data'), the five model tables, and the output table ('crf_scored_output'). The `target` parameter names the new column that will hold the predicted entity labels.

SAS® / CAS Code (awaiting community validation)

PROC CAS;
    /* Load the action set in case it is not already loaded in the session */
    loadactionset 'conditionalRandomFields';

    /* The slash separates the action name from its parameters */
    conditionalRandomFields.crfScore /
        table={name='score_data'},
        model={
            attr={name='crf_attr'},
            attrfeature={name='crf_attr_feature'},
            feature={name='crf_feature'},
            label={name='crf_label'},
            template={name='crf_template'}
        },
        casOut={name='crf_scored_output', replace=true},
        target='predicted_label';
RUN;
QUIT;

/* Display the scored results */
PROC PRINT DATA=mycas.crf_scored_output;
RUN;

Result :
The action generates an output table named 'crf_scored_output', which the example reads through the 'mycas' libref (a libref, not a caslib name). This table includes the original 'docid' and 'text' columns, along with a new column named 'predicted_label' containing the sequence of predicted entity tags for each token in the text.
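Because the example code is still awaiting validation, it is worth checking what the output table really contains rather than relying on the description above. A minimal sketch using the `table.columnInfo` and `table.fetch` actions follows; it assumes 'crf_scored_output' sits in the active caslib, and the row limit of 10 is an arbitrary choice.

```sas
proc cas;
    /* List the columns that the action actually wrote */
    table.columnInfo / table={name='crf_scored_output'};

    /* Fetch the first rows; to=10 is an arbitrary limit */
    table.fetch / table={name='crf_scored_output'} to=10;
run;
quit;
```

The column listing shows whether 'predicted_label' is present and how the tokens and labels are laid out in the table.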

Associated Scenarios

Use Case

Extraction of Medical Entities from Patient Notes

A hospital wants to automatically structure unstructured patient notes by identifying symptoms and medications using a pre-trained CRF model.

Use Case

High Volume Scoring for E-commerce Reviews

An e-commerce platform needs to tag named entities (Product Names, Brands) in thousands of daily customer reviews to analyze trends. Performance and stability on larger datasets...

Use Case

Robustness to Missing Values and Unknown Tokens

A social media monitoring tool processes tweets that may be empty, contain special characters, or words not present in the training dictionary (Out-Of-Vocabulary).

Related Actions

conditionalRandomFields

crfTrain

The crfTrain action trains a Conditional Random Fields (CRF) model for sequen...
