conditionalRandomFields.crfScore

Extraction of Medical Entities from Patient Notes

Test Scenario & Use Case

Business Context

A hospital wants to automatically structure unstructured patient notes by identifying symptoms and medications using a pre-trained CRF model.

Data Preparation

Create the patient notes and simulate the five tables that a CRF model requires (Labels, Attributes, Features, Attr-Features, Templates), populated with medical terms.

```sas
/* Assumption: no CAS session exists yet; skip these two lines if one does */
CAS mysess;
LIBNAME mycas CAS CASLIB=casuser;

/* Patient notes: one row per document, fields separated by '|' */
DATA mycas.med_docs;
   LENGTH doc_id $10 text $200;
   INFILE DATALINES DELIMITER='|';
   INPUT doc_id $ text $;
   DATALINES;
DOC1|Patient takes Aspirin for headache.
DOC2|No allergies reported.
;
RUN;

/* Simulated model tables */
DATA mycas.m_label;
   LENGTH _label_ $20 _type_ $20;
   INPUT _label_ $ _type_ $;
   DATALINES;
B-DRUG MEDICATION
I-DRUG MEDICATION
B-SYMP SYMPTOM
O OTHER
;
RUN;

DATA mycas.m_attr;
   LENGTH _attr_ $50 _value_ $50;
   INPUT _attr_ $ _value_ $;
   DATALINES;
WORD[0] Aspirin
WORD[0] headache
WORD[0] Patient
;
RUN;

DATA mycas.m_feat;
   LENGTH _feature_ $50;
   INPUT _feature_ $;
   DATALINES;
U00:Aspirin
U00:headache
;
RUN;

DATA mycas.m_attr_feat;
   INPUT _attrid_ _featureid_ _weight_;
   DATALINES;
1 1 1.5
2 2 1.2
;
RUN;

DATA mycas.m_temp;
   LENGTH _template_ $100;
   INPUT _template_ $;
   DATALINES;
U00:%w[0]
;
RUN;
```
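To show how the template table relates to the feature table, the DATA step below expands the unigram template 'U00:%w[0]' by hand: for each token of a note it builds 'U00:' followed by the current word, which is the form of the entries in 'm_feat'. This is a minimal Base SAS illustration in the WORK library (no CAS session needed); the output table 'work.med_features' is a hypothetical name, and splitting on blanks and periods is an assumption that does not necessarily match the tokenization crfScore applies.

```sas
/* Illustration only: expand the unigram template from m_temp by hand. */
/* Assumption: tokens are split on blanks and periods; crfScore's own  */
/* tokenization may differ.                                            */
DATA work.med_features;
   LENGTH doc_id $10 text $200 token $50 feature $60;
   INFILE DATALINES DELIMITER='|';
   INPUT doc_id $ text $;
   DO pos = 1 TO COUNTW(text, ' .');
      token = SCAN(text, pos, ' .');
      /* The template slot refers to the word at offset 0: the current token */
      feature = CATS('U00:', token);
      OUTPUT;
   END;
   DROP text;
   DATALINES;
DOC1|Patient takes Aspirin for headache.
DOC2|No allergies reported.
;
RUN;
```

Only 'U00:Aspirin' and 'U00:headache' appear in 'm_feat', so in this toy model those are the only two tokens that activate a weighted feature; the other six generated strings match nothing.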

Implementation Steps

1. Score the patient notes with the medical CRF model.
```sas
PROC CAS;
   /* Load the action set in case it is not already loaded in this session */
   loadactionset "conditionalRandomFields";
   conditionalRandomFields.crfScore /
      table={name='med_docs'},
      model={
         attr={name='m_attr'},
         attrfeature={name='m_attr_feat'},
         feature={name='m_feat'},
         label={name='m_label'},
         template={name='m_temp'}
      },
      casOut={name='med_scored', replace=true},
      target='ner_tags';
RUN;
```
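Because 'm_temp' defines only a unigram (U) template and no label-bigram (B) template, then, assuming CRF++-style template semantics, the model has no label-transition weights, and the best label sequence reduces to choosing, at each token, the label whose active features carry the highest total weight. The sketch below reproduces that arithmetic in Base SAS. It is not the crfScore implementation: the tokens are typed in directly, the table names are hypothetical, and the per-label weights in 'work.hyp_weights' are hypothetical because the simulated 'm_attr_feat' table does not say which label each weight belongs to. Assigning O to tokens that fire no feature is also an assumption.

```sas
/* Sketch only: unigram-template scoring as a per-token argmax.        */
/* Assumption: no B (label-bigram) template, so no transition weights. */
/* All weights below are HYPOTHETICAL, not taken from crfScore.        */
DATA work.hyp_tokens;
   LENGTH doc_id $10 token $50;
   INPUT doc_id $ pos token $;
   DATALINES;
DOC1 1 Patient
DOC1 2 takes
DOC1 3 Aspirin
DOC1 4 for
DOC1 5 headache
;
RUN;

DATA work.hyp_weights;
   LENGTH feature $60 label $10;
   INPUT feature $ label $ weight;
   DATALINES;
U00:Aspirin B-DRUG 1.5
U00:Aspirin B-SYMP 0.1
U00:headache B-SYMP 1.2
U00:headache B-DRUG 0.2
;
RUN;

/* Sum the weights of the active features for each (token, label) pair */
PROC SQL;
   CREATE TABLE work.hyp_scores AS
   SELECT t.doc_id, t.pos, t.token, w.label, SUM(w.weight) AS score
   FROM work.hyp_tokens AS t
        LEFT JOIN work.hyp_weights AS w
        ON CATS('U00:', t.token) = w.feature
   GROUP BY t.doc_id, t.pos, t.token, w.label
   ORDER BY t.doc_id, t.pos, score DESC;
QUIT;

/* Keep the highest-scoring label per token; default to O if none fires */
DATA work.hyp_tagged;
   SET work.hyp_scores;
   BY doc_id pos;
   IF FIRST.pos;
   IF MISSING(label) THEN label = 'O';
   KEEP doc_id pos token label;
RUN;
```

With these hypothetical weights, 'Aspirin' receives B-DRUG, 'headache' receives B-SYMP, and the other tokens receive O. Adding a B template would introduce transition weights, and the per-token choice would then have to be replaced by Viterbi decoding over the whole sequence.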
2. Verify the results.
```sas
PROC PRINT DATA=mycas.med_scored;
RUN;
```

Expected Result

The 'med_scored' table is created. It contains the original text and a new column, 'ner_tags', in which 'Aspirin' is expected to be tagged B-DRUG and 'headache' B-SYMP, given the simulated weights.