langModel calculateErrorRate

Validation of Medical Dictation Accuracy

Test Scenario & Use Case

Business Context

A hospital is evaluating a new Speech-to-Text model for transcribing doctors' notes. The goal is to compare the model's output against manually verified transcripts and confirm that the Word Error Rate (WER) stays below the acceptable threshold for medical records, including on specialized medical terminology.
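
For reference, WER is the minimum number of word substitutions (S), deletions (D), and insertions (I) needed to turn the model output into the reference transcript, divided by the number of words N in the reference: WER = (S + D + I) / N. The Python sketch below computes it with a word-level Levenshtein distance and applies an acceptance check. It is an independent illustration, not the CAS action's implementation, and the 0.10 value of `MAX_WER` is a hypothetical placeholder, because the acceptable threshold for medical records is not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level WER = (S + D + I) / N, computed as a Levenshtein distance over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from the output
                curr[j - 1] + 1,     # insertion: extra word in the output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


MAX_WER = 0.10  # hypothetical acceptance threshold; the real value is not given in the source


def meets_threshold(reference: str, hypothesis: str, max_wer: float = MAX_WER) -> bool:
    """True if the WER is strictly below the (hypothetical) threshold."""
    return word_error_rate(reference, hypothesis) < max_wer


print(word_error_rate("Prescribed 50mg of Atenolol daily", "Prescribed 15mg of Atenolol daily"))
# 0.2
```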
About the Action Set: langModel

Management of Large Language Models (LLM) and NLP.

Data Preparation

Creation of a reference table with ground-truth medical notes and of a hypothesis table with simulated model output containing typical transcription errors (substitutions and insertions).

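/* Assumes a CAS session and a mycas libref bound to a caslib, for example: */
cas mysess;
libname mycas cas caslib=casuser;

/* Reference table: manually verified (ground-truth) transcripts */
DATA mycas.med_ref;
   LENGTH audio_id $15 content $200;
   INPUT audio_id $ content &;   /* & keeps single embedded blanks in content */
   DATALINES;
REC001 Patient exhibits signs of acute bronchitis
REC002 Prescribed 50mg of Atenolol daily
REC003 No history of cardiovascular disease
;
RUN;

/* Hypothesis table: simulated model output with transcription errors */
DATA mycas.med_hyp;
   LENGTH pred_id $15 pred_text $200;
   INPUT pred_id $ pred_text &;
   DATALINES;
REC001 Patient exhibits signs of acute bronchitis
REC002 Prescribed 15mg of Atenolol daily
REC003 No history of cardio vascular disease
;
RUN;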

Implementation Steps

1. Verify that both tables are loaded and accessible in the CAS session.
PROC CAS;
   /* Confirm that both tables exist in the active caslib */
   TABLE.tableInfo / TABLE='med_ref';
   TABLE.tableInfo / TABLE='med_hyp';
RUN;
QUIT;
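
Confirming that both tables exist does not guarantee that every hypothesis row has a matching reference row. As an optional extra check, the Python sketch below compares the ID sets of the two tables once their rows are available in memory. The in-memory row lists and the `check_id_coverage` helper are hypothetical stand-ins for the CAS tables, not part of the langModel API.

```python
def check_id_coverage(ref_rows, hyp_rows, ref_id="audio_id", hyp_id="pred_id"):
    """Hypothetical helper: return (IDs only in the reference, IDs only in the hypothesis)."""
    ref_ids = {row[ref_id] for row in ref_rows}
    hyp_ids = {row[hyp_id] for row in hyp_rows}
    return ref_ids - hyp_ids, hyp_ids - ref_ids


# Hypothetical in-memory copies of mycas.med_ref and mycas.med_hyp
med_ref = [
    {"audio_id": "REC001", "content": "Patient exhibits signs of acute bronchitis"},
    {"audio_id": "REC002", "content": "Prescribed 50mg of Atenolol daily"},
    {"audio_id": "REC003", "content": "No history of cardiovascular disease"},
]
med_hyp = [
    {"pred_id": "REC001", "pred_text": "Patient exhibits signs of acute bronchitis"},
    {"pred_id": "REC002", "pred_text": "Prescribed 15mg of Atenolol daily"},
    {"pred_id": "REC003", "pred_text": "No history of cardio vascular disease"},
]

missing_in_hyp, missing_in_ref = check_id_coverage(med_ref, med_hyp)
print(missing_in_hyp, missing_in_ref)  # set() set()
```

Two empty sets mean every record can be paired; any ID listed would be left out of a per-record comparison.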
2. Run calculateErrorRate, mapping the custom column names of each table (audio_id/content in the reference table, pred_id/pred_text in the hypothesis table).
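PROC CAS;
   builtins.loadActionSet / actionSet='langModel';   /* load the action set if needed */
   /* Hypothesis: table/tableId/tableText; ground truth: reference/referenceId/referenceText */
   langModel.calculateErrorRate /
      table={name='med_hyp'} tableId='pred_id' tableText='pred_text'
      reference={name='med_ref'} referenceId='audio_id' referenceText='content';
RUN;
QUIT;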
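
Conceptually, the tableId/tableText and referenceId/referenceText parameters tell the action which column holds the join key and which holds the transcript text in each table. The Python sketch below shows that mapping as an inner join on ID with caller-supplied column names. The `pair_transcripts` function is hypothetical and only illustrates the idea; it is not how the CAS action is implemented.

```python
def pair_transcripts(table, reference, table_id, table_text, reference_id, reference_text):
    """Hypothetical helper: join hypothesis and reference rows on ID using custom column names.

    Returns (id, reference_text, hypothesis_text) tuples sorted by ID;
    IDs present in only one table are dropped (inner join).
    """
    ref_by_id = {row[reference_id]: row[reference_text] for row in reference}
    pairs = [
        (row[table_id], ref_by_id[row[table_id]], row[table_text])
        for row in table
        if row[table_id] in ref_by_id
    ]
    return sorted(pairs)


med_ref = [
    {"audio_id": "REC002", "content": "Prescribed 50mg of Atenolol daily"},
    {"audio_id": "REC001", "content": "Patient exhibits signs of acute bronchitis"},
]
med_hyp = [
    {"pred_id": "REC001", "pred_text": "Patient exhibits signs of acute bronchitis"},
    {"pred_id": "REC002", "pred_text": "Prescribed 15mg of Atenolol daily"},
]

for rec_id, ref_text, hyp_text in pair_transcripts(
    med_hyp, med_ref,
    table_id="pred_id", table_text="pred_text",
    reference_id="audio_id", reference_text="content",
):
    print(rec_id, "|", ref_text, "|", hyp_text)
```

Because the join key is looked up by name, the two tables can use different column names and row orders without affecting the pairing.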

Expected Result


The action should map the columns correctly even though they have different names in the two tables. It must return a CAS result table showing a zero error rate for REC001 (exact match), one substitution for REC002 (50mg vs. 15mg), and, at the word level, one substitution plus one insertion for REC003 (cardiovascular vs. cardio vascular).
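
To make these expectations concrete, the Python sketch below aligns each reference/hypothesis pair with a word-level Levenshtein backtrace and counts substitutions, deletions, and insertions. It is an independent cross-check, not the CAS action's algorithm; the hypothetical `align_counts` helper is introduced here for illustration, and the action's own result columns and tie-breaking between equally short alignments may differ. Under this alignment, REC001 has no errors (WER 0.0), REC002 has one substitution (1/5 = 0.2), and REC003 has one substitution plus one insertion (2/5 = 0.4).

```python
def align_counts(reference: str, hypothesis: str):
    """Hypothetical helper: (substitutions, deletions, insertions) for a minimum-edit word alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1, dist[i][j - 1] + 1, dist[i - 1][j - 1] + cost)
    # Walk back from the bottom-right cell, preferring match/substitution moves
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


records = [
    ("REC001", "Patient exhibits signs of acute bronchitis", "Patient exhibits signs of acute bronchitis"),
    ("REC002", "Prescribed 50mg of Atenolol daily", "Prescribed 15mg of Atenolol daily"),
    ("REC003", "No history of cardiovascular disease", "No history of cardio vascular disease"),
]
for rec_id, ref, hyp in records:
    s, d, i = align_counts(ref, hyp)
    print(rec_id, "S=%d D=%d I=%d WER=%.2f" % (s, d, i, (s + d + i) / len(ref.split())))
# REC001 S=0 D=0 I=0 WER=0.00
# REC002 S=1 D=0 I=0 WER=0.20
# REC003 S=1 D=0 I=1 WER=0.40
```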