High Volume Customer Support Analytics

Business Context

A telecommunications company processes 10,000 customer support calls per hour. The data science team needs to ensure the `calculateErrorRate` action can handle batch processing of high-volume short utterances without performance degradation or memory errors.

About the Action Set: langModel

Management of large language models (LLMs) and natural language processing (NLP).

Discover all actions of langModel

Data Preparation

Generate 10,000 rows of synthetic call logs programmatically. To simulate a high-performing model, the hypothesis transcript is identical to the reference transcript in 90% of rows. The remaining 10% (every tenth row, 1,000 rows in total) carry the same fixed error, so the injected errors are deterministic rather than random.

/* Reference (ground-truth) transcripts */
DATA mycas.call_center_truth;
    LENGTH call_id $20 transcript $100;
    DO i = 1 TO 10000;
        call_id = cats('CALL_', i);
        transcript = 'Customer requests cancellation of service plan A';
        OUTPUT;
    END;
RUN;

/* Hypothesis (model output) transcripts: every tenth row contains an error */
DATA mycas.call_center_pred;
    LENGTH call_id $20 transcript $100;
    DO i = 1 TO 10000;
        call_id = cats('CALL_', i);
        IF mod(i, 10) = 0 THEN transcript = 'Customer request cancel service plan A';
        ELSE transcript = 'Customer requests cancellation of service plan A';
        OUTPUT;
    END;
RUN;
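Before running the action, it can help to confirm that the generated tables match the description above. The following is a minimal sketch, not part of the original scenario. It assumes that `mycas` is a libref bound to the CAS session (as the DATA steps imply) and that PROC SQL can read through that libref. The column aliases `total_rows` and `mismatched_rows` are illustrative names.

```sas
/* Sanity check (illustrative): count rows whose transcripts differ. */
/* Assumes mycas is a CAS libref, as in the DATA steps above.        */
PROC SQL;
    SELECT count(*)                           AS total_rows,
           sum(t.transcript NE p.transcript)  AS mismatched_rows
    FROM mycas.call_center_truth AS t
         INNER JOIN mycas.call_center_pred AS p
         ON t.call_id = p.call_id;
QUIT;
```

With the data generated as shown, `total_rows` should be 10,000 and `mismatched_rows` should be 1,000, because `mod(i, 10) = 0` holds for exactly one row in ten.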

Implementation Steps

Ensure tables are loaded into memory.

/* Check both the reference and the hypothesis table */
PROC CAS;
    TABLE.tableDetails / name='call_center_truth';
    TABLE.tableDetails / name='call_center_pred';
RUN;
QUIT;

Run the calculation on the full 10,000-row dataset, relying on the default column settings, since the column names are identical in both tables.

PROC CAS;
    langModel.calculateErrorRate / TABLE={name='call_center_pred'} reference={name='call_center_truth'};
RUN;

Expected Result

The action completes within a reasonable execution time for 10,000 short utterances. The aggregate report should show that exactly 10% of rows (1,000 of 10,000) have a non-zero word error rate (WER). No system timeouts or memory allocation errors should occur.
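As a cross-check on the reported figures, the expected WER can be worked out by hand. This is a sketch that assumes the standard word-level definition of WER (substitutions, deletions and insertions divided by the number of reference words) and no text normalization. The source does not state how `calculateErrorRate` tokenizes or normalizes text, so the actual values may differ. In each modified row, the 7-word reference "Customer requests cancellation of service plan A" becomes "Customer request cancel service plan A": two substitutions (requests to request, cancellation to cancel) and one deletion (of).

```latex
\begin{align*}
\text{WER}_{\text{row}} &= \frac{S + D + I}{N} = \frac{2 + 1 + 0}{7} = \frac{3}{7} \approx 0.429 \\
\text{WER}_{\text{corpus}} &= \frac{1000 \times 3}{10000 \times 7} = \frac{3000}{70000} = \frac{3}{70} \approx 0.043
\end{align*}
```

Under these assumptions, the 9,000 unmodified rows have a WER of 0, each of the 1,000 modified rows has a WER of about 42.9%, and the corpus-level WER is about 4.3%.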

See the technical documentation for calculateErrorRate