Test Scenario & Use Case
Rule-based scoring of text documents.
Create a 'communications' table with tricky data: overlapping terms, empty text, and non-English text. The LITI model ('compliance_liti') defines a generic concept, which is dropped from the output, and a more specific, longer concept, which should be matched.
    /* Messages to score: overlapping terms, an empty text, and a French text */
    DATA mycas.communications;
      LENGTH msg_id $ 10 msg_text $ 300;
      /* TRUNCOVER keeps msg4 (empty text) from reading into the next record */
      INFILE DATALINES delimiter='|' TRUNCOVER;
      INPUT msg_id $ msg_text $;
      DATALINES;
    msg1|The team discussed Project Chimera in the meeting.
    msg2|This new project is very demanding.
    msg3|Let's talk about the Project Chimera funding.
    msg4|
    msg5|Ceci est un test dans une autre langue.
    msg6|Just a regular project update.
    ;
    RUN;

    /* Concept definitions: one generic concept and one longer, specific concept */
    DATA mycas.compliance_liti;
      LENGTH model_id $ 10 model_txt $ 200;
      /* The rows are pipe-delimited, so the delimiter must be declared here too */
      INFILE DATALINES delimiter='|' TRUNCOVER;
      INPUT model_id $ model_txt $;
      DATALINES;
    comp1|CONCEPT:GENERIC_PROJECT@project
    comp1|CONCEPT:RESTRICTED_PROJECT@Project Chimera
    ;
    RUN;
    /* Data is prepared and loaded in the data_prep step */
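    PROC CAS;
      /* Load the action set before calling it (assumed not to be loaded yet) */
      loadactionset 'textRuleScore';

      /* Assumption: 'compliance_liti' is usable as a compiled LITI model;
         raw rules would first need compiling, e.g. with textRuleDevelop.compileConcept */
      textRuleScore.applyConcept /
        table={name='communications'},
        docId='msg_id',
        text='msg_text',
        model={name='compliance_liti'},
        matchType='LONGEST',                /* keep the longest of overlapping matches */
        dropConcepts={'GENERIC_PROJECT'},   /* exclude the generic concept from the output */
        casOut={name='compliance_hits', replace=true};
    RUN;
    QUIT;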
The action runs without errors. The empty record ('msg4') and the French record ('msg5') produce no matches, and 'msg2' and 'msg6' contain only the generic term 'project', whose concept is dropped. The output table 'compliance_hits' therefore contains exactly two rows, one for 'msg1' and one for 'msg3', both identifying the concept 'RESTRICTED_PROJECT'. 'GENERIC_PROJECT' does not appear in the output, which shows that both 'matchType=LONGEST' and 'dropConcepts' worked as expected.
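The expected result described above can be checked directly against the output table. The following is a minimal sketch, not part of the original scenario: it assumes the same CAS session is still connected and that 'compliance_hits' sits in the active caslib. It uses only the general-purpose `table.fetch` and `table.recordCount` actions and makes no assumption about the column names in the output.

```sas
PROC CAS;
  /* Display the scored rows; the scenario expects matches for msg1 and msg3 only */
  table.fetch / table={name='compliance_hits'};

  /* Report the number of rows; the scenario expects exactly 2 */
  table.recordCount / table={name='compliance_hits'};
RUN;
QUIT;
```

If the row count differs from two, the fetched rows should show whether a 'GENERIC_PROJECT' match slipped through or a 'RESTRICTED_PROJECT' match is missing.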