Test Scenario & Use Case
Create a dataset with edge cases: empty text, null values, and special characters.
    /* DSD keeps an empty field as missing; TRUNCOVER stops INPUT from reading into the next line. */
    DATA casuser.dirty_contracts;
        LENGTH _doc_id_ $20 _text_ $300;
        INFILE DATALINES delimiter='|' DSD TRUNCOVER;
        INPUT _doc_id_ $ _text_ $;
        DATALINES;
    CTR_001|Contrato válido fecha 2023.
    CTR_002|
    CTR_003|.
    CTR_004|@#$%^&*() Error de escaneo
    ;
    RUN;

    /* Load the five CRF model tables into CAS. */
    PROC CAS;
        TABLE.loadTable / path='legal_label.csv' casOut={name='l_label', replace=true};
        TABLE.loadTable / path='legal_attr.csv' casOut={name='l_attr', replace=true};
        TABLE.loadTable / path='legal_feature.csv' casOut={name='l_feature', replace=true};
        TABLE.loadTable / path='legal_attrfeature.csv' casOut={name='l_attrfeature', replace=true};
        TABLE.loadTable / path='legal_template.csv' casOut={name='l_template', replace=true};
    RUN;

    PROC CAS;
        /* Load the action set first; harmless if it is already loaded in the session. */
        loadactionset "conditionalRandomFields";
        /* Score the edge-case table with the model tables loaded above. */
        conditionalRandomFields.crfScore /
            TABLE={name='dirty_contracts'}
            model={label={name='l_label'}, attr={name='l_attr'}, feature={name='l_feature'},
                   attrfeature={name='l_attrfeature'}, template={name='l_template'}}
            casOut={name='contracts_scored', replace=true}
            target='legal_entity';
    RUN;
Expected result: the action runs successfully, either skipping the empty or corrupt records (CTR_002, CTR_003) or labeling them 'O' (Outside), without crashing the system, and it labels CTR_001 correctly.
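
One way to check this outcome is to fetch the scored table and inspect it by eye. The sketch below assumes the scoring step completed and that `contracts_scored` sits in the active caslib. The column layout of the `crfScore` output is not described here, so the sketch makes no assumption about column names and simply lists the first rows.

```sas
PROC CAS;
    /* Assumption: contracts_scored exists in the active caslib. */
    /* List the first rows, then check the labels for CTR_001 and */
    /* whether CTR_002 and CTR_003 are absent or labeled 'O'.     */
    table.fetch / table={name='contracts_scored'} to=20;
RUN;
```

If the table is missing or the fetch returns an error, the scoring step itself failed, which is the crash condition this scenario is meant to rule out.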