Test Scenario & Use Case
Extraction of Boolean rules for classification.
Simulation of 50,000 documents, each with five random term occurrences drawn from 500 term codes, to stress the rule extraction algorithm.

    /* Assumes an active CAS session and a libref casuser bound to a CAS caslib. */

    /* Term occurrences: 5 random term codes (0 to 499) per document */
    DATA casuser.email_terms;
       DO email_id=1 TO 50000;
          DO t=1 TO 5;
             term_code=int(rand('uniform')*500);
             OUTPUT;
          END;
       END;
    RUN;

    /* Document labels: about 20% SPAM, assigned independently of the terms */
    DATA casuser.email_info;
       LENGTH label $5;   /* without this, 'LEGIT' is truncated to 'LEGI' */
       DO email_id=1 TO 50000;
          IF rand('uniform')>0.8 THEN label='SPAM';
          ELSE label='LEGIT';
          OUTPUT;
       END;
    RUN;

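Before training, it can help to confirm that the simulated tables have the expected shape. The following is a minimal sketch, not part of the original scenario. It assumes that the active caslib is the one behind the `casuser` libref and that the `table` and `simple` action sets are available in the session.

```sas
PROC CAS;
   /* 50,000 documents x 5 term occurrences should give 250,000 rows */
   table.recordCount / table={name='email_terms'};

   /* Label distribution: roughly 20% SPAM and 80% LEGIT is expected */
   simple.freq / table={name='email_info'} inputs={'label'};
RUN;
QUIT;
```

If the frequency table shows `LEGI` instead of `LEGIT`, the label variable was created with a length of 4 and the values were truncated.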

    PROC CAS;
       /* Load the action set in case it is not already available in the session */
       loadactionset 'boolRule';

       /* Train Boolean rules on the simulated terms, using 4 threads */
       boolRule.brTrain /
          table={name='email_terms'}
          docId='email_id'
          termId='term_code'
          docInfo={table={name='email_info'}, id='email_id', targets={'label'}}
          maxCandidates=1000
          nThreads=4
          casOut={name='spam_rules', replace=true};
    RUN;

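Expected result: the action runs with multiple threads (`nThreads=4`), completes within a reasonable time frame, and produces a rule set in `spam_rules` despite the high volume of noisy input. Because the labels are assigned independently of the terms, the extracted rules are not expected to carry real predictive signal. The scenario tests robustness and throughput, not rule quality.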
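One way to check this outcome is to inspect the output table after the action finishes. This is a hedged sketch: it relies only on the table name `spam_rules` used above and on the general-purpose `table` action set, and it makes no assumption about which columns the rules table contains.

```sas
PROC CAS;
   /* Confirm that the output table exists and report its row and column counts */
   table.tableInfo / name='spam_rules';

   /* List the columns, since the schema is not described in this scenario */
   table.columnInfo / table={name='spam_rules'};

   /* Preview the first 10 rows */
   table.fetch / table={name='spam_rules'} to=10;
RUN;
QUIT;
```

Elapsed time for the training step appears in the SAS log, which gives a concrete figure to compare against whatever threshold counts as reasonable in your environment.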