Test Scenario & Use Case
Active learning to optimize data labeling.
Creates a large table of simulated sensor data ('iot_sensor_data') with 100,000 records and a very small table ('failure_annotations') with only two annotated failure events. The join key is 'event_id'.
    /* Assumes an active CAS session and a libref CASUSER bound to the CASUSER caslib. */
    /* Simulate 100,000 sensor readings keyed by event_id. */
    DATA casuser.iot_sensor_data;
      LENGTH event_id $ 20;
      DO i = 1 TO 100000;
        /* Z8. zero-pads to 8 digits so the keys match the annotated IDs (e.g. EVT-00045123). */
        event_id = 'EVT-' || PUT(i, Z8.);
        sensor_a = RAND('NORMAL', 100, 5);
        sensor_b = RAND('NORMAL', 50, 2);
        OUTPUT;
      END;
    RUN;

    /* Two manually annotated failure events. */
    DATA casuser.failure_annotations;
      LENGTH event_id $ 20 failure_code $ 10;
      event_id = 'EVT-00045123'; failure_code = 'OVERHEAT'; OUTPUT;
      event_id = 'EVT-00078901'; failure_code = 'PRESSURE'; OUTPUT;
    RUN;
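Before running the join, it can help to confirm that every annotated 'event_id' actually exists in the sensor table, since a formatting mismatch in the key would silently produce an empty INNER join. The following is a minimal sketch, assuming the CASUSER libref from the data preparation step is assigned; it uses plain PROC SQL rather than any activeLearn action, and the column alias matching_ids is only illustrative.

```sas
/* Sanity check (not part of the activeLearn action set):            */
/* count how many annotated event_id values exist in the sensor data. */
/* The count should equal the number of rows in failure_annotations.  */
proc sql;
  select count(*) as matching_ids
  from casuser.failure_annotations as a
       inner join casuser.iot_sensor_data as s
       on a.event_id = s.event_id;
quit;
```

If the count is lower than the number of annotated events, the key formats in the two tables do not line up and should be fixed before the join is tested.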
    /* Inner-join the sensor data with the annotations on event_id. */
    /* Table names resolve against the active caslib (assumed to be CASUSER). */
    PROC CAS;
      ACTION activeLearn.alJoin /
        TABLE={name='iot_sensor_data'},
        annotatedTable={name='failure_annotations'},
        id='event_id',
        joinType='INNER',
        casOut={name='failure_data_points', replace=true};
    RUN;
    QUIT;
    /* Inspect the joined table and report its row count. */
    PROC CAS;
      TABLE.fetch / TABLE={name='failure_data_points'};
      TABLE.recordCount / TABLE={name='failure_data_points'};
    RUN;
    QUIT;
The output table 'failure_data_points' must contain exactly 2 rows, one for each annotated 'event_id'. This shows that an INNER join through the action can select a tiny subset of a very large table based on a small annotation table, which is a common requirement when only a small fraction of the data has been labeled.
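The row-count expectation above can also be checked programmatically instead of by reading the printed output. The sketch below is one possible way to do so in CASL, assuming the simple action set is available in the session and that 'failure_data_points' is in the active caslib; the variable names r and expected are illustrative.

```sas
proc cas;
  /* simple.numRows returns the row count in the result key "numrows". */
  simple.numRows result=r / table={name='failure_data_points'};

  expected = 2;  /* one row per annotated failure event */

  if (r.numrows == expected) then do;
    print 'PASS: failure_data_points has ' r.numrows ' rows';
  end;
  else do;
    print 'FAIL: expected ' expected ' rows, found ' r.numrows;
  end;
quit;
```

A FAIL with 0 rows usually points to a key mismatch between the two input tables rather than a problem in the join itself.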