Test Scenario & Use Case
Automated Machine Learning (AutoML) and pipeline generation.
Creation of a summarized IoT sensor reading dataset. 'sensor_id' has high cardinality, 'reading_count' is used as a frequency weight, and 'temperature' and 'pressure' contain missing values.
```sas
/* Build the summarized sensor table; a period (.) marks a missing numeric value */
DATA casuser.iot_sensor_logs;
    LENGTH sensor_id $ 15 location $ 10;
    INPUT sensor_id $ location $ temperature pressure reading_count;
    CARDS;
SENSOR_A-001 Assembly1 25.5 101.2 1500
SENSOR_B-734 Assembly1 . 101.5 50
SENSOR_C-109 Painting2 30.1 . 800
SENSOR_A-002 Assembly1 25.6 101.3 2000
SENSOR_D-500 Painting2 . . 25
SENSOR_B-735 Assembly1 28.2 101.9 1200
;
RUN;
```
```sas
/* Load the table into the 'casuser' caslib, replacing any existing copy */
PROC CASUTIL;
    load DATA=casuser.iot_sensor_logs outcaslib='casuser' casout='iot_sensor_logs' replace;
RUN;
QUIT;
```
```sas
/* Analyze missing-value patterns, weighting each row by 'reading_count' */
PROC CAS;
    dataSciencePilot.analyzeMissingPatterns /
        TABLE={name='iot_sensor_logs', caslib='casuser'},
        inputs={{name='sensor_id'}, {name='location'}, {name='temperature'}, {name='pressure'}},
        nominals={'sensor_id', 'location'},
        freq='reading_count',
        distinctCountLimit=100,
        misraGries=TRUE,
        casOut={name='iot_missing_perf_test', caslib='casuser', replace=true};
RUN;
QUIT;
```
The action should complete successfully despite the high cardinality of 'sensor_id' relative to the 'distinctCountLimit', by leveraging the Misra-Gries algorithm. The 'MissingCounts' table in the output should show weighted counts and percentages based on the 'reading_count' variable. The pattern where both 'temperature' and 'pressure' are missing (representing total sensor failure) should have a weighted count of 25.
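The expected weighted counts can be cross-checked independently of the action. The following is a minimal sketch, assuming the 'casuser' libref from the data-preparation step is still assigned; the table name work.iot_patterns and the flag variables miss_temperature and miss_pressure are hypothetical names introduced only for this check. It recomputes the missing-value patterns with PROC FREQ and a WEIGHT statement rather than with dataSciencePilot, so the layout of its output will differ from the table the action produces.

```sas
/* Cross-check sketch: flag missing values per row (1 = missing, 0 = present). */
/* work.iot_patterns, miss_temperature and miss_pressure are hypothetical names. */
DATA work.iot_patterns;
    SET casuser.iot_sensor_logs;
    miss_temperature = MISSING(temperature);
    miss_pressure    = MISSING(pressure);
RUN;

/* Weight each row by reading_count and list every observed pattern. */
PROC FREQ DATA=work.iot_patterns;
    TABLES miss_temperature * miss_pressure / LIST;
    WEIGHT reading_count;
RUN;
```

With the six sample rows, the weighted frequencies work out to 4700 for rows with both readings present (1500 + 2000 + 1200), 50 for 'temperature' missing only, 800 for 'pressure' missing only, and 25 for both missing, out of a weighted total of 6575. The both-missing pattern therefore accounts for roughly 0.38% of the weighted readings.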