Performance Case: Analyzing Missing Sensor Readings in High-Volume IoT Data

Business Context

A manufacturing plant uses thousands of IoT sensors to monitor equipment. Some sensors intermittently fail to report data. The goal is to analyze these missing-data patterns at scale to identify sensor models or locations that may be failing, without exhausting memory when counting distinct values.

About the Action Set: dataSciencePilot

The dataSciencePilot action set provides automated machine learning (AutoML) and pipeline generation.

Data Preparation

Create a summarized IoT sensor reading dataset in which each row represents one sensor. 'sensor_id' is the high-cardinality variable, 'reading_count' is used as a frequency weight, and 'temperature' and 'pressure' contain missing values.

/* Assumes an active CAS session and a CAS libref named casuser. */
DATA casuser.iot_sensor_logs;
    /* Declare lengths first so sensor IDs are not truncated to 8 characters. */
    LENGTH sensor_id $ 15 location $ 10;
    INPUT sensor_id $ location $ temperature pressure reading_count;
    /* A period (.) marks a missing numeric reading. */
    CARDS;
SENSOR_A-001 Assembly1 25.5 101.2 1500
SENSOR_B-734 Assembly1 . 101.5 50
SENSOR_C-109 Painting2 30.1 . 800
SENSOR_A-002 Assembly1 25.6 101.3 2000
SENSOR_D-500 Painting2 . . 25
SENSOR_B-735 Assembly1 28.2 101.9 1200
;
RUN;
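
Because 'reading_count' acts as a frequency weight, each row stands for that many readings. As a rough sanity check before running the action, the weighted missing totals per variable can be tallied with PROC MEANS and a FREQ statement. This is only a sketch that assumes the casuser libref from the step above is still assigned; it is not part of the original scenario.

```sas
/* Weighted N and NMISS per variable.                              */
/* The FREQ statement treats each row as reading_count readings.   */
proc means data=casuser.iot_sensor_logs n nmiss;
    var temperature pressure;
    freq reading_count;
run;
```

From the six rows above, the total weight is 5575 readings; 'temperature' is missing for 50 + 25 = 75 of them and 'pressure' for 800 + 25 = 825.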

Implementation Steps

Load the summarized sensor data into CAS.

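/* Load the table into the casuser caslib, replacing any existing copy. */
/* If casuser is already a CAS libref, the DATA step has created the    */
/* in-memory table and this step may be redundant.                      */
PROC CASUTIL;
    LOAD DATA=casuser.iot_sensor_logs OUTCASLIB='casuser' CASOUT='iot_sensor_logs' REPLACE;
RUN;
QUIT;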
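
To confirm that the table is available in CAS before running the analysis, a quick listing can help. The following is a minimal sketch using standard PROC CASUTIL statements, assuming the same caslib and table names as above.

```sas
proc casutil;
    /* iot_sensor_logs should appear among the in-memory tables. */
    list tables incaslib='casuser';
    /* Show column names, types, and lengths for the loaded table. */
    contents casdata='iot_sensor_logs' incaslib='casuser';
quit;
```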

Run the analysis using 'reading_count' as a frequency variable. Set a low 'distinctCountLimit' to force the use of the Misra-Gries algorithm for the high-cardinality 'sensor_id' variable.

PROC CAS;
    /* Make the action set available in the session before calling it. */
    loadactionset 'dataSciencePilot';
    dataSciencePilot.analyzeMissingPatterns /
        table={name='iot_sensor_logs', caslib='casuser'},
        inputs={{name='sensor_id'}, {name='location'}, {name='temperature'}, {name='pressure'}},
        nominals={'sensor_id', 'location'},
        /* Each row counts as reading_count observations. */
        freq='reading_count',
        /* Low limit so that high-cardinality variables exceed it. */
        distinctCountLimit=100,
        misraGries=TRUE,
        casOut={name='iot_missing_perf_test', caslib='casuser', replace=true};
    RUN;
QUIT;
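
Once the action has run, the output table can be inspected directly. The sketch below uses the standard table.columnInfo and table.fetch actions and makes no assumption about the column layout of 'iot_missing_perf_test'; the row limit of 20 is an arbitrary choice for illustration.

```sas
proc cas;
    /* List the columns that the action wrote to the output table. */
    table.columnInfo / table={name='iot_missing_perf_test', caslib='casuser'};
    /* Display up to 20 rows of the result. */
    table.fetch / table={name='iot_missing_perf_test', caslib='casuser'}, to=20;
quit;
```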

Expected Result

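The action should complete successfully even when the number of distinct 'sensor_id' values exceeds the 'distinctCountLimit', by leveraging the Misra-Gries algorithm. Note that the six-row sample contains only six distinct sensors, so a production-size table is needed to actually exceed the limit of 100. The 'MissingCounts' table in the output should show counts and percentages weighted by the 'reading_count' variable. The pattern where both 'temperature' and 'pressure' are missing (representing total sensor failure) should have a weighted count of 25, the 'reading_count' of SENSOR_D-500, which is the only row with both values missing.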
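
The expected weighted counts can be cross-checked independently of the action. The following PROC SQL sketch groups the input rows by their missing pattern and sums 'reading_count'; the column aliases are illustrative names, not part of the action's output.

```sas
proc sql;
    /* 1 = missing, 0 = present, for each measured variable. */
    select missing(temperature) as temperature_missing,
           missing(pressure)    as pressure_missing,
           sum(reading_count)   as weighted_count
    from casuser.iot_sensor_logs
    group by 1, 2;
quit;
```

For the sample data, the sums by hand are 4700 readings with nothing missing (1500 + 2000 + 1200), 800 with only 'pressure' missing, 50 with only 'temperature' missing, and 25 with both missing.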

See the technical documentation for analyzeMissingPatterns