aggregation.aggregate

High-Volume IoT Sensor Data Aggregation for Predictive Maintenance

Test Scenario & Use Case

Business Context

A manufacturing plant monitors thousands of machines using IoT sensors that stream data every second. To prevent failures, the plant needs to analyze this high-volume data in near real time. The task is to aggregate sensor readings (temperature, pressure) into one-minute buckets for each production line and machine, calculating the max and average values to detect anomalies.
About the action set: aggregation

Data aggregation and summary statistical calculations.

Data Preparation

Creation of a large-scale IoT sensor data table. This simulates 5 production lines with 10 machines each, generating data every second for one hour.

DATA casuser.iot_sensor_data (bufsize=1m);
   LENGTH LineID $ 5 MachineID $ 10;
   FORMAT Timestamp datetime20.;
   DO line = 1 to 5;
      LineID = 'L' || left(put(line, 1.));
      DO machine = 1 to 10;
         MachineID = 'M' || put(machine, 2. -l);
         DO second = 0 to 3599;
            Timestamp = dhms(today(), 0, 0, 0) + second;
            Temperature = 80 + (rannor(12345) * 5) + (line*2) + (machine*0.5);
            Pressure = 1000 + (rannor(54321) * 50) + (line*10);
            OUTPUT;
         END;
      END;
   END;
RUN;

Implementation Steps

1
Load the large IoT data table into CAS.
PROC CASUTIL;
   load DATA=casuser.iot_sensor_data outcaslib='casuser' casout='iot_sensor_data' replace;
QUIT;
2
Run aggregation to get minute-level statistics for temperature and pressure, grouped by production line and machine.
PROC CAS;
   aggregation.aggregate /
      TABLE={name='iot_sensor_data', groupBy={'LineID', 'MachineID'}},
      id='Timestamp',
      interval='MINUTE',
      varSpecs={{name='Temperature', subset={'MAX', 'MEAN'}},
                {name='Pressure', subset={'MAX', 'MEAN'}}},
      casOut={name='minute_kpis', caslib='casuser', replace=true};
RUN;
QUIT;
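For readers more familiar with pandas, the minute-level roll-up performed by `aggregation.aggregate` can be sketched as a groupby-plus-resample (a hypothetical analogue with made-up toy data, not the CAS implementation):

```python
# Hypothetical pandas analogue of the aggregate call: group by line/machine,
# then roll per-second readings up to one-minute max and mean.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ts = pd.date_range("2024-01-01", periods=3600, freq="s")  # one hour of seconds

# Toy data: two line/machine groups, one hour of per-second temperatures each
df = pd.DataFrame({
    "Timestamp": np.tile(ts, 2),
    "LineID": np.repeat(["L1", "L2"], 3600),
    "MachineID": "M1",
    "Temperature": rng.normal(80, 5, 7200),
})

minute_kpis = (
    df.set_index("Timestamp")
      .groupby(["LineID", "MachineID"])["Temperature"]
      .resample("1min")
      .agg(["max", "mean"])
      .reset_index()
)
print(len(minute_kpis))  # 2 groups x 60 minutes = 120 rows
```

One output row per group per minute is exactly the shape the CAS action produces for `minute_kpis`.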
3
Check the number of rows in the output to ensure aggregation was performed correctly across all groups and time intervals.
PROC CAS;
   simple.numRows RESULT=r / TABLE={name='minute_kpis'};
   PRINT r;
RUN;
QUIT;

Expected Result


The action should efficiently process the large dataset (180,000 rows). The resulting 'minute_kpis' table should have 3,000 rows (5 lines * 10 machines * 60 minutes). Each row will provide the max and mean Temperature and Pressure for a specific machine during a specific minute, allowing for trend analysis and alerting.