High-Volume Sensor Anomaly Detection with Clayton Copula

Business Context

A manufacturing plant monitors the health of 500,000 IoT sensors. The plant needs to detect synchronized failures, where multiple sensor readings (temperature, vibration) drop simultaneously (lower-tail dependence). Because of the high volume of streaming data, the Engineering team needs a fast estimation method (calibration) rather than the computationally expensive maximum likelihood estimation (MLE), so that the model can be updated frequently.

Data Preparation

Generate a large dataset (500,000 observations) of simulated sensor readings with induced lower-tail dependence.

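/* Simulate 500,000 paired sensor readings with lower-tail dependence */
DATA mycas.sensor_data;
    call streaminit(99);                  /* fixed seed for reproducibility */
    DO i = 1 to 500000;
        u = rand('Uniform');
        v = rand('Uniform');
        theta = 2;                        /* dependence parameter used in the simulation */
        IF u > 0 THEN DO;                 /* guard against log(0) */
            t = (-log(u))**(1/theta);     /* shared component that drives both readings */
            temp = (1 + t)**(-1/theta);
            vib = (1 + t + (-log(v))**(1/theta))**(-1/theta);
            OUTPUT;
        END;
    END;
RUN;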

Implementation Steps

Execute the copula fit using the 'CLAYTON' type (sensitive to lower tails) and the 'CAL' (Calibration) method for performance efficiency on the large dataset.

PROC CAS;
    copula.copulaFit /
        table={name='sensor_data'},
        var={'temp', 'vib'},
        copulatype='CLAYTON',
        method='CAL',
        timingReport={summary=true};
RUN;
QUIT;
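
As background on why calibration is cheap: calibration methods for one-parameter Archimedean copulas typically invert Kendall's tau instead of optimizing a likelihood. Assuming the 'CAL' method follows this standard approach (the exact estimator used by copulaFit is not stated on this page), the Clayton parameter follows from the sample Kendall's tau in closed form. The symbols \tau and \hat{\theta} are notation introduced here for illustration.

```latex
\tau = \frac{\theta}{\theta + 2}
\quad\Longrightarrow\quad
\hat{\theta} = \frac{2\hat{\tau}}{1 - \hat{\tau}}
```

For example, a sample Kendall's tau of 0.5 corresponds to Theta = 2. No iterative optimization is needed once the rank correlation has been computed, which is why this kind of estimator suits frequent refits.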

Expected Result

The calibration fit should complete significantly faster than an equivalent MLE fit. The output reports the estimated Theta parameter, which measures the strength of the lower-tail dependence. The timing report shows how long the calibration method takes on the large dataset.
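
To help read the Theta estimate: under a Clayton copula, the lower-tail dependence coefficient has a standard closed form, which gives a direct way to interpret the fitted value. The symbol \lambda_L is notation introduced here and is not part of the action's output; U and V stand for the two readings on a uniform scale.

```latex
\lambda_L = \lim_{q \to 0^{+}} P(V \le q \mid U \le q) = 2^{-1/\theta}, \qquad \theta > 0
```

For instance, Theta = 2 gives \lambda_L = 2^{-1/2}, roughly 0.71, while Theta close to 0 gives \lambda_L close to 0 (no lower-tail dependence). A larger estimate therefore signals a stronger tendency for both readings to drop together.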

See the technical documentation for copulaFit