spc boxChart

Edge Case: Handling Missing Data and Varying Subgroup Sizes

Test Scenario & Use Case

Business Context

A bottling plant monitors the fill volume of juice bottles. Because of intermittent sensor failures, some measurements are missing. In addition, the number of bottles pulled for each subgroup (a 15-minute interval) can vary. The quality team needs a chart that tolerates this imperfect data by using median-based statistics, which are less sensitive to outliers than means.
About the Set: spc

Statistical Process Control (control charts).

Data Preparation

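Create a table 'BottleVolumes' in which subgroup sizes vary. Each interval draws between 5 and 8 bottles (8 is the nominal size), about 10% of the individual measurements are then dropped at random, and one subgroup (Interval 12) is omitted entirely. A DO loop with CONTINUE statements skips the affected records.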

DATA mycas.BottleVolumes;
   DO Interval = 1 TO 30;
      /* Skip an entire subgroup */
      IF Interval = 12 THEN CONTINUE;
      /* Create varying sample sizes */
      SampleSize = floor(5 + 4 * ranuni(7));
      DO i = 1 TO SampleSize;
         /* Skip individual measurements */
         IF ranuni(7) < 0.1 THEN CONTINUE;
         Volume = 500 + 1.5 * rannor(987);
         OUTPUT;
      END;
   END;
RUN;
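
As a quick sanity check on the generated data, the number of measurements per interval can be listed before charting. The following is a minimal sketch, not part of the original scenario: it uses the simple.freq action and assumes an active CAS session whose active caslib holds 'BottleVolumes', the same assumption the steps below rely on.

```sas
PROC CAS;
   /* Assumption: 'BottleVolumes' is in the active caslib. */
   /* One output row per distinct Interval, with its row count; */
   /* Interval 12 should be absent and the counts should vary. */
   simple.freq /
      table={name='BottleVolumes'},
      inputs={'Interval'};
RUN;
```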

Implementation Steps

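1. Run the boxChart action with parameters chosen for imperfect data. 'controlStat=MEDIAN' and 'medCentral=MEDMED' request median-based estimates (MEDMED: the median of the subgroup medians). 'sMethod=RMVLUE' estimates the process standard deviation from subgroup ranges. 'allN=true' keeps every subgroup in the analysis regardless of its size, and 'testNStd=true' adapts the special-cause tests to varying subgroup sizes. 'primaryTests' enables tests 1 and 5, and 'chartsTable' saves the chart summary to 'VolumeAnalysis'.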
PROC CAS;
   spc.boxChart /
      table={name='BottleVolumes'},
      processValue='Volume',
      subgroupValue='Interval',
      controlStat='MEDIAN',
      medCentral='MEDMED',
      sMethod='RMVLUE',
      allN=true,
      testNStd=true,
      primaryTests={test1=true, test5=true},
      chartsTable={name='VolumeAnalysis', replace=true};
RUN;
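
Because the next step summarizes the '_SUBSIZE_' column, it may be worth listing the columns of the output table first to confirm that the name matches. This is a minimal sketch that adds the table.columnInfo action to the original steps; it assumes 'VolumeAnalysis' was created in the active caslib by the call above.

```sas
PROC CAS;
   /* Assumption: 'VolumeAnalysis' exists in the active caslib. */
   /* Lists column names, types, and labels of the chart summary table. */
   table.columnInfo /
      table={name='VolumeAnalysis'};
RUN;
```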
2. Summarize the '_SUBSIZE_' column of the generated chart summary table 'VolumeAnalysis' to confirm that every existing subgroup was processed and that subgroup sizes vary (compare MIN and MAX).
PROC CAS;
   simple.summary /
      table={name='VolumeAnalysis'},
      inputs={{name='_SUBSIZE_'}},
      subSet={'MIN', 'MAX', 'N'};
RUN;
3. Verify that the row count reported for the output table is 29, confirming that the entirely missing subgroup (Interval=12) was skipped and all others were included.
PROC CAS;
   table.tableInfo / table={name='VolumeAnalysis'};
RUN;
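
To cross-check the expected count of 29 against the input data, the number of distinct intervals in 'BottleVolumes' can be computed directly. The sketch below is an addition to the original steps and uses the simple.distinct action; it assumes the input table is still loaded in the active caslib.

```sas
PROC CAS;
   /* Assumption: 'BottleVolumes' is still loaded in the active caslib. */
   /* Reports the number of distinct Interval values in the input; */
   /* this should match the row count of 'VolumeAnalysis'. */
   simple.distinct /
      table={name='BottleVolumes'},
      inputs={'Interval'};
RUN;
```

If the two numbers differ, a subgroup may have lost all of its measurements to the random skips, which is unlikely with these settings but possible.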

Expected Result


The action should run without errors, showing that it handles varying subgroup sizes and missing measurements. The 'VolumeAnalysis' output table should contain one row for each of the 29 processed subgroups, and the summary of '_SUBSIZE_' should show different minimum and maximum values. Because the control limits are based on medians and ranges, the analysis remains robust despite the data quality issues.