Edge Case: Handling Missing Data and Varying Subgroup Sizes
Test Scenario & Use Case
Business Context
A bottling plant monitors the fill volume of juice bottles. Because of intermittent sensor failures, some measurements are missing. In addition, the number of bottles pulled for each quality-check subgroup (a 15-minute interval) varies. The quality team needs a robust chart that can handle this imperfect data, using median-based statistics, which are less sensitive to outliers.
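To illustrate why median-based statistics suit this data, consider a single hypothetical subgroup in which a faulty sensor reports a spike. The five values below are invented for illustration only and are not part of the scenario's data. PROC MEANS reports both the mean and the median so the two can be compared.

```sas
/* Hypothetical fill volumes for one subgroup. The last value      */
/* mimics a faulty-sensor spike; the numbers are illustrative only. */
data work.SpikeDemo;
  input Volume @@;
  datalines;
499.8 500.1 500.3 499.9 512.0
;
run;

/* Mean and median of the same five values. The mean (502.42) is */
/* pulled toward the spike; the median (500.1) is not.           */
proc means data=work.SpikeDemo n mean median;
  var Volume;
  output out=work.SpikeStats mean=MeanVol median=MedianVol;
run;
```

A single outlier shifts the mean by more than 2 units, while the median stays at a typical value for the subgroup.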
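The DATA step below creates the table 'BottleVolumes' for 30 intervals with imperfect subgroups. Interval 12 is skipped entirely, so that subgroup is missing. Each remaining interval draws a sample size between 5 and 8 (the nominal size is 8), and CONTINUE statements inside the DO loops skip about 10% of the individual measurements, so most subgroups end up with fewer than 8 bottles.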
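The DATA step writes to a `mycas` libref, which the scenario assumes already exists. If it does not, a minimal setup might look like the following sketch. It assumes a SAS Viya environment where the personal caslib CASUSER is available; the session name `mySession` is arbitrary.

```sas
/* Start a CAS session (the session name is arbitrary). */
cas mySession;

/* Point the libref mycas at the personal caslib CASUSER so that */
/* DATA steps writing to mycas.* create in-memory CAS tables.    */
libname mycas cas caslib=casuser;
```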
data mycas.BottleVolumes;
  do Interval = 1 to 30;
    /* Skip an entire subgroup */
    if Interval = 12 then continue;
    /* Create varying sample sizes: 5 to 8 per interval */
    SampleSize = floor(5 + 4 * ranuni(7));
    do i = 1 to SampleSize;
      /* Skip about 10% of individual measurements */
      if ranuni(7) < 0.1 then continue;
      /* Fill volume: mean 500, standard deviation 1.5 */
      Volume = 500 + 1.5 * rannor(987);
      output;
    end;
  end;
run;
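Before charting, a frequency table of the generated data can confirm the intended gaps. This sketch assumes the `mycas.BottleVolumes` table created above; the output dataset name `work.SubgroupSizes` is hypothetical.

```sas
/* Number of rows generated per interval. Interval 12 should be */
/* absent, and each listed count should be at most 8, because   */
/* SampleSize is 5 to 8 and about 10% of draws are skipped.     */
proc freq data=mycas.BottleVolumes;
  tables Interval / nocum nopercent out=work.SubgroupSizes;
run;
```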
Implementation Steps
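1. Run the boxChart action with parameters that handle imperfect data. 'controlStat=MEDIAN' bases the control limits on subgroup medians, and 'medCentral=MEDMED' estimates the process median as the median of the subgroup medians. 'sMethod=RMVLUE' estimates the process standard deviation from the subgroup ranges. 'allN=True' includes every subgroup regardless of its size, and 'testNStd=True' adapts the tests for special causes to varying subgroup sizes.
2. Verify that the output table 'VolumeAnalysis' contains 29 subgroups. This confirms that Interval 12, which has no measurements, was skipped while all other intervals were included. (Another interval would be lost only if every one of its draws were skipped, which has probability at most 0.1^5.)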
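As an informal cross-check of the median-based settings, the per-subgroup medians and ranges can be computed directly from the input table. This is a sketch, not the boxChart action's algorithm: it shows the median of the subgroup medians (the idea behind 'medCentral=MEDMED') and a plain average range, and it does not reproduce the 'sMethod=RMVLUE' estimate. The dataset and variable names `work.SubgroupStats`, `SubSize`, `SubMedian`, and `SubRange` are hypothetical.

```sas
/* One row per interval: sample size, median, and range of Volume. */
proc means data=mycas.BottleVolumes noprint nway;
  class Interval;
  var Volume;
  output out=work.SubgroupStats n=SubSize median=SubMedian range=SubRange;
run;

/* Median of the subgroup medians and mean of the subgroup ranges. */
/* The N column should read 29 if only Interval 12 is missing.     */
proc means data=work.SubgroupStats n median mean;
  var SubMedian SubRange;
run;
```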
Expected Result
The action runs without errors despite the varying subgroup sizes and the missing measurements. The 'VolumeAnalysis' output table contains one row per processed subgroup, 29 in total, and its '_SUBSIZE_' column varies from subgroup to subgroup. The central line and control limits are based on subgroup medians and ranges rather than means, so the chart is less affected by the data quality issues.
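The row count and the spread of '_SUBSIZE_' can be checked with PROC SQL. This sketch assumes that the action wrote 'VolumeAnalysis' to the caslib behind the `mycas` libref, with one row per subgroup and a '_SUBSIZE_' column as stated in the expected result; the macro variable names are hypothetical.

```sas
/* Row count and min/max subgroup size in the action's output. */
proc sql noprint;
  select count(*), min(_SUBSIZE_), max(_SUBSIZE_)
    into :nSubgroups trimmed, :minN trimmed, :maxN trimmed
    from mycas.VolumeAnalysis;
quit;

/* Expect 29 rows and minN < maxN (varying subgroup sizes). */
%put NOTE: subgroups=&nSubgroups, _SUBSIZE_ from &minN to &maxN..;
```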
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.