image augmentImages

Performance/Volume Case: High-Volume Patch Generation from Large Satellite Images

Test Scenario & Use Case

Business Context

An environmental agency is analyzing large, high-resolution satellite images (e.g., 8192x8192 pixels) to train a model for land use classification (forest, urban, water). They need to efficiently extract thousands of smaller, uniform 256x256 pixel patches from each large image to use as training samples. The process must be fast and scalable.
About the Set: image

Image processing, manipulation, and analysis.

Data Preparation

Create a table simulating two very large satellite images. The important part is the metadata, such as dimension and resolution, that the `sweepImage` option would use. The image data itself is a placeholder, not a decodable image.

/* Build the placeholder rows in WORK first, then load them into CAS. */
DATA work.satellite_imagery;
  LENGTH _image_ $200 _filename_ $50;
  _dimension_=8192; _resolution_=8192; _imageformat_='JPG'; _filename_='REGION_A_8K.jpg'; _image_='...large_binary_data_A...'; OUTPUT;
  _dimension_=8192; _resolution_=8192; _imageformat_='JPG'; _filename_='REGION_B_8K.jpg'; _image_='...large_binary_data_B...'; OUTPUT;
RUN;

/* Load the WORK data set into the casuser caslib as 'satellite_imagery'. */
PROC CASUTIL;
  LOAD DATA=work.satellite_imagery OUTCASLIB='casuser' CASOUT='satellite_imagery' REPLACE;
QUIT;

Implementation Steps

1
Execute patch generation using a non-overlapping sliding window (`sweepImage`). The `stepSize` is equal to the patch `width` to tile the image perfectly. This tests the core efficiency of the sweep functionality for massive data generation.
PROC CAS;
  image.augmentImages /
    TABLE='satellite_imagery',
    decode=TRUE,
    copyVars={'_filename_'},
    augmentations={{
      sweepImage=TRUE,
      width=256,
      height=256,
      stepSize=256,
      verticalStepSize=256
    }},
    casOut={name='satellite_patches_nonoverlap', caslib='casuser', replace=TRUE};
QUIT;
2
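Execute a second patch generation, this time with overlapping patches: a `stepSize` of 128 pixels makes each 256-pixel patch overlap its neighbor by half. Overlapping patches are a common way to produce more training samples and to make a model less sensitive to where features fall relative to patch boundaries. The `writeRandomly` option, with a fixed `seed`, is also tested to confirm that the output rows are shuffled, which is beneficial for subsequent training steps.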
PROC CAS;
  image.augmentImages /
    TABLE='satellite_imagery',
    decode=TRUE,
    copyVars={'_filename_'},
    writeRandomly=TRUE,
    seed=1337,
    augmentations={{
      sweepImage=TRUE,
      width=256,
      height=256,
      stepSize=128,
      verticalStepSize=128
    }},
    casOut={name='satellite_patches_overlap', caslib='casuser', replace=TRUE};
QUIT;

Expected Result


The first step should generate exactly 1,024 patches (a 32x32 grid) for each of the two source images, resulting in 2,048 rows in the 'satellite_patches_nonoverlap' table. The second step should generate a much larger number of patches because of the overlap: a 63x63 grid, or 3,969 patches per image, resulting in 7,938 rows in the 'satellite_patches_overlap' table. The action should complete both steps efficiently without memory issues, demonstrating its scalability.
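
The grid sizes quoted above follow from simple sliding-window arithmetic. The Python sketch below reproduces them under one assumption that the source does not state explicitly: only full 256x256 windows are emitted, with no partial patches at the right or bottom edge. The helper names (`patches_per_axis`, `patches_per_image`) are hypothetical and are not part of the `image` action set.

```python
def patches_per_axis(image_size: int, patch_size: int, step: int) -> int:
    """Count full windows along one axis.

    Assumption: a window is emitted only if it fits entirely inside the
    image, so partial windows at the edge are not counted.
    """
    if patch_size > image_size:
        return 0
    return (image_size - patch_size) // step + 1


def patches_per_image(image_size: int, patch_size: int, step: int) -> int:
    """Count windows for a square image swept with the same step on both axes."""
    n = patches_per_axis(image_size, patch_size, step)
    return n * n


IMAGE_SIZE = 8192   # pixels per side, as in the test table
PATCH_SIZE = 256    # width and height of each patch
NUM_IMAGES = 2      # REGION_A_8K.jpg and REGION_B_8K.jpg

# Step 1: stepSize equals the patch width, so the patches tile the image.
nonoverlap_rows = NUM_IMAGES * patches_per_image(IMAGE_SIZE, PATCH_SIZE, 256)

# Step 2: stepSize is half the patch width, so neighboring patches overlap by 50%.
overlap_rows = NUM_IMAGES * patches_per_image(IMAGE_SIZE, PATCH_SIZE, 128)

print(nonoverlap_rows)  # 2048
print(overlap_rows)     # 7938
```

If the action's edge handling differs from this assumption (for example, by padding or emitting partial patches), the row counts for the overlapping case would differ, so the output table's row count is worth checking against these figures.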