aggregation aggregate

Retail Sales Analysis with Sliding Window and Missing Data

Scénario de test & Cas d'usage

Business Context

A national retail chain wants to analyze daily sales performance using a 3-day sliding window to smooth out daily fluctuations. The analysis must handle days where sales data might be missing for some stores. Additionally, they want to identify which product categories contributed to the sales within each 3-day window to understand sales drivers.
About the Set : aggregation

Data aggregation and summary statistical calculations.

Discover all actions of aggregation
Data Preparation

Creation of a daily sales table with some missing sales values (represented by '.') and a contributing product category variable.

Copied!
1DATA casuser.daily_store_sales;
2LENGTH StoreID $ 8 ProductCategory $ 20;
3FORMAT SaleDate date9.;
4INFILE DATALINES delimiter=',';
5INPUT StoreID $ SaleDate :date9. DailySales ProductCategory $;
6DATALINES;
7STORE_01,01JUN2023,12500,Electronics
8STORE_01,02JUN2023,14000,Apparel
9STORE_02,02JUN2023,9500,Groceries
10STORE_01,03JUN2023,11000,Electronics
11STORE_02,03JUN2023,.,
12STORE_01,04JUN2023,15500,Home Goods
13STORE_02,04JUN2023,11200,Groceries
14STORE_01,05JUN2023,.,
15STORE_02,05JUN2023,10500,Apparel
16STORE_01,06JUN2023,13000,Apparel
17STORE_02,06JUN2023,11800,Electronics
18;
19RUN;

Étapes de réalisation

1
Load the sales data, which includes missing values, into CAS.
Copied!
1 
2PROC CASUTIL;
3load
4DATA=casuser.daily_store_sales outcaslib='casuser' casout='daily_store_sales' replace;
5QUIT;
6 
2
Execute aggregation with a 3-day sliding window, including missing values in calculations, and capturing contributing product categories.
Copied!
1PROC CAS;
2 aggregation.aggregate /
3 TABLE={name='daily_store_sales', groupBy={'StoreID'}},
4 id='SaleDate',
5 interval='DAY',
6 windowInt='3 DAY',
7 includeMissing=false,
8 doESP=true,
9 contribute='ProductCategory',
10 contributeUnroll=true,
11 contributeDelimiter=', ',
12 varSpecs={{name='DailySales', agg='mean n'}},
13 casOut={name='sales_sliding_window', caslib='casuser', replace=true};
14RUN;
15QUIT;
3
Print the results to verify the sliding window calculation and the concatenated list of contributing product categories.
Copied!
1 
2PROC PRINT
3DATA=casuser.sales_sliding_window;
4title '3-Day Sliding Window Sales Analysis';
5RUN;
6 

Expected Result


The output table 'sales_sliding_window' will show the rolling 3-day average sales and transaction count for each store. The 'includeMissing=false' ensures that missing daily sales do not affect the mean calculation. Crucially, a new column (named 'ProductCategory' by default) will be created, containing a comma-separated list of the product categories from the records that fall within each 3-day window, demonstrating the 'contribute' functionality.