The aggregate action performs aggregation on selected variables. It provides a wide range of summary statistics, including sum, mean, standard deviation, and more. This action is highly flexible, allowing for grouping by one or more variables, handling of time-series data with specified intervals, and creating customized output tables. It is a powerful tool for data summarization and preparation for further analysis.
| Parameter | Description |
|---|---|
| align | Specifies the alignment of the representative value with respect to an interval or bin. |
| bin | Specifies the minimum and maximum values of a bin. |
| casOut | Specifies the settings for an output table. |
| contribute | When the doESP parameter is True, you can specify a variable whose values within each aggregation interval are recorded. |
| contributeColumnLabel | Specifies a value to override the variable label from the contribute variable. |
| contributeColumnName | Specifies a value to override the variable name from the contribute variable. |
| contributeDelimiter | Specifies a delimiter that is used between concatenated values of the contribute variable. |
| contributeTrim | When set to True, leading and trailing blanks are removed from the formatted value of the contribute variable. |
| contributeUnroll | When set to True, each raw value from the contribute variable adds a row to the result table. |
| copyVars | Specifies the variables to copy from the input table to the output table. |
| doESP | When set to True, the action can take advantage of partitioning and ordering on the input table. |
| edgeId | Specifies a numeric variable whose values are used to order the values of each varSpecs specification that uses the FIRST, LAST, FNE, or LNE aggregator. |
| exclnpwgt | When set to True and a weight variable is specified, then observations with a non-positive weight value are excluded from the analysis. |
| excludeSelf | When set to True and the doESP parameter is True, the aggregation excludes the current observation's contribution. |
| freq | Specifies a numeric variable whose values are used as the frequency of analysis variable values. |
| freqStrict | This parameter is related to using the MODE aggregator. By default, observations with frequency values that are missing or less than 1 are excluded. |
| groupByLimit | Specifies the maximum number of levels in a group-by set. |
| groupedIntervalOutput | When set to True, save only one of the same aggregated intervals with respect to the last Id value. |
| id | Specifies a numeric variable that identifies the timestamp that is associated with each observation in the input table. |
| idEnd | Specifies the inclusive maximum value of the ID variable to be considered in the analysis. |
| idOutputName | Specifies the new name of the ID variable in the output table. |
| idRange | Specifies the inclusive minimum and maximum values of the ID variable to be considered in the analysis. |
| idStart | Specifies the inclusive minimum value of the ID variable to be considered in the analysis. |
| includeEmptyInterval | By default, intervals with a missing value for the ID variable are included in the output. When set to False, these intervals are excluded. |
| includeMissing | By default, missing values are included in the analysis. When set to False, observations with missing values are excluded. |
| inputs | Specifies the input variables to use in the analysis. For raw numeric variables, the default aggregator is SUMMARY. For all other situations, the default aggregator is N. |
| interval | Specifies the time period for the accumulation of observations. |
| jumpingWindow | When set to True, specifies that aggregation occurs over a time window that can contain multiple intervals and aggregation is reset when the specified time range elapses. |
| keepRecord | When set to True, each observation's original value for the ID variable is kept without performing interval alignment. |
| keepRecordId | When set to True and the doESP parameter is True, each observation's original ID value is kept without performing interval alignment. |
| modeSingle | This parameter is related to using the MODE aggregator. By default, the most frequent value is 'missing' if all the distinct values have a frequency of 1. When set to True, the minimum of these distinct values is returned. |
| offset | Specifies the offset of each interval. |
| partKey | When the table is partitioned, you can specify a partition key so that the results are computed for the partition only. |
| pctlDef | Specifies how to compute quantile statistics (percentiles). |
| pti | Specifies the time value when the aggregation within an interval or a bin is terminated. |
| ptw | Specifies the subinterval with respect to each window interval. |
| raw | When set to True, raw values of the variables in the input parameter are used. |
| saveGroupbyFormat | By default, the formatted values of the groupBy variables from the input table are copied to the results. When set to False, the formatted values are not copied. |
| saveGroupbyRaw | By default, the raw values of the groupBy variables from the input table are copied to the results. When set to False, the raw values are not copied. |
| saveVariableColumn | By default, the variable name for each analysis variable is included in the results. When set to False, this column is not included. |
| saveVariableSpecification | By default, the results include a column that is named 'Variable Specification' to identify the varSpecs specification that produced the result. When set to False, this column is not included. |
| subBinOffset | Specifies an offset from the beginning of a bin. |
| subBinWidth | Specifies the width of the sub bin within a bin. |
| subInterval | Specifies a smaller interval to control the time period alignment within each interval for the aggregation of observations. |
| table | Specifies the input CAS table to be used by the action. |
| varSpecs | Specifies the variable to aggregate and the settings for the aggregator. |
| weight | Specifies a numeric variable whose values are used as the weight of numeric analysis variable values when the aggregator is SUMMARY. |
| windowBin | Specifies the minimum and maximum values of a window bin. |
| windowInt | Specifies the time window for the accumulation of observations with respect to each time interval. |
| windowOffset | Specifies the offset of each window interval. |
| windowSubBinOffset | Specifies the starting point within a window bin in which record values are aggregated. |
| windowSubBinWidth | Specifies the width of the sub bin within each windowBin. |
| windowSubInt | Specifies a smaller interval to control the sub time period alignment within each window interval for the aggregation of observations. |
The following code creates a sample CAS table named 'cars' which contains information about various car models, including their make, model, type, origin, horsepower, and MSRP. This table will be used in the subsequent examples to demonstrate the capabilities of the 'aggregate' action.
| 1 | DATA casuser.cars; |
| 2 | LENGTH Make $ 13 Model $ 35 Type $ 7 Origin $ 6 DriveTrain $ 5; |
| 3 | INFILE DATALINES delimiter=','; |
| 4 | INPUT Make $ Model $ Type $ Origin $ DriveTrain $ MSRP Horsepower; |
| 5 | DATALINES; |
| 6 | Acura,MDX,SUV,Asia,All,36945,265 |
| 7 | Acura,RSX Type S 2dr,Sedan,Asia,Front,23820,200 |
| 8 | Acura,TSX 4dr,Sedan,Asia,Front,26990,200 |
| 9 | Acura,TL 4dr,Sedan,Asia,Front,33195,270 |
| 10 | Acura,3.5 RL 4dr,Sedan,Asia,Front,43755,225 |
| 11 | Acura,3.5 RL w/Navigation 4dr,Sedan,Asia,Front,46100,225 |
| 12 | Audi,A4 1.8T 4dr,Sedan,Europe,Front,25940,170 |
| 13 | Audi,A41.8T convertible 2dr,Sedan,Europe,Front,35940,170 |
| 14 | Audi,A4 3.0 4dr,Sedan,Europe,Front,31840,220 |
| 15 | Audi,A4 3.0 Quattro 4dr,Sedan,Europe,All,33430,220 |
| 16 | ; |
| 17 | RUN; |
This example demonstrates a basic aggregation on the 'cars' table. It calculates summary statistics (minimum, maximum, and mean) for the 'Horsepower' and 'MSRP' variables across all records in the table. The results are stored in a new CAS table named 'cars_aggregated'.
| 1 | PROC CAS; |
| 2 | aggregation.aggregate / |
| 3 | TABLE={name='cars'}, |
| 4 | varSpecs={{name='Horsepower', agg='min max mean'}, |
| 5 | {name='MSRP', agg='min max mean'}}, |
| 6 | casOut={name='cars_aggregated', replace=true}; |
| 7 | RUN; |
This example performs a more detailed aggregation by grouping the data based on the 'Origin' of the cars. It calculates the summary statistics for 'Horsepower' and 'MSRP' for each origin group (Asia and Europe). This allows for a comparative analysis of vehicle characteristics based on their manufacturing region.
| 1 | PROC CAS; |
| 2 | aggregation.aggregate / |
| 3 | TABLE={name='cars', groupBy={'Origin'}}, |
| 4 | varSpecs={{name='Horsepower', agg='min max mean'}, |
| 5 | {name='MSRP', agg='min max mean'}}, |
| 6 | casOut={name='cars_aggregated_by_origin', replace=true}; |
| 7 | RUN; |
A retail bank needs to generate a monthly summary of customer transactions. The goal is to calculate the total transaction amount, the average transaction value, and the number ...
A manufacturing plant monitors thousands of machines using IoT sensors that stream data every second. To prevent failures, they need to analyze this high-volume data in near rea...
A national retail chain wants to analyze daily sales performance using a 3-day sliding window to smooth out daily fluctuations. The analysis must handle days where sales data mi...