The mdSummary action calculates multidimensional summaries of numeric variables. This action is particularly useful for exploring data by generating descriptive statistics for various subgroups, which is a fundamental step in data analysis and reporting.
| Parameter | Description |
|---|---|
| attributes | Specifies the variable attributes. |
| descending | When set to True, the formatted levels of the variables are arranged in descending order. |
| groupByLimit | Specifies the maximum number of levels in a group-by set. When the server determines this number of levels, the server stops and does not return a result. Specify this parameter if you want to avoid creating large result sets in group-by operations. |
| includeMissing | When set to True, missing values are included in the determination of group-by values. |
| inputs | Specifies the input variables for the analysis. |
| orderByGbyRaw | When set to True, the ordering of the group-by variables is based on the raw values of the variables, not the formatted values. |
| returnValidSet | When set to True, the group-by sets whose sizes do not exceed the specified group-by limit are returned. |
| sets | Specifies the set specifications, allowing for different group-by variables and filters in a single call. |
| subSet | Specifies the summary statistics to generate (e.g., MEAN, SUM, N, MIN, MAX). |
| table | Specifies the input CAS table for the analysis. |
| weight | Specifies a numeric variable whose values weight the values of the analysis variables. |
The following code creates a sample table named 'CARS' in the 'casuser' caslib, which will be used in the examples. This table contains information about different car models, including their type, origin, and engine size.
| 1 | DATA casuser.cars; |
| 2 | LENGTH Type $ 10 Origin $ 10; |
| 3 | INFILE DATALINES delimiter=','; |
| 4 | INPUT Type $ Origin $ EngineSize Horsepower MPG_City; |
| 5 | DATALINES; |
| 6 | Sedan,Asia,1.6,120,30 |
| 7 | Truck,USA,5.0,300,15 |
| 8 | Sedan,Europe,2.0,150,25 |
| 9 | SUV,USA,3.5,280,18 |
| 10 | Hybrid,Asia,1.8,100,45 |
| 11 | Sports,Europe,3.0,350,20 |
| 12 | Truck,Asia,2.4,180,20 |
| 13 | Sedan,USA,2.5,170,24 |
| 14 | ; |
| 15 | RUN; |
This example calculates the mean, sum, and count (N) for the 'Horsepower' variable, grouped by car 'Type' and 'Origin'.
| 1 | PROC CAS; |
| 2 | SIMPLE.mdSummary / |
| 3 | TABLE={name='cars'}, |
| 4 | inputs={{name='Horsepower'}}, |
| 5 | subSet={'mean', 'sum', 'N'}, |
| 6 | sets={{groupBy={'Type', 'Origin'}}}; |
| 7 | RUN; |
This example demonstrates a more complex use of mdSummary. It processes two different sets of data in a single call. The first set calculates various statistics (Mean, Min, Max, Std) for 'EngineSize' and 'Horsepower' grouped by 'Type'. The second set calculates the same statistics for cars from 'USA' only, grouped by 'Type'. This showcases the flexibility of the 'sets' parameter for conditional analysis.
| 1 | PROC CAS; |
| 2 | SIMPLE.mdSummary / |
| 3 | TABLE={name='cars'}, |
| 4 | subSet={'mean', 'min', 'max', 'std'}, |
| 5 | inputs={{name='EngineSize'}, {name='Horsepower'}}, |
| 6 | sets={ |
| 7 | {groupBy={'Type'}}, |
| 8 | {groupBy={'Type'}, where='Origin="USA"'} |
| 9 | }; |
| 10 | RUN; |