simple

mdSummary

Description

The mdSummary action calculates multidimensional summaries of numeric variables. This action is particularly useful for exploring data by generating descriptive statistics for various subgroups, which is a fundamental step in data analysis and reporting.

simple.mdSummary / attributes={{name="string", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, descending=TRUE | FALSE, groupByLimit=64-bit-integer, includeMissing=TRUE | FALSE, inputs={{name="string", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, orderByGbyRaw=TRUE | FALSE, returnValidSet=TRUE | FALSE, sets={{groupBy={"string", ...}, groupByFmts={"string", ...}, where="string"}, ...}, subSet={"string", ...}, table={caslib="string", computedOnDemand=TRUE | FALSE, computedVars={{...}}, computedVarsProgram="string", dataSourceOptions={...}, groupBy={{...}}, groupByMode="string", importOptions={...}, name="string", orderBy={{...}}, singlePass=TRUE | FALSE, vars={{...}}, where="string", whereTable={...}}, weight="string"
Settings
ParameterDescription
attributes Specifies the variable attributes.
descending When set to True, the formatted levels of the variables are arranged in descending order.
groupByLimit Specifies the maximum number of levels in a group-by set. When the server determines this number of levels, the server stops and does not return a result. Specify this parameter if you want to avoid creating large result sets in group-by operations.
includeMissing When set to True, missing values are included in the determination of group-by values.
inputs Specifies the input variables for the analysis.
orderByGbyRaw When set to True, the ordering of the group-by variables is based on the raw values of the variables, not the formatted values.
returnValidSet When set to True, the group-by sets whose sizes do not exceed the specified group-by limit are returned.
sets Specifies the set specifications, allowing for different group-by variables and filters in a single call.
subSet Specifies the summary statistics to generate (e.g., MEAN, SUM, N, MIN, MAX).
table Specifies the input CAS table for the analysis.
weight Specifies a numeric variable whose values weight the values of the analysis variables.
Data Preparation View data prep sheet
Data Creation

The following code creates a sample table named 'CARS' in the 'casuser' caslib, which will be used in the examples. This table contains information about different car models, including their type, origin, and engine size.

Copied!
1DATA casuser.cars;
2 LENGTH Type $ 10 Origin $ 10;
3 INFILE DATALINES delimiter=',';
4 INPUT Type $ Origin $ EngineSize Horsepower MPG_City;
5 DATALINES;
6Sedan,Asia,1.6,120,30
7Truck,USA,5.0,300,15
8Sedan,Europe,2.0,150,25
9SUV,USA,3.5,280,18
10Hybrid,Asia,1.8,100,45
11Sports,Europe,3.0,350,20
12Truck,Asia,2.4,180,20
13Sedan,USA,2.5,170,24
14;
15RUN;

Examples

This example calculates the mean, sum, and count (N) for the 'Horsepower' variable, grouped by car 'Type' and 'Origin'.

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 SIMPLE.mdSummary /
3 TABLE={name='cars'},
4 inputs={{name='Horsepower'}},
5 subSet={'mean', 'sum', 'N'},
6 sets={{groupBy={'Type', 'Origin'}}};
7RUN;
Result :
The action returns a result table containing the calculated mean, sum, and count of 'Horsepower' for each combination of 'Type' and 'Origin'.

This example demonstrates a more complex use of mdSummary. It processes two different sets of data in a single call. The first set calculates various statistics (Mean, Min, Max, Std) for 'EngineSize' and 'Horsepower' grouped by 'Type'. The second set calculates the same statistics for cars from 'USA' only, grouped by 'Type'. This showcases the flexibility of the 'sets' parameter for conditional analysis.

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 SIMPLE.mdSummary /
3 TABLE={name='cars'},
4 subSet={'mean', 'min', 'max', 'std'},
5 inputs={{name='EngineSize'}, {name='Horsepower'}},
6 sets={
7 {groupBy={'Type'}},
8 {groupBy={'Type'}, where='Origin="USA"'}
9 };
10RUN;
Result :
The result consists of two tables. The first table shows the specified statistics for all cars, grouped by 'Type'. The second table provides the same statistics but is filtered to include only cars of 'USA' origin, also grouped by 'Type'.

FAQ

What is the primary function of the mdSummary action?
What is a mandatory requirement for the input data when using the mdSummary action?
What kind of summary statistics can be generated using the 'subSet' parameter?
How can I control the grouping of data in the mdSummary action?
What is the purpose of the 'groupByLimit' parameter?
How does the 'includeMissing' parameter affect the analysis?
Can I weight the analysis variables in the mdSummary action?