simple

mdSummary

Description

The mdSummary action calculates multidimensional summaries of numeric variables. This action is particularly useful for exploring data by generating descriptive statistics for various subgroups, which is a fundamental step in data analysis and reporting.

simple.mdSummary / attributes={{name="string", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, descending=TRUE | FALSE, groupByLimit=64-bit-integer, includeMissing=TRUE | FALSE, inputs={{name="string", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, orderByGbyRaw=TRUE | FALSE, returnValidSet=TRUE | FALSE, sets={{groupBy={"string", ...}, groupByFmts={"string", ...}, where="string"}, ...}, subSet={"string", ...}, table={caslib="string", computedOnDemand=TRUE | FALSE, computedVars={{...}}, computedVarsProgram="string", dataSourceOptions={...}, groupBy={{...}}, groupByMode="string", importOptions={...}, name="string", orderBy={{...}}, singlePass=TRUE | FALSE, vars={{...}}, where="string", whereTable={...}}, weight="string"
Settings
ParameterDescription
attributesSpecifies the variable attributes.
descendingWhen set to True, the formatted levels of the variables are arranged in descending order.
groupByLimitSpecifies the maximum number of levels in a group-by set. When the server determines this number of levels, the server stops and does not return a result. Specify this parameter if you want to avoid creating large result sets in group-by operations.
includeMissingWhen set to True, missing values are included in the determination of group-by values.
inputsSpecifies the input variables for the analysis.
orderByGbyRawWhen set to True, the ordering of the group-by variables is based on the raw values of the variables, not the formatted values.
returnValidSetWhen set to True, the group-by sets whose sizes do not exceed the specified group-by limit are returned.
setsSpecifies the set specifications, allowing for different group-by variables and filters in a single call.
subSetSpecifies the summary statistics to generate (e.g., MEAN, SUM, N, MIN, MAX).
tableSpecifies the input CAS table for the analysis.
weightSpecifies a numeric variable whose values weight the values of the analysis variables.
Data Preparation View data prep sheet
Data Creation

The following code creates a sample table named 'CARS' in the 'casuser' caslib, which will be used in the examples. This table contains information about different car models, including their type, origin, and engine size.

Copied!
1DATA casuser.cars;
2 LENGTH Type $ 10 Origin $ 10;
3 INFILE DATALINES delimiter=',';
4 INPUT Type $ Origin $ EngineSize Horsepower MPG_City;
5 DATALINES;
6Sedan,Asia,1.6,120,30
7Truck,USA,5.0,300,15
8Sedan,Europe,2.0,150,25
9SUV,USA,3.5,280,18
10Hybrid,Asia,1.8,100,45
11Sports,Europe,3.0,350,20
12Truck,Asia,2.4,180,20
13Sedan,USA,2.5,170,24
14;
15RUN;

Examples

This example calculates the mean, sum, and count (N) for the 'Horsepower' variable, grouped by car 'Type' and 'Origin'.

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 SIMPLE.mdSummary /
3 TABLE={name='cars'},
4 inputs={{name='Horsepower'}},
5 subSet={'mean', 'sum', 'N'},
6 sets={{groupBy={'Type', 'Origin'}}};
7RUN;
Result :
The action returns a result table containing the calculated mean, sum, and count of 'Horsepower' for each combination of 'Type' and 'Origin'.

This example demonstrates a more complex use of mdSummary. It processes two different sets of data in a single call. The first set calculates various statistics (Mean, Min, Max, Std) for 'EngineSize' and 'Horsepower' grouped by 'Type'. The second set calculates the same statistics for cars from 'USA' only, grouped by 'Type'. This showcases the flexibility of the 'sets' parameter for conditional analysis.

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 SIMPLE.mdSummary /
3 TABLE={name='cars'},
4 subSet={'mean', 'min', 'max', 'std'},
5 inputs={{name='EngineSize'}, {name='Horsepower'}},
6 sets={
7 {groupBy={'Type'}},
8 {groupBy={'Type'}, where='Origin="USA"'}
9 };
10RUN;
Result :
The result consists of two tables. The first table shows the specified statistics for all cars, grouped by 'Type'. The second table provides the same statistics but is filtered to include only cars of 'USA' origin, also grouped by 'Type'.

FAQ

What is the primary function of the mdSummary action?
What is a mandatory requirement for the input data when using the mdSummary action?
What kind of summary statistics can be generated using the 'subSet' parameter?
How can I control the grouping of data in the mdSummary action?
What is the purpose of the 'groupByLimit' parameter?
How does the 'includeMissing' parameter affect the analysis?
Can I weight the analysis variables in the mdSummary action?