mdSummary - WeAreCAS

Q: What is a mandatory requirement for the input data when using the mdSummary action?

The input data must contain at least one numeric variable. If this condition is not met, the action will return an error status and an empty dictionary as the result.

Q: What kind of summary statistics can be generated using the 'subSet' parameter?

The 'subSet' parameter allows you to specify the summary statistics to generate, which include: CSS (Corrected Sum of Squares), CV (Coefficient of Variation), KURTOSIS, MAX (Maximum), MEAN, MIN (Minimum), N (Number of Observations), NMISS (Number of Missing Values), PROBT (p-value for t-statistic), SKEWNESS, STD (Standard Deviation), STDERR (Standard Error), SUM, T (t-statistic), USS (Uncorrected Sum of Squares), and VAR (Sample Variance).

Q: How can I control the grouping of data in the mdSummary action?

You can use the 'sets' parameter, which includes a 'groupBy' subparameter. This allows you to specify a list of variables to group the data by for the analysis. You can also use 'groupByFmts' to specify formats for these group-by variables.

Q: What is the purpose of the 'groupByLimit' parameter?

The 'groupByLimit' parameter specifies the maximum number of levels in a group-by set. When the server reaches this number, it stops processing to avoid creating excessively large result sets. This helps manage performance and resource usage.

Q: How does the 'includeMissing' parameter affect the analysis?

When 'includeMissing' is set to True, which is the default, missing values are included in the determination of group-by values.

Q: Can I weight the analysis variables in the mdSummary action?

Yes, you can use the 'weight' parameter to specify a numeric variable whose values will be used to weight the values of the analysis variables.

Description

The mdSummary action calculates multidimensional summaries of numeric variables. This action is particularly useful for exploring data by generating descriptive statistics for various subgroups, which is a fundamental step in data analysis and reporting.

simple.mdSummary / attributes={{name="string", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, descending=TRUE | FALSE, groupByLimit=64-bit-integer, includeMissing=TRUE | FALSE, inputs={{name="string", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, orderByGbyRaw=TRUE | FALSE, returnValidSet=TRUE | FALSE, sets={{groupBy={"string", ...}, groupByFmts={"string", ...}, where="string"}, ...}, subSet={"string", ...}, table={caslib="string", computedOnDemand=TRUE | FALSE, computedVars={{...}}, computedVarsProgram="string", dataSourceOptions={...}, groupBy={{...}}, groupByMode="string", importOptions={...}, name="string", orderBy={{...}}, singlePass=TRUE | FALSE, vars={{...}}, where="string", whereTable={...}}, weight="string"

Settings

Parameter	Description
attributes	Specifies the variable attributes.
descending	When set to True, the formatted levels of the variables are arranged in descending order.
groupByLimit	Specifies the maximum number of levels in a group-by set. When the server determines this number of levels, the server stops and does not return a result. Specify this parameter if you want to avoid creating large result sets in group-by operations.
includeMissing	When set to True, missing values are included in the determination of group-by values.
inputs	Specifies the input variables for the analysis.
orderByGbyRaw	When set to True, the ordering of the group-by variables is based on the raw values of the variables, not the formatted values.
returnValidSet	When set to True, the group-by sets whose sizes do not exceed the specified group-by limit are returned.
sets	Specifies the set specifications, allowing for different group-by variables and filters in a single call.
subSet	Specifies the summary statistics to generate (e.g., MEAN, SUM, N, MIN, MAX).
table	Specifies the input CAS table for the analysis.
weight	Specifies a numeric variable whose values weight the values of the analysis variables.

Data Preparation View data prep sheet

Data Creation

The following code creates a sample table named 'CARS' in the 'casuser' caslib, which will be used in the examples. This table contains information about different car models, including their type, origin, and engine size.

Copied!

1	DATA casuser.cars;
2	LENGTH Type $ 10 Origin $ 10;
3	INFILE DATALINES delimiter=',';
4	INPUT Type $ Origin $ EngineSize Horsepower MPG_City;
5	DATALINES;
6	Sedan,Asia,1.6,120,30
7	Truck,USA,5.0,300,15
8	Sedan,Europe,2.0,150,25
9	SUV,USA,3.5,280,18
10	Hybrid,Asia,1.8,100,45
11	Sports,Europe,3.0,350,20
12	Truck,Asia,2.4,180,20
13	Sedan,USA,2.5,170,24
14	;
15	RUN;

Examples

This example calculates the mean, sum, and count (N) for the 'Horsepower' variable, grouped by car 'Type' and 'Origin'.

SAS® / CAS Code Code awaiting community validation

Copied!

1	PROC CAS;
2	SIMPLE.mdSummary /
3	TABLE={name='cars'},
4	inputs={{name='Horsepower'}},
5	subSet={'mean', 'sum', 'N'},
6	sets={{groupBy={'Type', 'Origin'}}};
7	RUN;

Result :
The action returns a result table containing the calculated mean, sum, and count of 'Horsepower' for each combination of 'Type' and 'Origin'.

This example demonstrates a more complex use of mdSummary. It processes two different sets of data in a single call. The first set calculates various statistics (Mean, Min, Max, Std) for 'EngineSize' and 'Horsepower' grouped by 'Type'. The second set calculates the same statistics for cars from 'USA' only, grouped by 'Type'. This showcases the flexibility of the 'sets' parameter for conditional analysis.

SAS® / CAS Code Code awaiting community validation

Copied!

1	PROC CAS;
2	SIMPLE.mdSummary /
3	TABLE={name='cars'},
4	subSet={'mean', 'min', 'max', 'std'},
5	inputs={{name='EngineSize'}, {name='Horsepower'}},
6	sets={
7	{groupBy={'Type'}},
8	{groupBy={'Type'}, where='Origin="USA"'}
9	};
10	RUN;

Result :
The result consists of two tables. The first table shows the specified statistics for all cars, grouped by 'Type'. The second table provides the same statistics but is filtered to include only cars of 'USA' origin, also grouped by 'Type'.

FAQ

What is the primary function of the mdSummary action?

What is a mandatory requirement for the input data when using the mdSummary action?

What kind of summary statistics can be generated using the 'subSet' parameter?

How can I control the grouping of data in the mdSummary action?

What is the purpose of the 'groupByLimit' parameter?

How does the 'includeMissing' parameter affect the analysis?

Can I weight the analysis variables in the mdSummary action?

Actions associées

simple

compare

Compares two tables by computing the index and frequency of each group, cumul...

simple

correlation

The correlation action computes Pearson product-moment correlations. This is ...

simple

crossTab

The crossTab action performs one-way or two-way tabulations, also known as fr...

simple

freq

Generates a frequency distribution for one or more variables. It calculates c...

simple

groupByInfo

The groupByInfo action computes the index and frequency of each group, and th...

simple

groupBy

Builds BY groups in terms of the variable value combinations given the variab...

Table of Contents

Description

Data Creation

Examples

Basic Multidimensional Summary

Detailed mdSummary with Multiple Sets and Subsets

FAQ

Actions associées

compare

correlation

crossTab

freq

groupByInfo

groupBy