
Mastering Descriptive Statistics in SAS: PROC MEANS, SUMMARY, and TABULATE

Simon

In data analysis with SAS®, producing descriptive statistics is a fundamental step. Three procedures stand out for this task: PROC MEANS, PROC SUMMARY, and PROC TABULATE. Although they share a common foundation, each has its own characteristics and use cases.

This article explores their differences, how each one works, and how to use them effectively, whether through code or graphical interfaces like SAS® Enterprise Guide and SAS® Studio.


PROC MEANS: Direct Exploration

The PROC MEANS procedure is often the analyst's first instinct. Its main function is to calculate descriptive statistics for variables, either for all observations or by groups.

Its capabilities include:

  • Estimating quantiles (including the median).

  • Calculating confidence limits for the mean.

  • Identifying extreme values.

  • Performing Student's t-tests of the hypothesis that the mean is zero (the T and PROBT statistics).

Main feature: By default, PROC MEANS displays its results directly in the Output window.
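The capabilities listed above can be requested by naming statistic keywords on the PROC MEANS statement. The following is a minimal sketch rather than a recommended layout: the choice of msrp as the analysis variable and type as the grouping variable is an assumption made for illustration.

```sas
/* Sketch: request specific statistics instead of the defaults.       */
/* The variable choices (msrp, type) are illustrative assumptions.    */
PROC MEANS DATA=sashelp.cars
           N MEAN MEDIAN Q1 Q3   /* count, mean, and quantiles          */
           CLM                   /* confidence limits for the mean      */
           MIN MAX               /* extreme values                      */
           T PROBT               /* t statistic and p-value for mean=0  */
           MAXDEC=2;             /* limit printed decimals              */
  CLASS type;                    /* one row of statistics per type      */
  VAR msrp;
RUN;
```

Omitting the CLASS statement produces the same statistics computed over all observations.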

PROC SUMMARY: The Discreet Power

Technically, PROC SUMMARY is identical to PROC MEANS in terms of statistical calculations. It offers the same statements and processing options.

The key difference: Unlike MEANS, which favors display, SUMMARY is designed to write its results to an output table (dataset) through its OUTPUT statement. It displays nothing in the results window unless the PRINT option is added, making it ideal for preparing intermediate data without cluttering reports.
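To make the equivalence concrete, the sketch below runs the same request through both procedures and compares the resulting tables. The table names msrp_summary and msrp_means, and the choice of origin as the grouping variable, are hypothetical names introduced only for this example.

```sas
/* Sketch: PROC SUMMARY writes a table and prints nothing by default. */
PROC SUMMARY DATA=sashelp.cars NWAY;
  CLASS origin;
  VAR msrp;
  OUTPUT OUT=WORK.msrp_summary MEAN=mean_msrp MEDIAN=median_msrp;
RUN;

/* PROC MEANS with NOPRINT behaves the same way. */
PROC MEANS DATA=sashelp.cars NWAY NOPRINT;
  CLASS origin;
  VAR msrp;
  OUTPUT OUT=WORK.msrp_means MEAN=mean_msrp MEDIAN=median_msrp;
RUN;

/* Compare the two output tables value by value. */
PROC COMPARE BASE=WORK.msrp_summary COMPARE=WORK.msrp_means;
RUN;
```

The NWAY option keeps only the rows for each value of origin and drops the overall total row, which is often what a downstream step expects.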

PROC TABULATE: Advanced Formatting

The PROC TABULATE procedure builds on the concepts of MEANS and SUMMARY but goes much further in terms of presentation. It specializes in displaying descriptive statistics in hierarchical tables.

Its major strengths:

  • Flexibility: It allows for classifying variable values and establishing complex hierarchical relationships between them.

  • Dual Output: It can send results to the output window and/or to a data table.

  • Formatting: It offers full control over the labels and formatting of the generated statistics.
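As a brief illustration of these strengths, the sketch below nests one classification variable inside another and relabels the statistics. The labels and the choice of origin, type, and msrp are assumptions made for the example.

```sas
/* Sketch: hierarchical rows (type nested within origin) with custom  */
/* labels and a per-statistic format. Labels are illustrative.        */
PROC TABULATE DATA=sashelp.cars;
  CLASS origin type;
  VAR msrp;
  TABLE origin * type ALL='All vehicles',          /* row dimension    */
        msrp='MSRP' * (N='Count' * F=8.
                       MEAN='Average' * F=dollar12.); /* columns       */
RUN;
```

The asterisk nests one element inside another, a blank space places elements side by side, and the comma separates the row dimension from the column dimension.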

Note on SAS® Viya: In the SAS® Viya environment, these three procedures can delegate their calculations to CAS actions (Cloud Analytic Services) when the input is a CAS table. The aggregation then runs where the data resides, which helps performance on large volumes of data.
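A minimal sketch of this setup follows. It assumes a SAS Viya environment in which you are allowed to start a CAS session and use the personal casuser caslib; the session name mysess and the libref mycas are hypothetical names chosen for the example.

```sas
/* Sketch, assuming SAS Viya with access to a CAS server.             */
CAS mysess;                            /* start a CAS session          */
LIBNAME mycas CAS CASLIB=casuser;      /* libref pointing to a caslib  */

/* Load a copy of the sample data into CAS memory. */
DATA mycas.cars;
  SET sashelp.cars;
RUN;

/* Same syntax as before; the input is now a CAS table. */
PROC MEANS DATA=mycas.cars;
  VAR msrp invoice;
RUN;

CAS mysess TERMINATE;                  /* end the session              */
```

The procedure code itself does not change. Only the libref decides whether the input is a regular SAS table or a CAS table.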

Practical Examples (Code)

Here are concrete examples using the sashelp.cars dataset, included in all SAS® installations.

Using MEANS and SUMMARY

The code below illustrates the difference in output. PROC MEANS displays the result, while PROC SUMMARY creates a table named WORK.summaryout.

/* Display statistics directly */
PROC MEANS DATA=sashelp.cars;
  VAR msrp invoice;
RUN;

/* Create an output table */
PROC SUMMARY DATA=sashelp.cars;
  VAR msrp invoice;
  OUTPUT OUT=WORK.summaryout (LABEL="PROC SUMMARY output table for SASHELP.CARS");
RUN;
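Because PROC SUMMARY prints nothing, the natural next step is to look at the table it wrote. The sketch below recreates the table under a hypothetical name, WORK.summarycheck, and lists it. When the OUTPUT statement names no statistics, the table holds one row per default statistic (N, MIN, MAX, MEAN, STD), identified by the _STAT_ column.

```sas
/* Sketch: recreate the output table and list its contents.           */
/* WORK.summarycheck is a hypothetical name used for this example.    */
PROC SUMMARY DATA=sashelp.cars;
  VAR msrp invoice;
  OUTPUT OUT=WORK.summarycheck;
RUN;

/* _TYPE_ and _FREQ_ are added automatically by the procedure. */
PROC PRINT DATA=WORK.summarycheck NOOBS;
RUN;
```

Naming the statistics explicitly, for example MEAN= or MEDIAN= on the OUTPUT statement, produces one column per statistic instead of one row per statistic.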

Using TABULATE for a structured table

Here, we create a table crossing the vehicle type (row) with weight and wheelbase statistics (column).

PROC TABULATE
  DATA=SASHELP.CARS
  OUT=WORK.tabulateout (LABEL="PROC TABULATE output table for SASHELP.CARS")
  FORMAT=comma10.2;
  VAR Weight Wheelbase;
  CLASS Type / ORDER=freq MISSING;

  TABLE /* Row dimension */
        Type,
        /* Column dimension */
        N
        Weight * Range
        Wheelbase * Mean;
RUN;

Going Further: Conditional Formatting with TABULATE

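One of the great advantages of PROC TABULATE is that a style attribute can take a custom format as its value, so each cell is colored according to the number it contains (e.g., "traffic light" color coding). The colors appear in styled ODS destinations such as HTML or PDF.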

/* Define the color format. The '<-' ranges exclude their lower bound, */
/* so fractional means such as 20000.50 are not left unmatched.        */
PROC FORMAT;
  VALUE watchit
    0     -  20000 = 'Green'
    20000 <- 30000 = 'Orange'
    30000 <- 50000 = 'Blue'
    50000 <- 60000 = 'Purple'
    60000 <- high  = 'Red';
RUN;

/* Apply the format as the text (foreground) color of the data cells */
PROC TABULATE DATA=sashelp.cars S=[foreground=watchit.]
  FORMAT=dollar10.2
  OUT=WORK.tabulatecolor (LABEL="Colored output table for SASHELP.CARS");
  CLASS type / MISSING;
  VAR invoice;

  TABLE type ALL,
        Invoice * mean;
RUN;
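A common variation is to color the cell background rather than the text, and to attach the style to one statistic only instead of to every data cell. The sketch below shows one way this might be written. The format name invoicebg and its thresholds are hypothetical and chosen only for illustration.

```sas
/* Sketch: background color applied to a single statistic.            */
/* The format name and thresholds are illustrative assumptions.       */
PROC FORMAT;
  VALUE invoicebg
    low   -  25000 = 'Green'
    25000 <- 40000 = 'Yellow'
    40000 <- high  = 'Red';
RUN;

PROC TABULATE DATA=sashelp.cars;
  CLASS type;
  VAR invoice;
  TABLE type ALL,
        invoice * mean * F=dollar12.2
                * {STYLE={BACKGROUND=invoicebg.}};  /* this cell only  */
RUN;
```

Attaching the style inside the TABLE statement leaves any other statistics added to the table unstyled, which keeps the highlighting focused on the value being monitored.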

Comparative Summary

To help you choose the right procedure for your needs, here is a summary table of the main functional differences and their availability in graphical interfaces:

| Characteristic | PROC MEANS | PROC SUMMARY | PROC TABULATE |
|---|---|---|---|
| Main Objective | Quick exploration and standard statistics | Calculation of statistics for storage (ETL) | Presentation reports and complex tables |
| Default Output | Results window (Output) | SAS® data table (dataset), nothing displayed | Results window (Output) |
| Layout Flexibility | Standard (vertical list) | N/A (dataset structure) | High (cross-tab and hierarchical tables) |
| SAS® Enterprise Guide Support | Yes (via wizard) | No (code required) | Yes (via wizard) |
| SAS® Studio Support | Yes (via tasks) | No (code required) | No (code required) |

Although PROC MEANS, SUMMARY, and TABULATE are similar in the descriptive statistics they generate, they differ in the nature of their outputs and their flexibility.

While PROC MEANS is ideal for a quick check and PROC SUMMARY for creating data tables, PROC TABULATE offers the greatest control. Its versatility often makes it the best choice for scenarios requiring a polished and hierarchical presentation of data.