Data Step

How to Aggregate and Sum Data by Group

Simon 35 views
Difficulty Level
Débutant
Published on :
Simon

Expert Advice

Simon

While PROC REPORT and PROC SQL offer great flexibility, the most efficient native SAS method for aggregating massive datasets remains PROC SUMMARY (or PROC MEANS). It is worth clarifying that the NWAY option actually belongs to these procedures—not SQL—and is a crucial best practice when using a CLASS statement to optimize processing by suppressing unnecessary global subtotals in your output table.

In daily data processing, it is very common to need to switch from a "transactional" level of detail (multiple rows per customer) to a "summary" level (one row per customer with a total).

1. The Problem: Summing balances by customer

The user has a dataset containing customer numbers, dates, and amounts (balances). The same customer (Cust_1) appears multiple times on different dates.


CustomerAmountDate
Cust_1100.0001/01/2008
Cust_1150.0010/01/2008
Cust_1125.0011/01/2008

Objective: Create a new table containing the sum of the amounts for each customer.

  • Expected result for Cust_1: 375.00

Note :
In the original discussion, an effective solution using PROC REPORT was provided. Although often used for display (printing), this procedure is very powerful for generating output tables.
1PROC REPORT DATA=cust nowd out=temp;
2 column cust_no bal;
3 define cust_no / group noprint;
4 define bal / analysis sum noprint;
5RUN;
Code analysis:
out=temp: This is the key option. It tells SAS© not to just display the report, but to save the result in a table named temp.

column cust_no bal: Selects the columns to use.

define cust_no / group: Indicates that the cust_no variable is the grouping key (like a "GROUP BY" in SQL).

define bal / analysis sum: Indicates that the bal variable should be analyzed and summed.
Note :
PROC SQL (For those accustomed to SQL)
This is often the most intuitive method if you come from the world of relational databases.
1PROC SQL;
2 create TABLE temp_sql as
3 select cust_no, sum(bal) as Total_Balance
4 from cust
5 group BY cust_no;
6QUIT;
Note: The nway statement is important to keep only the highest level of summary (by customer) and avoid overall totals.

To summarize data in SAS©:

  1. Use PROC REPORT if you need flexibility and are already building a report.

  2. Use PROC SQL if you prefer standard SQL syntax.

  3. Use PROC MEANS if performance on very large volumes is your priority.

Expert's advice: As Cynthia points out, don't hesitate to consult the SAS© documentation for the OUTPUT statement (for PROC MEANS) or the OUT= option (for TABULATE and REPORT). This is often where the key to transforming a visual report into a data table lies.