Data Step

How to Aggregate and Sum Data by Group

Simon 14 vues

In daily data processing, it is very common to need to switch from a "transactional" level of detail (multiple rows per customer) to a "summary" level (one row per customer with a total).

1. The Problem: Summing balances by customer

The user has a dataset containing customer numbers, dates, and amounts (balances). The same customer (Cust_1) appears multiple times on different dates.


CustomerAmountDate
Cust_1100.0001/01/2008
Cust_1150.0010/01/2008
Cust_1125.0011/01/2008

Objective: Create a new table containing the sum of the amounts for each customer.

  • Expected result for Cust_1: 375.00

Note :
In the original discussion, an effective solution using PROC REPORT was provided. Although often used for display (printing), this procedure is very powerful for generating output tables.
1PROC REPORT DATA=cust nowd out=temp;
2 column cust_no bal;
3 define cust_no / group noprint;
4 define bal / analysis sum noprint;
5RUN;
Code analysis:
out=temp: This is the key option. It tells SAS© not to just display the report, but to save the result in a table named temp.

column cust_no bal: Selects the columns to use.

define cust_no / group: Indicates that the cust_no variable is the grouping key (like a "GROUP BY" in SQL).

define bal / analysis sum: Indicates that the bal variable should be analyzed and summed.
Note :
PROC SQL (For those accustomed to SQL)
This is often the most intuitive method if you come from the world of relational databases.
1PROC SQL;
2 create TABLE temp_sql as
3 select cust_no, sum(bal) as Total_Balance
4 from cust
5 group BY cust_no;
6QUIT;
Note: The nway statement is important to keep only the highest level of summary (by customer) and avoid overall totals.

To summarize data in SAS©:

  1. Use PROC REPORT if you need flexibility and are already building a report.

  2. Use PROC SQL if you prefer standard SQL syntax.

  3. Use PROC MEANS if performance on very large volumes is your priority.

Expert's advice: As Cynthia points out, don't hesitate to consult the SAS© documentation for the OUTPUT statement (for PROC MEANS) or the OUT= option (for TABULATE and REPORT). This is often where the key to transforming a visual report into a data table lies.