SAS9

Optimizing PROC SUMMARY Performance in SAS Enterprise Guide

Simon 13 views

It often happens that a SAS© program runs quickly directly on a Unix server, but seems to freeze or take an infinite amount of time when launched from the SAS© Enterprise Guide (EG) interface. This phenomenon is often seen when using aggregation procedures like PROC SUMMARY on large volumes of data.

If you find that your sorting step (PROC SORT) executes normally, but the statistical summary "grinds" indefinitely, the problem is not necessarily computational power, but output management.

Here are several strategies to diagnose slowdowns and drastically optimize your code.

Optimizing PROC SUMMARY Performance in SAS Enterprise Guide -

Diagnosing with FULLSTIMER

Before optimizing, it is useful to understand where resources are being consumed. Adding the FULLSTIMER option at the beginning of your program allows for detailed statistics on CPU time, real time, and memory usage to be displayed in the log.

1options fullstimer;

The Pitfall of the PRINT Option and ODS

The most common cause of slowdowns in SAS© EG is the attempt to display results.

When you use the PRINT option in a PROC SUMMARY, SAS© tries to generate a report (often in HTML by default in EG). If your classification variables (CLASS) have high cardinality (many unique combinations), the resulting table is gigantic. SAS© EG then consumes an enormous amount of memory and time to transfer and render this display, giving the impression that the calculation is stuck.

The solution:

  • Remove the PRINT option.

  • Close the output destinations (ODS) before the procedure to prevent any automatic generation of graphical or HTML reports.

1ods _all_ close;
2/* Empêche la génération de sorties visuelles lourdes */

Preferring NWAY over _TYPE_ Filtering

A common practice is to generate all statistics and then filter the output table to keep only the finest level of aggregation (where all class variables are taken into account) via a WHERE _TYPE_ = ... clause.

However, it is much more efficient to use the NWAY option. This option instructs SAS© to calculate and write statistics only for the combination of all class variables, thus avoiding the unnecessary calculation of intermediate subtotals.

Before (less efficient):

1PROC SUMMARY DATA=MaTable;
2 class var1 var2;
3 OUTPUT out=Resultat (where=(_type_=3)); /* Nécessite de connaître la valeur du type */
4RUN;

After (Recommended):

1PROC SUMMARY DATA=MaTable nway;
2 class var1 var2;
3 OUTPUT out=Resultat;
4RUN;

Combining PROC SORT and BY Processing

If your table is very large, the CLASS statement can be memory-intensive because it has to keep the combinations in memory.

An optimization technique is to first sort your data, then use a BY statement instead of (or in addition to) the CLASS statement in the PROC SUMMARY. Group processing (BY) is sequential and consumes less RAM.

Since you often perform a PROC SORT beforehand, you just need to add one of the grouping variables to this sort.

Optimized Code Example

Here is what the final code looks like, incorporating all these improvements:

1options fullstimer;
2 
3/* 1. Tri des données en incluant la variable de regroupement principale */
4PROC SORT DATA=&FName.;
5 BY Policy_number INCEPTION_DATE_ADJ broker_number;
6RUN;
7 
8/* 2. Fermeture des sorties ODS pour éviter les lenteurs d'affichage dans EG */
9ods _all_ close;
10 
11/* 3. Utilisation de NWAY et passage d'une variable en BY */
12PROC SUMMARY DATA=&FName. nway;
13 BY broker_number; /* Traitement séquentiel, moins de mémoire */
14 class year month type3;
15 var nbu_pol nbu_prem;
16 OUTPUT out=new_business_summary (drop=_freq_) sum=;
17RUN;
18 
19/* 4. Réactivation de l'affichage pour voir le résultat final si nécessaire */
20ods listing;
21PROC PRINT DATA=new_business_summary;
22RUN;

In summary, to unblock a PROC SUMMARY in SAS© EG: remove the display (PRINT), use NWAY, and intelligently manage your sorts to relieve memory pressure.