It often happens that a SAS© program runs quickly directly on a Unix server, but seems to freeze or take an infinite amount of time when launched from the SAS© Enterprise Guide (EG) interface. This phenomenon is often seen when using aggregation procedures like PROC SUMMARY on large volumes of data.
If you find that your sorting step (PROC SORT) executes normally, but the statistical summary "grinds" indefinitely, the problem is not necessarily computational power, but output management.
Here are several strategies to diagnose slowdowns and drastically optimize your code.
Before optimizing, it is useful to understand where resources are being consumed. Adding the FULLSTIMER option at the beginning of your program allows for detailed statistics on CPU time, real time, and memory usage to be displayed in the log.
The most common cause of slowdowns in SAS© EG is the attempt to display results.
When you use the PRINT option in a PROC SUMMARY, SAS© tries to generate a report (often in HTML by default in EG). If your classification variables (CLASS) have high cardinality (many unique combinations), the resulting table is gigantic. SAS© EG then consumes an enormous amount of memory and time to transfer and render this display, giving the impression that the calculation is stuck.
The solution:
A common practice is to generate all statistics and then filter the output table to keep only the finest level of aggregation (where all class variables are taken into account) via a WHERE _TYPE_ = ... clause.
However, it is much more efficient to use the NWAY option. This option instructs SAS© to calculate and write statistics only for the combination of all class variables, thus avoiding the unnecessary calculation of intermediate subtotals.
Before (less efficient):
If your table is very large, the CLASS statement can be memory-intensive because it has to keep the combinations in memory.
An optimization technique is to first sort your data, then use a BY statement instead of (or in addition to) the CLASS statement in the PROC SUMMARY. Group processing (BY) is sequential and consumes less RAM.
Since you often perform a PROC SORT beforehand, you just need to add one of the grouping variables to this sort.
Optimized Code Example
Here is what the final code looks like, incorporating all these improvements:
In summary, to unblock a PROC SUMMARY in SAS© EG: remove the display (PRINT), use NWAY, and intelligently manage your sorts to relieve memory pressure.