Note: The following tests were performed on a dataset of 160 million rows, hosted on a modest virtual architecture (5 nodes).
1. Aggregation (Group By): Low Cardinality
When you need to sum variables based on a few groups (e.g., Installation Type and Product Line), the natural reflex is to use PROC MEANS or PROC SUMMARY. In CAS, the optimized equivalent is the simple.summary action.
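As a rough illustration of the two approaches, the sketch below sums two measures by two low-cardinality grouping columns, first with PROC SUMMARY and then with the simple.summary action. The caslib (casuser), the table name (sales) and the column names (install_type, product_line, revenue, quantity) are all hypothetical and are not taken from the benchmark; adapt them to your own data.

```sas
/* Hypothetical names throughout: caslib "casuser", table "sales",
   columns install_type, product_line, revenue, quantity. */

/* Classic approach: PROC SUMMARY on a SAS data set.
   NWAY keeps only the rows for the full crossing of the CLASS variables. */
proc summary data=work.sales nway;
   class install_type product_line;
   var revenue quantity;
   output out=work.sales_sum (drop=_type_ _freq_) sum=;
run;

/* CAS approach: simple.summary on an in-memory CAS table.
   The grouping columns are declared on the table parameter,
   and subSet restricts the computed statistics to the sum. */
proc cas;
   simple.summary /
      table={caslib="casuser", name="sales",
             groupBy={"install_type", "product_line"}},
      inputs={"revenue", "quantity"},
      subSet={"SUM"},
      casOut={caslib="casuser", name="sales_sum", replace=true};
run;
quit;
```

This assumes the table is already loaded into CAS. Writing the result to a CAS table with casOut keeps the aggregated data on the server for further processing.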
A common (and often outdated) criticism suggests that in-memory engines struggle when the number of groups explodes. Let's verify this by increasing the complexity.
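One way to set up such a check, sketched under assumptions: first measure how many distinct values the grouping column holds with the simple.distinct action, then run simple.summary grouped by that column. The caslib (casuser), the table (sales) and the columns (customer_id, revenue) are hypothetical placeholders, not the ones used in the tests described here.

```sas
/* Hypothetical names: caslib "casuser", table "sales",
   grouping column customer_id, measure revenue. */
proc cas;
   /* Count the distinct values of the candidate grouping column
      to confirm that it really is high cardinality. */
   simple.distinct /
      table={caslib="casuser", name="sales"},
      inputs={"customer_id"};

   /* Sum the measure per group. With many groups, the result is
      written to a CAS table instead of being returned to the client. */
   simple.summary /
      table={caslib="casuser", name="sales",
             groupBy={"customer_id"}},
      inputs={"revenue"},
      subSet={"SUM"},
      casOut={caslib="casuser", name="sales_by_customer", replace=true};
run;
quit;
```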
The leap to CAS requires a learning effort, particularly to master the CASL language and CAS Actions. However, for anyone dealing with large volumes of data (Big Data), the return on investment in terms of processing time is immediate and spectacular.
Important Disclaimer
The code and examples provided on WeAreCAS.eu are for educational purposes. Do not copy and paste them blindly into your production environments; understand the logic before applying it. We strongly recommend running these scripts in a test environment (Sandbox/Dev) first. WeAreCAS accepts no responsibility for any impact or data loss on your systems.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.