SAS9

SAS Tutorial: Mastering Data Sorting with PROC SORT

Simon 22 Aufrufe

Sorting is an essential operation in database management. This tutorial explains how to use the SAS© SORT procedure to organize your records, whether for preparing a report or for cleaning data before analysis.

SAS Tutorial: Mastering Data Sorting with PROC SORT -

To begin: The "Box Office" dataset

To make the exercise more concrete, we will work with a movie database. Let's imagine a table containing the following information:

  • Title: The name of the movie.

  • Director: The name of the director.

  • Studio: The production company.

  • Budget_M: The film's budget in millions of dollars.

  • Year: The year of release.

SAS Tutorial: Mastering Data Sorting with PROC SORT -

Here is the code to generate this example dataset:

1DATA movies;
2 INFILE DATALINES dsd;
3 INPUT Title : $20. Director : $15. Studio : $15. Budget_M Year;
4 DATALINES;
5Inception,Nolan,Warner Bros,160,2010
6Tenet,Nolan,Warner Bros,200,2020
7Dune,Villeneuve,Warner Bros,165,2021
8Jaws,Spielberg,Universal,9,1975
9E.T.,Spielberg,Universal,10,1982
10Blade Runner 2049,Villeneuve,Warner Bros,150,2017
11Avatar,Cameron,Fox,237,2009
12Titanic,Cameron,Paramount,200,1997
13The Room,Wiseau,,6,2003
14.,Unknown,Indie,1,2020
15Metropolis,Lang,UFA,.,1927
16;
17RUN;
Note that we have intentionally included missing values (a missing studio for "The Room", a missing budget for "Metropolis", and a missing title).

Sorting data with PROC SORT

To reverse the order (for example, to see the most recent films first), we use the DESCENDING keyword. Be careful, this keyword is placed before the variable it modifies.

Example: Sort by Director (Z to A) then by Budget (from most expensive to least expensive).

1 
2PROC SORT
3DATA=movies out=sorted_movies;
4 
5BY descending Director descending Budget_M;
6 
7RUN;
8 
If you forget the second DESCENDING before Budget_M, SAS© will sort the budgets in ascending order by default, even if the directors are reversed.

Reminder: There is no "ASCENDING" keyword. If you write it, SAS© will return an error.
SAS Tutorial: Mastering Data Sorting with PROC SORT -

Preparing data for FIRST. and LAST. processing

One of the most powerful uses of sorting is to prepare the data for the DATA step, in order to keep only "the best" or "the first" of a group.

Let's imagine we want to keep only the most expensive film for each director.

  1. We sort by Director.

  2. We sort by Budget in descending order (so that the largest budget is at the top of each group).

1PROC SORT DATA=movies out=sorted_movies;
2 BY Director descending Budget_M;
3RUN;
4 
5DATA big_budget_only;
6 SET sorted_movies;
7 BY Director;
8 IF first.Director THEN OUTPUT;
9RUN;
The `if first.Director` code allows us to capture the first line for each new director encountered. Since we have sorted the budgets in descending order, this first line automatically corresponds to the most expensive film.

Comparison: PROC SORT vs PROC SQL

Developers accustomed to SQL (Structured Query Language) must be careful with the syntax, as it is the reverse of SAS©.

In SAS© (PROC SORT): DESCENDING is placed before the variable. In SQL (ORDER BY): DESC is placed after the variable.

Here is the SQL equivalent:

1PROC SQL;
2 SELECT * FROM movies
3 ORDER BY Director DESC, Budget_M DESC;
4QUIT;
Mnemonic: "SORT before, SQL after".
SAS Tutorial: Mastering Data Sorting with PROC SORT -

Sorting is an operation that consumes memory and processor time. If your movie data already comes from an SQL or Oracle server, it is often better to let the database server handle the sorting via its own indexing rather than importing the data in disorder to sort it later with SAS©.