This article explores how to perform this transformation, calculate sums robustly, and apply complex sorting logic.
The Scenario
Let's imagine a source table named Population1 containing demographic data by nation and gender:
The objective is to obtain a table Population2 structured as follows, with a column for the total and separate columns for each gender:
Step 1: Data Transposition
Although it is technically possible to use a complex DATA step with RETAIN statements and conditions, the most efficient and least error-prone method relies on the TRANSPOSE procedure.
Before transposing, it is imperative to sort the data by the grouping variable (here, Nation).
/* 1. Tri préalable indispensable */
proc sort data = population1;
by Nation;
run;
/* 2. Transposition */
proc transpose data = population1 out = temp_pop(drop=_name_);
by Nation; /* La ligne devient unique par Nation */
id Gender; /* Les valeurs de Gender deviennent les noms de colonnes */
var Number; /* Ce sont ces chiffres qui remplissent les colonnes */
run;
1
/* 1. Tri préalable indispensable */
2
PROC SORTDATA = population1;
3
BY Nation;
4
RUN;
5
6
/* 2. Transposition */
7
PROC TRANSPOSEDATA = population1 out = temp_pop(drop=_name_);
8
BY Nation; /* La ligne devient unique par Nation */
9
id Gender; /* Les valeurs de Gender deviennent les noms de colonnes */
10
var Number; /* Ce sont ces chiffres qui remplissent les colonnes */
11
RUN;
At this stage, the table contains the Nation, Men, and Women columns.
Step 2: Calculating the Total and Handling Missing Values
If one of the two variables (Men or Women) contains a missing value, the result of the operation will also be missing.
The recommended approach
Population = sum(Men, Women);
1
Population = sum(Men, Women);
The SUM() function ignores missing values (treating them as zeros). If data is available for men but not for women, the total will still show the figure for men instead of a null value, which is generally the desired behavior.
Step 3: Custom Sort (Conditional Logic)
Now let's assume a specific sorting constraint:
Sort the nations by descending Population.
Exception: A specific nation (e.g., "MyCountry") must always appear in the last position, regardless of its population figure.
To achieve this without modifying the displayed data, the trick is to create a temporary sorting variable (or "rank").
The Logic
We assign the population value to this sorting variable for all nations, except for the special one, to which we assign an artificially low value (for example, -1).
data population2;
set temp_pop;
/* Calcul de la somme robuste */
Population = sum(Men, Women);
/* Création de la variable de tri temporaire */
if Nation = "MyCountry" then SortKey = -1;
else SortKey = Population;
run;
/* Tri final sur la clé temporaire */
proc sort data = population2;
by descending SortKey;
run;
/* Nettoyage final : on retire la clé de tri */
data population2;
set population2(drop=SortKey);
run;
1
DATA population2;
2
SET temp_pop;
3
4
/* Calcul de la somme robuste */
5
Population = sum(Men, Women);
6
7
/* Création de la variable de tri temporaire */
8
IF Nation = "MyCountry"THEN SortKey = -1;
9
ELSE SortKey = Population;
10
RUN;
11
12
/* Tri final sur la clé temporaire */
13
PROC SORTDATA = population2;
14
BY descending SortKey;
15
RUN;
16
17
/* Nettoyage final : on retire la clé de tri */
18
DATA population2;
19
SET population2(drop=SortKey);
20
RUN;
To efficiently transform a table:
Use PROC SORT then PROC TRANSPOSE with an ID statement to pivot rows into columns.
Prefer the SUM() function over the + operator to avoid the propagation of missing values.
For complex sorts (like forcing a row to the bottom of the table), create a temporary technical variable, sort by it, and then drop it.
Important Disclaimer
The codes and examples provided on WeAreCAS.eu are for educational purposes. It is imperative not to blindly copy-paste them into your production environments. The best approach is to understand the logic before applying it. We strongly recommend testing these scripts in a test environment (Sandbox/Dev). WeAreCAS accepts no responsibility for any impact or data loss on your systems.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.
This site uses technical and analytical cookies to improve your experience.
Read more.