This article explores how to perform this transformation, calculate sums robustly, and apply complex sorting logic.
The Scenario
Let's imagine a source table named Population1 containing demographic data by nation and gender:
The objective is to obtain a table Population2 structured as follows, with a column for the total and separate columns for each gender:
Step 1: Data Transposition
Although it is technically possible to use a complex DATA step with RETAIN statements and conditions, the most efficient and least error-prone method relies on the TRANSPOSE procedure.
Before transposing, it is imperative to sort the data by the grouping variable (here, Nation).
/* 1. Tri préalable indispensable */
proc sort data = population1;
by Nation;
run;
/* 2. Transposition */
proc transpose data = population1 out = temp_pop(drop=_name_);
by Nation; /* La ligne devient unique par Nation */
id Gender; /* Les valeurs de Gender deviennent les noms de colonnes */
var Number; /* Ce sont ces chiffres qui remplissent les colonnes */
run;
1
/* 1. Tri préalable indispensable */
2
PROC SORTDATA = population1;
3
BY Nation;
4
RUN;
5
6
/* 2. Transposition */
7
PROC TRANSPOSEDATA = population1 out = temp_pop(drop=_name_);
8
BY Nation; /* La ligne devient unique par Nation */
9
id Gender; /* Les valeurs de Gender deviennent les noms de colonnes */
10
var Number; /* Ce sont ces chiffres qui remplissent les colonnes */
11
RUN;
At this stage, the table contains the Nation, Men, and Women columns.
Step 2: Calculating the Total and Handling Missing Values
If one of the two variables (Men or Women) contains a missing value, the result of the operation will also be missing.
The recommended approach
Population = sum(Men, Women);
1
Population = sum(Men, Women);
The SUM() function ignores missing values (treating them as zeros). If data is available for men but not for women, the total will still show the figure for men instead of a null value, which is generally the desired behavior.
Step 3: Custom Sort (Conditional Logic)
Now let's assume a specific sorting constraint:
Sort the nations by descending Population.
Exception: A specific nation (e.g., "MyCountry") must always appear in the last position, regardless of its population figure.
To achieve this without modifying the displayed data, the trick is to create a temporary sorting variable (or "rank").
The Logic
We assign the population value to this sorting variable for all nations, except for the special one, to which we assign an artificially low value (for example, -1).
data population2;
set temp_pop;
/* Calcul de la somme robuste */
Population = sum(Men, Women);
/* Création de la variable de tri temporaire */
if Nation = "MyCountry" then SortKey = -1;
else SortKey = Population;
run;
/* Tri final sur la clé temporaire */
proc sort data = population2;
by descending SortKey;
run;
/* Nettoyage final : on retire la clé de tri */
data population2;
set population2(drop=SortKey);
run;
1
DATA population2;
2
SET temp_pop;
3
4
/* Calcul de la somme robuste */
5
Population = sum(Men, Women);
6
7
/* Création de la variable de tri temporaire */
8
IF Nation = "MyCountry"THEN SortKey = -1;
9
ELSE SortKey = Population;
10
RUN;
11
12
/* Tri final sur la clé temporaire */
13
PROC SORTDATA = population2;
14
BY descending SortKey;
15
RUN;
16
17
/* Nettoyage final : on retire la clé de tri */
18
DATA population2;
19
SET population2(drop=SortKey);
20
RUN;
To efficiently transform a table:
Use PROC SORT then PROC TRANSPOSE with an ID statement to pivot rows into columns.
Prefer the SUM() function over the + operator to avoid the propagation of missing values.
For complex sorts (like forcing a row to the bottom of the table), create a temporary technical variable, sort by it, and then drop it.
Avertissement important
Les codes et exemples fournis sur WeAreCAS.eu sont à but pédagogique. Il est impératif de ne pas les copier-coller aveuglément sur vos environnements de production. La meilleure approche consiste à comprendre la logique avant de l'appliquer. Nous vous recommandons vivement de tester ces scripts dans un environnement de test (Sandbox/Dev). WeAreCAS décline toute responsabilité quant aux éventuels impacts ou pertes de données sur vos systèmes.
SAS et tous les autres noms de produits ou de services de SAS Institute Inc. sont des marques déposées ou des marques de commerce de SAS Institute Inc. aux États-Unis et dans d'autres pays. ® indique un enregistrement aux États-Unis. WeAreCAS est un site communautaire indépendant et n'est pas affilié à SAS Institute Inc.
Ce site utilise des cookies techniques et analytiques pour améliorer votre expérience.
En savoir plus.