This article explores how to perform this transformation, calculate sums robustly, and apply complex sorting logic.
The Scenario
Let's imagine a source table named Population1 containing demographic data by nation and gender:
The objective is to obtain a table Population2 structured as follows, with a column for the total and separate columns for each gender:
Step 1: Data Transposition
Although it is technically possible to use a complex DATA step with RETAIN statements and conditions, the most efficient and least error-prone method relies on the TRANSPOSE procedure.
Before transposing, it is imperative to sort the data by the grouping variable (here, Nation).
/* 1. Tri préalable indispensable */
proc sort data = population1;
by Nation;
run;
/* 2. Transposition */
proc transpose data = population1 out = temp_pop(drop=_name_);
by Nation; /* La ligne devient unique par Nation */
id Gender; /* Les valeurs de Gender deviennent les noms de colonnes */
var Number; /* Ce sont ces chiffres qui remplissent les colonnes */
run;
1
/* 1. Tri préalable indispensable */
2
PROC SORTDATA = population1;
3
BY Nation;
4
RUN;
5
6
/* 2. Transposition */
7
PROC TRANSPOSEDATA = population1 out = temp_pop(drop=_name_);
8
BY Nation; /* La ligne devient unique par Nation */
9
id Gender; /* Les valeurs de Gender deviennent les noms de colonnes */
10
var Number; /* Ce sont ces chiffres qui remplissent les colonnes */
11
RUN;
At this stage, the table contains the Nation, Men, and Women columns.
Step 2: Calculating the Total and Handling Missing Values
If one of the two variables (Men or Women) contains a missing value, the result of the operation will also be missing.
The recommended approach
Population = sum(Men, Women);
1
Population = sum(Men, Women);
The SUM() function ignores missing values (treating them as zeros). If data is available for men but not for women, the total will still show the figure for men instead of a null value, which is generally the desired behavior.
Step 3: Custom Sort (Conditional Logic)
Now let's assume a specific sorting constraint:
Sort the nations by descending Population.
Exception: A specific nation (e.g., "MyCountry") must always appear in the last position, regardless of its population figure.
To achieve this without modifying the displayed data, the trick is to create a temporary sorting variable (or "rank").
The Logic
We assign the population value to this sorting variable for all nations, except for the special one, to which we assign an artificially low value (for example, -1).
data population2;
set temp_pop;
/* Calcul de la somme robuste */
Population = sum(Men, Women);
/* Création de la variable de tri temporaire */
if Nation = "MyCountry" then SortKey = -1;
else SortKey = Population;
run;
/* Tri final sur la clé temporaire */
proc sort data = population2;
by descending SortKey;
run;
/* Nettoyage final : on retire la clé de tri */
data population2;
set population2(drop=SortKey);
run;
1
DATA population2;
2
SET temp_pop;
3
4
/* Calcul de la somme robuste */
5
Population = sum(Men, Women);
6
7
/* Création de la variable de tri temporaire */
8
IF Nation = "MyCountry"THEN SortKey = -1;
9
ELSE SortKey = Population;
10
RUN;
11
12
/* Tri final sur la clé temporaire */
13
PROC SORTDATA = population2;
14
BY descending SortKey;
15
RUN;
16
17
/* Nettoyage final : on retire la clé de tri */
18
DATA population2;
19
SET population2(drop=SortKey);
20
RUN;
To efficiently transform a table:
Use PROC SORT then PROC TRANSPOSE with an ID statement to pivot rows into columns.
Prefer the SUM() function over the + operator to avoid the propagation of missing values.
For complex sorts (like forcing a row to the bottom of the table), create a temporary technical variable, sort by it, and then drop it.
Wichtiger Haftungsausschluss
Die auf WeAreCAS.eu bereitgestellten Codes und Beispiele dienen Lehrzwecken. Es ist zwingend erforderlich, sie nicht blind in Ihre Produktionsumgebungen zu kopieren. Der beste Ansatz besteht darin, die Logik zu verstehen, bevor sie angewendet wird. Wir empfehlen dringend, diese Skripte in einer Testumgebung (Sandbox/Dev) zu testen. WeAreCAS übernimmt keine Verantwortung für mögliche Auswirkungen oder Datenverluste auf Ihren Systemen.
SAS und alle anderen Produkt- oder Dienstleistungsnamen von SAS Institute Inc. sind eingetragene Marken oder Marken von SAS Institute Inc. in den USA und anderen Ländern. ® zeigt die Registrierung in den USA an. WeAreCAS ist eine unabhängige Community-Site und nicht mit SAS Institute Inc. verbunden.
Diese Website verwendet technische und analytische Cookies, um Ihre Erfahrung zu verbessern.
Mehr erfahren.