This article explores how to perform this transformation, calculate sums robustly, and apply complex sorting logic.
The Scenario
Let's imagine a source table named Population1 containing demographic data by nation and gender:
The objective is to obtain a table Population2 structured as follows, with a column for the total and separate columns for each gender:
Step 1: Data Transposition
Although it is technically possible to use a complex DATA step with RETAIN statements and conditions, the most efficient and least error-prone method relies on the TRANSPOSE procedure.
Before transposing, it is imperative to sort the data by the grouping variable (here, Nation).
/* 1. Tri préalable indispensable */
proc sort data = population1;
by Nation;
run;
/* 2. Transposition */
proc transpose data = population1 out = temp_pop(drop=_name_);
by Nation; /* La ligne devient unique par Nation */
id Gender; /* Les valeurs de Gender deviennent les noms de colonnes */
var Number; /* Ce sont ces chiffres qui remplissent les colonnes */
run;
1
/* 1. Tri préalable indispensable */
2
PROC SORTDATA = population1;
3
BY Nation;
4
RUN;
5
6
/* 2. Transposition */
7
PROC TRANSPOSEDATA = population1 out = temp_pop(drop=_name_);
8
BY Nation; /* La ligne devient unique par Nation */
9
id Gender; /* Les valeurs de Gender deviennent les noms de colonnes */
10
var Number; /* Ce sont ces chiffres qui remplissent les colonnes */
11
RUN;
At this stage, the table contains the Nation, Men, and Women columns.
Step 2: Calculating the Total and Handling Missing Values
If one of the two variables (Men or Women) contains a missing value, the result of the operation will also be missing.
The recommended approach
Population = sum(Men, Women);
1
Population = sum(Men, Women);
The SUM() function ignores missing values (treating them as zeros). If data is available for men but not for women, the total will still show the figure for men instead of a null value, which is generally the desired behavior.
Step 3: Custom Sort (Conditional Logic)
Now let's assume a specific sorting constraint:
Sort the nations by descending Population.
Exception: A specific nation (e.g., "MyCountry") must always appear in the last position, regardless of its population figure.
To achieve this without modifying the displayed data, the trick is to create a temporary sorting variable (or "rank").
The Logic
We assign the population value to this sorting variable for all nations, except for the special one, to which we assign an artificially low value (for example, -1).
data population2;
set temp_pop;
/* Calcul de la somme robuste */
Population = sum(Men, Women);
/* Création de la variable de tri temporaire */
if Nation = "MyCountry" then SortKey = -1;
else SortKey = Population;
run;
/* Tri final sur la clé temporaire */
proc sort data = population2;
by descending SortKey;
run;
/* Nettoyage final : on retire la clé de tri */
data population2;
set population2(drop=SortKey);
run;
1
DATA population2;
2
SET temp_pop;
3
4
/* Calcul de la somme robuste */
5
Population = sum(Men, Women);
6
7
/* Création de la variable de tri temporaire */
8
IF Nation = "MyCountry"THEN SortKey = -1;
9
ELSE SortKey = Population;
10
RUN;
11
12
/* Tri final sur la clé temporaire */
13
PROC SORTDATA = population2;
14
BY descending SortKey;
15
RUN;
16
17
/* Nettoyage final : on retire la clé de tri */
18
DATA population2;
19
SET population2(drop=SortKey);
20
RUN;
To efficiently transform a table:
Use PROC SORT then PROC TRANSPOSE with an ID statement to pivot rows into columns.
Prefer the SUM() function over the + operator to avoid the propagation of missing values.
For complex sorts (like forcing a row to the bottom of the table), create a temporary technical variable, sort by it, and then drop it.
Aviso importante
Los códigos y ejemplos proporcionados en WeAreCAS.eu son con fines educativos. Es imperativo no copiarlos y pegarlos ciegamente en sus entornos de producción. El mejor enfoque es comprender la lógica antes de aplicarla. Recomendamos encarecidamente probar estos scripts en un entorno de prueba (Sandbox/Dev). WeAreCAS no acepta ninguna responsabilidad por cualquier impacto o pérdida de datos en sus sistemas.
SAS y todos los demás nombres de productos o servicios de SAS Institute Inc. son marcas registradas o marcas comerciales de SAS Institute Inc. en los EE. UU. y otros países. ® indica registro en los EE. UU. WeAreCAS es un sitio comunitario independiente y no está afiliado a SAS Institute Inc.
Este sitio utiliza cookies técnicas y analíticas para mejorar su experiencia.
Saber más.