This educational script illustrates two fundamental data manipulation techniques. First, it transforms a 'wide' dataset containing baby weights at different months (weight3, weight6, etc. columns) into a 'long' format (one row per month per baby) using arrays (ARRAY) and an iterative loop in a DATA step. Second, it shows how to revert to the 'wide' format using the TRANSPOSE procedure, detailing the successive steps to refine the result (using BY, ID, and PREFIX).
Data Analysis
Type : CREATION_INTERNE
The 'baby_weight' data is statically generated in the script via the DATALINES statement.
1 Code Block
DATA STEP Data
Explanation : Creation of the initial 'baby_weight' dataset containing identifiers, gender, and weights measured at 3, 6, 9, and 12 months.
Copied!
data baby_weight;
input id gender $ weight3 weight6 weight9 weight12;
datalines;
1001 F 9 13 16 17
1002 F 11 16 17 20
1003 M 17 20 23 24
1004 F 16 18 21 22
1005 M 11 15 16 18
1006 M 17 21 25 26
1007 M 16 17 19 21
1008 F 15 16 18 19
;
run;
1
DATA baby_weight;
2
INPUT id gender $ weight3 weight6 weight9 weight12;
3
DATALINES;
4
1001 F 9131617
5
1002 F 11161720
6
1003 M 17202324
7
1004 F 16182122
8
1005 M 11151618
9
1006 M 17212526
10
1007 M 16171921
11
1008 F 15161819
12
;
13
RUN;
2 Code Block
PROC PRINT
Explanation : Display of raw data for verification.
Copied!
proc print data = baby_weight;
run;
1
PROC PRINTDATA = baby_weight;
2
RUN;
3 Code Block
DATA STEP Data
Explanation : 'Wide to Long' transformation. Uses two arrays (ARRAY): one for fixed months and one for weight variables. A DO loop iterates over the arrays and the OUTPUT statement generates a row for each month.
Copied!
data baby_weight_long (keep = id gender month weight);
set baby_weight;
array months{4} (3 6 9 12);
array weights{4} weight3--weight12;
do i = 1 to 4;
month = months{i};
weight = weights{i};
output;
end;
run;
1
DATA baby_weight_long (keep = id gender month weight);
2
SET baby_weight;
3
array months{4} (36912);
4
array weights{4} weight3--weight12;
5
DO i = 1 to 4;
6
month = months{i};
7
weight = weights{i};
8
OUTPUT;
9
END;
10
RUN;
4 Code Block
PROC PRINT
Explanation : Visualization of the restructured long format result.
Copied!
proc print data = baby_weight_long;
run;
1
PROC PRINTDATA = baby_weight_long;
2
RUN;
5 Code Block
DATA STEP Data
Explanation : Pedagogical example (intermediate or incorrect step for the final goal) showing iteration without the explicit OUTPUT statement at each iteration, which does not produce the desired 'long' structure for all observations.
Copied!
data baby_weight_long1;
set baby_weight;
array weights{4} weight3--weight12;
do i = 1 to 4;
weight = weights{i};
end;
run;
1
DATA baby_weight_long1;
2
SET baby_weight;
3
array weights{4} weight3--weight12;
4
DO i = 1 to 4;
5
weight = weights{i};
6
END;
7
RUN;
6 Code Block
PROC PRINT
Explanation : Visualization of the intermediate step.
Copied!
proc print data = baby_weight_long1;
run;
1
PROC PRINTDATA = baby_weight_long1;
2
RUN;
7 Code Block
PROC SORT Data
Explanation : Sorting 'long' data by identifier and month, a necessary prerequisite for correct or structured transposition.
Copied!
proc sort data = baby_weight_long;
by id month;
run;
1
2
PROC SORT
3
DATA = baby_weight_long;
4
BY id month;
5
RUN;
6
8 Code Block
PROC TRANSPOSE Data
Explanation : Transposition Step 1: Basic attempt at transposition without grouping variables (BY), which transposes the entire table.
Copied!
proc transpose data = baby_weight_long
out = baby_weight_wide;
run;
1
2
PROC TRANSPOSE
3
DATA = baby_weight_long
4
out = baby_weight_wide;
5
RUN;
6
9 Code Block
PROC TRANSPOSE Data
Explanation : Transposition Step 2: Adding the BY 'id' statement to transpose the data for each subject separately.
Copied!
proc transpose data = baby_weight_long
out = baby_weight_wide;
by id;
run;
1
PROC TRANSPOSEDATA = baby_weight_long
2
out = baby_weight_wide;
3
BY id;
4
RUN;
10 Code Block
PROC TRANSPOSE Data
Explanation : Transposition Step 3: Adding the ID 'month' statement to use the values of the 'month' variable as suffixes/names for the new transposed columns.
Copied!
proc transpose data = baby_weight_long
out = baby_weight_wide;
by id;
id month;
run;
1
PROC TRANSPOSEDATA = baby_weight_long
2
out = baby_weight_wide;
3
BY id;
4
id month;
5
RUN;
11 Code Block
PROC TRANSPOSE Data
Explanation : Step 4 (Final): Refinement of the transposition with the PREFIX='weight' option to name the columns (e.g., weight3, weight6...) and deletion of the automatic _NAME_ column.
Copied!
proc transpose data = baby_weight_long
out = baby_weight_wide (drop = _NAME_)
prefix = weight;
by id;
id month;
run;
1
PROC TRANSPOSEDATA = baby_weight_long
2
out = baby_weight_wide (drop = _NAME_)
3
prefix = weight;
4
BY id;
5
id month;
6
RUN;
12 Code Block
PROC PRINT
Explanation : Visualization of the final transposition result (back to wide format).
Copied!
proc print data = baby_weight_wide;
run;
1
PROC PRINTDATA = baby_weight_wide;
2
RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.
This site uses technical and analytical cookies to improve your experience.
Read more.