Examples: Interleaving Data

Interleaving combines observations from two or more datasets into a new dataset, ordered by the values of one or more BY variables. Each input dataset must first be sorted or indexed by those BY variables. SAS then reads the inputs sequentially and writes observations to the new dataset in BY-variable order. The output contains every variable from every input; a variable that exists in only one dataset is set to missing in observations read from the others. The order in which datasets are listed in the SET statement matters: when the same BY value occurs in more than one dataset, observations from the dataset listed first are written first. The examples below illustrate interleaving with unique BY values, duplicated BY values, and BY values that differ between input datasets.
The examples create their input datasets ('animal', 'plant', 'animalDupes', 'plantDupes', and 'plantMissing2') from inline data (DATALINES). No SASHELP datasets are used.
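The introduction mentions that inputs may be indexed rather than sorted. A minimal sketch of that alternative follows, assuming a simple index on the BY variable is enough for BY-group processing in this case. The dataset names 'animalIdx', 'plantIdx', and 'interleaveIdx' are hypothetical and do not appear in the examples below; the rows are deliberately entered out of order.

```sas
/* Hypothetical inputs, entered out of BY order on purpose */
DATA animalIdx;
  INPUT common $ animal $;
  DATALINES;
c Cat
a Ant
b Bird
;
RUN;

DATA plantIdx;
  INPUT common $ plant $;
  DATALINES;
b Banana
c Coconut
a Apple
;
RUN;

/* Create a simple index on the BY variable instead of running PROC SORT */
PROC DATASETS LIBRARY=work NOLIST;
  MODIFY animalIdx;
    INDEX CREATE common;
  RUN;
  MODIFY plantIdx;
    INDEX CREATE common;
  RUN;
QUIT;

/* Interleave as usual; the BY statement relies on the indexes */
DATA interleaveIdx;
  SET animalIdx plantIdx;
  BY common;
RUN;

PROC PRINT DATA=interleaveIdx; RUN;
```

Sorting is usually the simpler choice for small datasets like these; an index may be preferable when the stored order of a dataset should not change.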

1 Code Block
DATA STEP / PROC SORT Data
Explanation :
This program creates two datasets, 'animal' and 'plant', and sorts each by the 'common' variable. It then interleaves them into a new dataset, 'interleave', using a SET statement followed by a BY statement. Because each value of 'common' occurs exactly once in each input, the result alternates between the two sources: for each value, the observation from 'animal' is followed by the one from 'plant', giving 12 observations in total. PROC PRINT displays the result with the variables 'common', 'animal', and 'plant'; 'plant' is missing in observations read from 'animal', and 'animal' is missing in observations read from 'plant'.
/* Create the two input datasets from inline data */
DATA animal;
  INPUT common $ animal $;
  DATALINES;
a Ant
b Bird
c Cat
d Dog
e Eagle
f Frog
;
RUN;

DATA plant;
  INPUT common $ plant $;
  DATALINES;
a Apple
b Banana
c Coconut
d Dewberry
e Eggplant
f Fig
;
RUN;

/* Sort both inputs by the BY variable */
PROC SORT DATA=animal; BY common; RUN;
PROC SORT DATA=plant; BY common; RUN;

/* Interleave: SET lists the inputs, BY names the ordering variable */
DATA interleave;
  SET animal plant;
  BY common;
RUN;

PROC PRINT DATA=interleave; RUN;
2 Code Block
DATA STEP / PROC SORT Data
Explanation :
This example creates two datasets, 'animalDupes' and 'plantDupes', each containing a duplicated value of 'common' ('a' occurs twice in 'animalDupes', and 'c' occurs twice in 'plantDupes'). After both are sorted by 'common', the SET statement interleaves them. Because 'animalDupes' is listed before 'plantDupes' in the SET statement, all observations from 'animalDupes' for a given value of 'common' appear before those from 'plantDupes'; for 'a', the order is Ant, Ape, Apple. The program then prints the 'interleave' dataset, which contains all observations from both input datasets, interleaved by the 'common' variable.
/* 'a' appears twice in animalDupes */
DATA animalDupes;
  INPUT common $ animal $;
  DATALINES;
a Ant
a Ape
b Bird
c Cat
d Dog
e Eagle
;
RUN;

/* 'c' appears twice in plantDupes */
DATA plantDupes;
  INPUT common $ plant $;
  DATALINES;
a Apple
b Banana
c Coconut
c Celery
d Dewberry
e Eggplant
;
RUN;

PROC SORT DATA=animalDupes; BY common; RUN;
PROC SORT DATA=plantDupes; BY common; RUN;

/* animalDupes is listed first, so its observations lead each BY group */
DATA interleave;
  SET animalDupes plantDupes;
  BY common;
RUN;

PROC PRINT DATA=interleave; RUN;
3 Code Block
DATA STEP / PROC SORT Data
Explanation :
This example uses the same data as the previous one, but the order of the datasets in the SET statement is reversed ('plantDupes' followed by 'animalDupes'). It demonstrates that the order in the SET statement determines which observations come first when a BY value occurs in more than one dataset. For each value of 'common', observations from 'plantDupes' are now listed before those from 'animalDupes'; for 'a', the order is Apple, Ant, Ape. The program prints the 'interleave' dataset.
DATA animalDupes;
  INPUT common $ animal $;
  DATALINES;
a Ant
a Ape
b Bird
c Cat
d Dog
e Eagle
;
RUN;

DATA plantDupes;
  INPUT common $ plant $;
  DATALINES;
a Apple
b Banana
c Coconut
c Celery
d Dewberry
e Eggplant
;
RUN;

PROC SORT DATA=animalDupes; BY common; RUN;
PROC SORT DATA=plantDupes; BY common; RUN;

/* plantDupes is now listed first, so its observations lead each BY group */
DATA interleave;
  SET plantDupes animalDupes;
  BY common;
RUN;

PROC PRINT DATA=interleave; RUN;
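To make the effect of the SET order easier to see in the printed output, each observation can be tagged with the dataset it came from. The sketch below uses the IN= dataset option for that purpose; the variable names 'fromPlant', 'fromAnimal', and 'source' and the output dataset 'interleaveSrc' are illustrative choices, not part of the original examples.

```sas
DATA animalDupes;
  INPUT common $ animal $;
  DATALINES;
a Ant
a Ape
b Bird
c Cat
d Dog
e Eagle
;
RUN;

DATA plantDupes;
  INPUT common $ plant $;
  DATALINES;
a Apple
b Banana
c Coconut
c Celery
d Dewberry
e Eggplant
;
RUN;

PROC SORT DATA=animalDupes; BY common; RUN;
PROC SORT DATA=plantDupes; BY common; RUN;

DATA interleaveSrc;
  LENGTH source $ 11;  /* long enough for 'animalDupes' */
  /* IN= creates temporary 0/1 flags marking the contributing dataset */
  SET plantDupes(IN=fromPlant) animalDupes(IN=fromAnimal);
  BY common;
  IF fromPlant THEN source = 'plantDupes';
  ELSE source = 'animalDupes';
RUN;

PROC PRINT DATA=interleaveSrc; RUN;
```

The IN= variables are not written to the output dataset, which is why the sketch copies the information into 'source'. Swapping the two names in the SET statement should swap the order of the 'source' values within each value of 'common'.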
4 Code Block
DATA STEP / PROC SORT Data
Explanation :
This program interleaves the 'animalDupes' and 'plantMissing2' datasets, each of which contains a value of 'common' that is absent from the other ('d' occurs only in 'animalDupes', and 'f' occurs only in 'plantMissing2'). After sorting, the datasets are interleaved. The resulting 'interleave' dataset includes all 11 observations from both inputs. A BY value found in only one input simply contributes observations from that input: 'd' yields only the Dog observation, and 'f' yields only the Fig observation. As in the other examples, 'plant' is missing in observations read from 'animalDupes', and 'animal' is missing in observations read from 'plantMissing2'. The program prints the 'interleave' dataset.
/* animalDupes has 'd' but no 'f' */
DATA animalDupes;
  INPUT common $ animal $;
  DATALINES;
a Ant
a Ape
b Bird
c Cat
d Dog
e Eagle
;
RUN;

/* plantMissing2 has 'f' but no 'd' */
DATA plantMissing2;
  INPUT common $ plant $;
  DATALINES;
a Apple
b Banana
c Coconut
e Eggplant
f Fig
;
RUN;

PROC SORT DATA=animalDupes; BY common; RUN;
PROC SORT DATA=plantMissing2; BY common; RUN;

DATA interleave;
  SET animalDupes plantMissing2;
  BY common;
RUN;

PROC PRINT DATA=interleave; RUN;
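When BY values differ between inputs, it can be useful to list the values that occur in only one dataset. The following is one possible sketch, combining IN= flags with the FIRST. and LAST. variables that a BY statement creates. The names 'onlyOne', 'hasAnimal', 'hasPlant', 'inAnimal', and 'inPlant' are hypothetical.

```sas
DATA animalDupes;
  INPUT common $ animal $;
  DATALINES;
a Ant
a Ape
b Bird
c Cat
d Dog
e Eagle
;
RUN;

DATA plantMissing2;
  INPUT common $ plant $;
  DATALINES;
a Apple
b Banana
c Coconut
e Eggplant
f Fig
;
RUN;

PROC SORT DATA=animalDupes; BY common; RUN;
PROC SORT DATA=plantMissing2; BY common; RUN;

DATA onlyOne;
  SET animalDupes(IN=inAnimal) plantMissing2(IN=inPlant);
  BY common;
  RETAIN hasAnimal hasPlant;
  /* Reset the per-group flags at the start of each BY group */
  IF FIRST.common THEN DO;
    hasAnimal = 0;
    hasPlant = 0;
  END;
  /* Record which inputs contributed to the current BY group */
  IF inAnimal THEN hasAnimal = 1;
  IF inPlant THEN hasPlant = 1;
  /* Keep one row per BY group that came from exactly one input */
  IF LAST.common AND (hasAnimal + hasPlant = 1);
  KEEP common hasAnimal hasPlant;
RUN;

PROC PRINT DATA=onlyOne; RUN;
```

With the data above, 'onlyOne' should contain the values 'd' (from 'animalDupes' only) and 'f' (from 'plantMissing2' only).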
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved