Examples: Interleaving Data

Interleaving combines observations from two or more datasets into a single dataset ordered by one or more BY variables. Each input dataset must first be sorted or indexed by those BY variables. The DATA step then reads the inputs one BY group at a time and copies every observation to the output in BY-variable order. A variable that exists in only one input dataset is set to missing on observations that come from the other datasets. The order of the datasets in the SET statement matters: when the same BY value occurs in several inputs, the observations from the dataset listed first come first. The examples below cover unique BY values, duplicated BY values, and BY values that occur in only one of the input datasets.
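The "sorted or indexed" requirement can be illustrated with an index instead of a sort. The following is a minimal sketch, not part of the original examples: the dataset names 'animalUnsorted', 'plantUnsorted', and 'interleaveIndexed' are hypothetical, and it assumes a simple index on 'common' created with PROC DATASETS is enough for BY processing.

```sas
/* Hypothetical inputs that are deliberately NOT in BY order */
DATA animalUnsorted;
  INPUT common $ animal $;
  DATALINES;
c Cat
a Ant
b Bird
;
RUN;

DATA plantUnsorted;
  INPUT common $ plant $;
  DATALINES;
b Banana
c Coconut
a Apple
;
RUN;

/* Create a simple index on the BY variable instead of sorting */
PROC DATASETS LIBRARY=work NOLIST;
  MODIFY animalUnsorted;
  INDEX CREATE common;
  RUN;
  MODIFY plantUnsorted;
  INDEX CREATE common;
  RUN;
QUIT;

/* BY processing reads each input through its index */
DATA interleaveIndexed;
  SET animalUnsorted plantUnsorted;
  BY common;
RUN;

PROC PRINT DATA=interleaveIndexed; RUN;
```

Reading through an index is usually slower than reading sorted data, so sorting with PROC SORT, as the examples below do, is often the simpler choice.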

Data Analysis

Type: CREATION_INTERNE (data created within the program)

The examples build their input datasets 'animal', 'plant', 'animalDupes', 'plantDupes', and 'plantMissing2' from inline data (DATALINES). No SASHELP datasets are used.

1 Code Block

DATA STEP / PROC SORT Data

Explanation:
This program creates two datasets, 'animal' and 'plant', sorts each by the 'common' variable, and interleaves them into a new dataset named 'interleave'. Because every value of 'common' occurs exactly once in each input, the output alternates: for each value, the observation from 'animal' is followed by the observation from 'plant'. PROC PRINT displays the result, which contains the variables 'common', 'animal', and 'plant'. 'plant' is missing on observations read from 'animal', and 'animal' is missing on observations read from 'plant'.

/* Create the two input datasets from inline data */
DATA animal;
  INPUT common $ animal $;
  DATALINES;
a Ant
b Bird
c Cat
d Dog
e Eagle
f Frog
;
RUN;

DATA plant;
  INPUT common $ plant $;
  DATALINES;
a Apple
b Banana
c Coconut
d Dewberry
e Eggplant
f Fig
;
RUN;

/* Both inputs must be in BY-variable order before interleaving */
PROC SORT DATA=animal; BY common; RUN;
PROC SORT DATA=plant; BY common; RUN;

/* SET with BY interleaves the observations by 'common' */
DATA interleave;
  SET animal plant;
  BY common;
RUN;

PROC PRINT DATA=interleave; RUN;
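The interleaved output does not record which input dataset each observation came from. A minimal sketch of one way to tag the source uses the IN= dataset option. The dataset names 'animalDemo', 'plantDemo', and 'interleaveTagged' and the variables 'fromAnimal', 'fromPlant', and 'source' are illustrative and do not appear in the original example.

```sas
/* Small illustrative inputs, already in BY order (names are hypothetical) */
DATA animalDemo;
  INPUT common $ animal $;
  DATALINES;
a Ant
b Bird
;
RUN;

DATA plantDemo;
  INPUT common $ plant $;
  DATALINES;
a Apple
b Banana
;
RUN;

DATA interleaveTagged;
  LENGTH source $ 6;
  /* IN= creates temporary 0/1 flags; they are not written to the output */
  SET animalDemo(IN=fromAnimal) plantDemo(IN=fromPlant);
  BY common;
  IF fromAnimal THEN source = 'animal';
  ELSE IF fromPlant THEN source = 'plant';
RUN;

PROC PRINT DATA=interleaveTagged; RUN;
```

Each output observation comes from exactly one input, so exactly one of the two flags is 1 on any given observation.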

2 Code Block

DATA STEP / PROC SORT Data

Explanation:
This example creates two datasets, 'animalDupes' and 'plantDupes', each containing a duplicated value of 'common' ('a' in 'animalDupes', 'c' in 'plantDupes'). After both are sorted by 'common', the SET statement interleaves them. Because 'animalDupes' is listed before 'plantDupes' in the SET statement, within each BY group all observations from 'animalDupes' come before those from 'plantDupes'. For common = 'a', for example, the order is Ant, Ape, Apple. The program then prints the 'interleave' dataset, which contains every observation from both inputs.

/* 'a' occurs twice in animalDupes */
DATA animalDupes;
  INPUT common $ animal $;
  DATALINES;
a Ant
a Ape
b Bird
c Cat
d Dog
e Eagle
;
RUN;

/* 'c' occurs twice in plantDupes */
DATA plantDupes;
  INPUT common $ plant $;
  DATALINES;
a Apple
b Banana
c Coconut
c Celery
d Dewberry
e Eggplant
;
RUN;

PROC SORT DATA=animalDupes; BY common; RUN;
PROC SORT DATA=plantDupes; BY common; RUN;

/* animalDupes is listed first, so its observations lead each BY group */
DATA interleave;
  SET animalDupes plantDupes;
  BY common;
RUN;

PROC PRINT DATA=interleave; RUN;
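With duplicated BY values it can help to see where each BY group starts and ends in the interleaved output. The sketch below copies the automatic FIRST. and LAST. variables into ordinary variables so that they are kept. The dataset names 'animalDupDemo', 'plantDupDemo', and 'interleaveFlags' and the variables 'firstInGroup' and 'lastInGroup' are illustrative only.

```sas
/* Small illustrative inputs, already in BY order (names are hypothetical) */
DATA animalDupDemo;
  INPUT common $ animal $;
  DATALINES;
a Ant
a Ape
b Bird
;
RUN;

DATA plantDupDemo;
  INPUT common $ plant $;
  DATALINES;
a Apple
b Banana
;
RUN;

DATA interleaveFlags;
  SET animalDupDemo plantDupDemo;
  BY common;
  /* FIRST./LAST. are temporary, so copy them to keep them in the output */
  firstInGroup = FIRST.common;
  lastInGroup  = LAST.common;
RUN;

PROC PRINT DATA=interleaveFlags; RUN;
```

In this sketch the group for 'a' spans both inputs: Ant is flagged as first in the group and Apple as last, with Ape in between.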

3 Code Block

DATA STEP / PROC SORT Data

Explanation:
This example uses the same data as the previous one but reverses the order of the datasets in the SET statement ('plantDupes' followed by 'animalDupes'). It shows that the SET statement order decides which dataset's observations come first within a BY group: here, observations from 'plantDupes' precede those from 'animalDupes'. For common = 'a', for example, the order is now Apple, Ant, Ape. The program prints the 'interleave' dataset.

DATA animalDupes;
  INPUT common $ animal $;
  DATALINES;
a Ant
a Ape
b Bird
c Cat
d Dog
e Eagle
;
RUN;

DATA plantDupes;
  INPUT common $ plant $;
  DATALINES;
a Apple
b Banana
c Coconut
c Celery
d Dewberry
e Eggplant
;
RUN;

PROC SORT DATA=animalDupes; BY common; RUN;
PROC SORT DATA=plantDupes; BY common; RUN;

/* plantDupes is listed first, so its observations lead each BY group */
DATA interleave;
  SET plantDupes animalDupes;
  BY common;
RUN;

PROC PRINT DATA=interleave; RUN;

4 Code Block

DATA STEP / PROC SORT Data

Explanation:
This program interleaves 'animalDupes' and 'plantMissing2', where each dataset contains a value of 'common' that the other lacks ('d' occurs only in 'animalDupes' and 'f' only in 'plantMissing2'). After sorting, the datasets are interleaved. The resulting 'interleave' dataset still contains every observation from both inputs; a BY value found in only one input simply contributes observations from that input alone. As in the other examples, 'plant' is missing on observations read from 'animalDupes' and 'animal' is missing on observations read from 'plantMissing2'. The program prints the 'interleave' dataset.

/* Contains 'd', which plantMissing2 lacks */
DATA animalDupes;
  INPUT common $ animal $;
  DATALINES;
a Ant
a Ape
b Bird
c Cat
d Dog
e Eagle
;
RUN;

/* Contains 'f', which animalDupes lacks */
DATA plantMissing2;
  INPUT common $ plant $;
  DATALINES;
a Apple
b Banana
c Coconut
e Eggplant
f Fig
;
RUN;

PROC SORT DATA=animalDupes; BY common; RUN;
PROC SORT DATA=plantMissing2; BY common; RUN;

/* BY values present in only one input still appear in the output */
DATA interleave;
  SET animalDupes plantMissing2;
  BY common;
RUN;

PROC PRINT DATA=interleave; RUN;
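To see which BY values occur in only one input, the interleave can be summarized to one row per BY value. The following is a minimal sketch under stated assumptions: the dataset names 'animalOnlyDemo', 'plantOnlyDemo', and 'bySource' and the variables 'fromAnimal', 'fromPlant', 'hasAnimal', and 'hasPlant' are hypothetical and not part of the original example.

```sas
/* Small illustrative inputs, already in BY order (names are hypothetical) */
DATA animalOnlyDemo;
  INPUT common $ animal $;
  DATALINES;
a Ant
d Dog
;
RUN;

DATA plantOnlyDemo;
  INPUT common $ plant $;
  DATALINES;
a Apple
f Fig
;
RUN;

DATA bySource;
  SET animalOnlyDemo(IN=fromAnimal) plantOnlyDemo(IN=fromPlant);
  BY common;
  RETAIN hasAnimal hasPlant;
  /* Reset the flags at the start of every BY group */
  IF FIRST.common THEN DO;
    hasAnimal = 0;
    hasPlant = 0;
  END;
  IF fromAnimal THEN hasAnimal = 1;
  IF fromPlant THEN hasPlant = 1;
  /* Write one summary row per BY value */
  IF LAST.common THEN OUTPUT;
  KEEP common hasAnimal hasPlant;
RUN;

PROC PRINT DATA=bySource; RUN;
```

In this sketch 'a' is flagged as present in both inputs, 'd' only in the animal input, and 'f' only in the plant input.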

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
