Examples use generated data (datalines) or SASHELP.
1 Code Block
DATA STEP Data
Explanation : This example merges two datasets, 'animal' and 'plantG', which have an equal number of observations and a common variable named 'common'. The MERGE statement is used without the BY statement, meaning that observations are merged based on their position (row number). The value 'g' of the 'common' variable in 'plantG' replaces the value 'f' of 'common' in 'animal' for observation 6 of the output dataset, demonstrating that values from the last named dataset overwrite previous ones for common variables.
Copied!
data animal;
input common $ animal $;
datalines;
a Ant
b Bird
c Cat
d Dog
e Eagle
f Frog
;
data plantG;
input common $ plant $;
datalines;
a Apple
b Banana
c Coconut
d Dewberry
e Eggplant
g Fig
;
data merged;
merge animal plantG;
run;
proc print data=merged;
run;
1
DATA animal;
2
INPUT common $ animal $;
3
DATALINES;
4
a Ant
5
b Bird
6
c Cat
7
d Dog
8
e Eagle
9
f Frog
10
;
11
12
DATA plantG;
13
INPUT common $ plant $;
14
DATALINES;
15
a Apple
16
b Banana
17
c Coconut
18
d Dewberry
19
e Eggplant
20
g Fig
21
;
22
23
DATA merged;
24
MERGE animal plantG;
25
RUN;
26
27
PROC PRINTDATA=merged;
28
RUN;
2 Code Block
DATA STEP Data
Explanation : This example merges the 'animal' dataset (6 observations) and 'plantMissing' (3 observations). Without a BY statement, one-to-one merging associates observations by row number. The resulting 'merged' dataset will have 6 observations. For observations 4, 5, and 6, the variables from 'plantMissing' (plant, common) will contain missing values because 'plantMissing' only has 3 observations. The values of 'common' from the last dataset ('plantMissing') overwrite those from 'animal' for the corresponding observations.
Copied!
data animal;
input common $ animal $;
datalines;
a Ant
b Bird
c Cat
d Dog
e Eagle
f Frog
;
data plantMissing;
input common $ plant $;
datalines;
a Apple
b Banana
c Coconut
;
data merged;
merge animal plantmissing;
run;
proc print data=merged;
run;
1
DATA animal;
2
INPUT common $ animal $;
3
DATALINES;
4
a Ant
5
b Bird
6
c Cat
7
d Dog
8
e Eagle
9
f Frog
10
;
11
12
DATA plantMissing;
13
INPUT common $ plant $;
14
DATALINES;
15
a Apple
16
b Banana
17
c Coconut
18
;
19
20
DATA merged;
21
MERGE animal plantmissing;
22
RUN;
23
PROC PRINTDATA=merged;
24
RUN;
3 Code Block
DATA STEP Data
Explanation : This example shows the difference in behavior between MERGE and SET when combining datasets of unequal sizes without a BY statement. With two consecutive SET statements, the DATA step stops selecting observations for output after reading the last observation from the dataset with the smallest number of observations ('plantMissing' with 3 observations). The resulting 'combine' dataset will therefore only have 3 observations, where the variables from 'animal' will be present but not combined by row with 'plantMissing' beyond the 3rd observation.
Copied!
data animal;
input common $ animal $;
datalines;
a Ant
b Bird
c Cat
d Dog
e Eagle
f Frog
;
data plantMissing;
input common $ plant $;
datalines;
a Apple
b Banana
c Coconut
;
data combine;
set animal;
set plantMissing;
run;
proc print data=combine;
run;
1
DATA animal;
2
INPUT common $ animal $;
3
DATALINES;
4
a Ant
5
b Bird
6
c Cat
7
d Dog
8
e Eagle
9
f Frog
10
;
11
12
DATA plantMissing;
13
INPUT common $ plant $;
14
DATALINES;
15
a Apple
16
b Banana
17
c Coconut
18
;
19
20
DATA combine;
21
SET animal;
22
SET plantMissing;
23
RUN;
24
PROC PRINTDATA=combine;
25
RUN;
4 Code Block
DATA STEP Data
Explanation : This example illustrates the undesirable results of a one-to-one merge (without BY) when datasets contain duplicate values for common variables. Since the merge is based on row number and not on 'common' values, duplicates of 'a' in 'animalDupes' and 'c' in 'plantDupes' are not handled as one might expect. The second observation from 'animalDupes' (a Ape) is merged with the second from 'plantDupes' (b Banana), which does not correspond to the common variable values. The result is a dataset where the logical pairing of observations is compromised due to duplications and the absence of the BY statement.
Copied!
data animalDupes;
input common $ animal $;
datalines;
a Ant
a Ape
b Bird
c Cat
d Dog
e Eagle
;
data plantDupes;
input common $ plant $;
datalines;
a Apple
b Banana
c Coconut
c Celery
d Dewberry
e Eggplant
;
data merged;
merge animalDupes plantDupes;
run;
proc print data=merged;
run;
1
DATA animalDupes;
2
INPUT common $ animal $;
3
DATALINES;
4
a Ant
5
a Ape
6
b Bird
7
c Cat
8
d Dog
9
e Eagle
10
;
11
12
DATA plantDupes;
13
INPUT common $ plant $;
14
DATALINES;
15
a Apple
16
b Banana
17
c Coconut
18
c Celery
19
d Dewberry
20
e Eggplant
21
;
22
23
DATA merged;
24
MERGE animalDupes plantDupes;
25
RUN;
26
PROC PRINTDATA=merged;
27
RUN;
5 Code Block
DATA STEP Data
Explanation : This example demonstrates the undesirable results of a one-to-one merge (without BY) when the input datasets ('animalMissing' and 'plantMissing2') have different values for the common variable ('common'). Since the merge is done by row number, observation 2 from 'animalMissing' (c Cat) is merged with observation 2 from 'plantMissing2' (b Banana), and so on. This leads to a dataset where the values of the 'common' variable and other variables do not logically match, as the pairing is not based on identifier concordance.
Copied!
data animalMissing;
input common $ animal $;
datalines;
a Ant
c Cat
d Dog
e Eagle
;
data plantMissing2;
input common $ plant $;
datalines;
a Apple
b Banana
c Coconut
e Eggplant
f Fig
;
data merged;
merge animalMissing plantMissing2;
run;
proc print data=merged;
run;
1
DATA animalMissing;
2
INPUT common $ animal $;
3
DATALINES;
4
a Ant
5
c Cat
6
d Dog
7
e Eagle
8
;
9
10
DATA plantMissing2;
11
INPUT common $ plant $;
12
DATALINES;
13
a Apple
14
b Banana
15
c Coconut
16
e Eggplant
17
f Fig
18
;
19
20
DATA merged;
21
MERGE animalMissing plantMissing2;
22
RUN;
23
PROC PRINTDATA=merged;
24
RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.
This site uses technical and analytical cookies to improve your experience.
Read more.