
Demonstration: Concatenation of SAS Datasets

The script first creates three datasets (advisees_MPH, advisees_DrPH, advisees_MHA) with DATA steps that read inline data through DATALINES statements. It then concatenates two datasets with identical variables (advisees_MPH and advisees_DrPH into advisees). A second demonstration concatenates datasets whose variable names differ (advisees_MPH and advisees_MHA into advisees_Masters); this requires the RENAME= dataset option on the SET statement so that 'degree' in advisees_MHA is read as 'program'.
Data Analysis

Type : INTERNAL_CREATION


Source data is created directly within the script using DATA step blocks and DATALINES statements.

1 Code Block
DATA STEP Data
Explanation :
Creation of the 'advisees_MPH' dataset with 'first', 'gender', and 'program' variables. Data is entered via DATALINES statements.
DATA advisees_MPH;
  /* List input: three character variables */
  INPUT first $ gender $ program $;
  DATALINES;
Alison F MPH
Ming F MPH
;
RUN;
2 Code Block
DATA STEP Data
Explanation :
Creation of the 'advisees_DrPH' dataset with the same variables as 'advisees_MPH'. Data is entered via DATALINES statements.
DATA advisees_DrPH;
  /* Same variable names as advisees_MPH */
  INPUT first $ gender $ program $;
  DATALINES;
Tiffany F DrPH
Florence F DrPH
;
RUN;
3 Code Block
DATA STEP
Explanation :
Concatenation of the 'advisees_MPH' and 'advisees_DrPH' datasets into a new 'advisees' dataset. Since the variables are identical, the rows of 'advisees_DrPH' are simply appended after those of 'advisees_MPH'; no renaming is needed.
DATA advisees;
  SET advisees_MPH advisees_DrPH;
RUN;
4 Code Block
PROC PRINT
Explanation :
Displays the content of the 'advisees' dataset, resulting from the first concatenation.
PROC PRINT DATA = advisees;
RUN;
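When stacked rows need to stay traceable to their input, the IN= dataset option can flag the origin of each observation. The following is a minimal sketch rather than part of the original script; the output dataset advisees_tagged and the variables from_mph, from_drph, and source are hypothetical names introduced for illustration.

```sas
DATA advisees_tagged;
  LENGTH source $ 13;                  /* long enough for 'advisees_DrPH' */
  SET advisees_MPH  (IN = from_mph)    /* from_mph = 1 for rows read from advisees_MPH */
      advisees_DrPH (IN = from_drph);  /* from_drph = 1 for rows read from advisees_DrPH */
  IF from_mph THEN source = 'advisees_MPH';
  ELSE IF from_drph THEN source = 'advisees_DrPH';
RUN;

PROC PRINT DATA = advisees_tagged;
RUN;
```

The IN= variables are temporary and are not written to advisees_tagged, which is why the sketch copies them into source.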
5 Code Block
DATA STEP Data
Explanation :
Creation of the 'advisees_MHA' dataset with 'first', 'gender', and 'degree' variables. The 'degree' variable is intentionally different from 'program' in the previous datasets.
DATA advisees_MHA;
  /* Third variable is named degree, not program */
  INPUT first $ gender $ degree $;
  DATALINES;
Jessica F MHA
Ryan M MHA
;
RUN;
6 Code Block
DATA STEP
Explanation :
Concatenation of the 'advisees_MPH' and 'advisees_MHA' datasets without any renaming. Because the third variable is named 'program' in one dataset and 'degree' in the other, the result contains both variables: 'program' is missing for rows from 'advisees_MHA', and 'degree' is missing for rows from 'advisees_MPH'.
DATA advisees_Masters;
  SET advisees_MPH advisees_MHA;
RUN;
7 Code Block
PROC PRINT
Explanation :
Displays the content of the 'advisees_Masters' dataset after concatenation without renaming, showing missing values due to different variable names.
PROC PRINT DATA = advisees_Masters;
RUN;
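One way to spot this kind of mismatch before concatenating is to list the variables of each input. The sketch below uses PROC CONTENTS with the VARNUM option, which lists variables in creation order; it is an illustrative check, not a step from the original script.

```sas
/* List the variables of each input dataset in creation order */
PROC CONTENTS DATA = advisees_MPH VARNUM;
RUN;

PROC CONTENTS DATA = advisees_MHA VARNUM;
RUN;
```

Comparing the two listings shows 'program' in advisees_MPH where advisees_MHA has 'degree', which is the cause of the blank cells in the printout above.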
8 Code Block
DATA STEP
Explanation :
Re-concatenation of the 'advisees_MPH' and 'advisees_MHA' datasets. The RENAME= dataset option renames 'degree' to 'program' as 'advisees_MHA' is read, without altering the stored dataset, so the values from both inputs land in a single 'program' variable.
DATA advisees_Masters;
  SET advisees_MPH advisees_MHA (RENAME = (degree = program));
RUN;
9 Code Block
PROC PRINT
Explanation :
Displays the final content of the 'advisees_Masters' dataset, demonstrating successful concatenation through the use of the RENAME option.
PROC PRINT DATA = advisees_Masters;
RUN;
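A quick frequency table is one way to confirm that no values were lost. This is a sketch of a possible check, not part of the original script; the MISSING option makes PROC FREQ list missing values of 'program' as their own level, so any row that failed to pick up a value would be visible.

```sas
/* Count rows per program value, including missing values if any */
PROC FREQ DATA = advisees_Masters;
  TABLES program / MISSING;
RUN;
```

With the four input rows shown earlier, the table should contain only the levels MHA and MPH, each with a frequency of 2.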
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.