The script begins by defining two temporary datasets, 'company' and 'finance', using 'datalines' blocks for direct data input. The 'company' dataset contains demographic information about employees (Name, Age, Gender), while 'finance' contains financial details (Phone Number, Name, Salary). Each dataset is sorted by 'Name' using PROC SORT, which is an essential preparatory step for merging. The sorted datasets are then printed. The script then merges 'company' and 'finance' into a new dataset 'employeeinfo' based on the common variable 'Name'. Additional PROC PRINT procedures are used to display the merged dataset, as well as to demonstrate the use of the global options 'firstobs' and 'obs', the 'WHERE' clause for filtering, and an attempt to calculate a sum for a categorical variable (which is noted as a potential error). Finally, PROC CONTENTS and PROC DATASETS are used to examine the metadata of the datasets, including 'company'.
Data Analysis
Type : CREATION_INTERNE
Data for the 'company' and 'finance' datasets are created directly within the SAS script via 'datalines' blocks. No external data or SASHELP libraries are used as primary data sources for these datasets.
1 Code Block
DATA STEP Data
Explanation : This DATA STEP block creates the 'company' dataset using in-line data provided in the 'datalines' block. It defines the 'Name' (character), 'Age' (numeric), and 'Gender' (character) variables with their respective column positions. This dataset contains basic employee information.
Copied!
data company;
input
Name $ 1-25
Age 27-28
Gender $ 30;
datalines;
Vincent, Martina 34 F
Phillipon, Marie-Odile 28 F
Gunter, Thomas 27 M
Harbinger, Nicholas 36 M
Benito, Gisela 32 F
Rudelich, Herbert 39 M
Sirignano, Emily 12 F
Morrison, Michael 32 M
;
1
DATA company;
2
INPUT
3
Name $ 1-25
4
Age 27-28
5
Gender $ 30;
6
DATALINES;
7
Vincent, Martina 34 F
8
Phillipon, Marie-Odile 28 F
9
Gunter, Thomas 27 M
10
Harbinger, Nicholas 36 M
11
Benito, Gisela 32 F
12
Rudelich, Herbert 39 M
13
Sirignano, Emily 12 F
14
Morrison, Michael 32 M
15
;
16
2 Code Block
PROC SORT
Explanation : This PROC SORT sorts the 'company' dataset by the 'Name' variable. Sorting is essential for subsequent operations like merging ('MERGE') which typically require input datasets to be sorted by the merge variables ('BY' variables).
Copied!
proc sort data=company;
by Name;
run;
1
PROC SORTDATA=company;
2
BY Name;
3
RUN;
4
3 Code Block
PROC PRINT
Explanation : This PROC PRINT displays the contents of the 'company' dataset in the output window. The 'title' specifies the title of the printed report, making it more descriptive.
Copied!
proc print data=company;
title 'Company';
run;
1
PROC PRINTDATA=company;
2
title 'Company';
3
RUN;
4
4 Code Block
DATA STEP Data
Explanation : This second DATA STEP block creates the 'finance' dataset, similar to the 'company' block, using 'datalines'. It captures the 'Phone_number' (character), 'Name' (character), and 'Salary' (numeric) variables, containing financial information for certain employees.
Copied!
data finance;
input Phone_number $ 1-11
Name $ 13-40
Salary;
datalines;
074-53-9892 Vincent, Martina 35000
776-84-5391 Phillipon, Marie-Odile 29750
929-75-0218 Gunter, Thomas 27500
446-93-2122 Harbinger, Nicholas 33900
228-88-9649 Benito, Gisela 28000
029-46-9261 Rudelich, Herbert 35000
442-21-8075 Sirignano, Emily 5000
;
1
DATA finance;
2
INPUT Phone_number $ 1-11
3
Name $ 13-40
4
Salary;
5
DATALINES;
6
074-53-9892 Vincent, Martina 35000
7
776-84-5391 Phillipon, Marie-Odile 29750
8
929-75-0218 Gunter, Thomas 27500
9
446-93-2122 Harbinger, Nicholas 33900
10
228-88-9649 Benito, Gisela 28000
11
029-46-9261 Rudelich, Herbert 35000
12
442-21-8075 Sirignano, Emily 5000
13
;
14
5 Code Block
PROC SORT
Explanation : Similar to sorting the 'company' dataset, this PROC SORT sorts the 'finance' dataset by the 'Name' variable in preparation for the merge operation.
Copied!
proc sort data=finance;
by Name;
run;
1
PROC SORTDATA=finance;
2
BY Name;
3
RUN;
4
6 Code Block
PROC PRINT
Explanation : This PROC PRINT displays the contents of the 'finance' dataset with the title 'Employee Information', providing a view of individual financial data.
Copied!
proc print data=finance;
title 'Employee Information';
run;
1
PROC PRINTDATA=finance;
2
title 'Employee Information';
3
RUN;
4
7 Code Block
DATA STEP Data
Explanation : This DATA STEP merges the 'company' and 'finance' datasets into a new dataset called 'employeeinfo'. The 'BY name;' clause indicates that the merge should be performed by matching observations based on the 'Name' variable. If a name appears in one dataset but not the other, the values of the non-matching variables will be missing. The input datasets must be sorted by the 'BY' variable.
Copied!
data employeeinfo;
merge company finance;
by name;
run;
1
DATA employeeinfo;
2
MERGE company finance;
3
BY name;
4
RUN;
5
8 Code Block
PROC PRINT
Explanation : This PROC PRINT displays the 'employeeinfo' dataset which is the result of the merge. The use of 'title' and 'title2' allows for two lines of title for the report output.
Copied!
proc print data=employeeinfo;
title 'Laxmi is Cute!';
title2 'very cute';
run;
1
PROC PRINTDATA=employeeinfo;
2
title 'Laxmi is Cute!';
3
title2 'very cute';
4
RUN;
5
9 Code Block
PROC CONTENTS
Explanation : PROC CONTENTS provides detailed metadata about the 'company' dataset. This includes information such as the number of observations, the number of variables, variable names, their types (numeric or character), their lengths, and their formats.
Copied!
proc contents data = company;
run;
1
PROC CONTENTSDATA = company;
2
RUN;
3
10 Code Block
PROC DATASETS
Explanation : PROC DATASETS is a library management procedure that allows various operations on datasets and catalogs. Here, the 'contents data= company;' statement is used to obtain the same metadata as PROC CONTENTS for the 'company' dataset. It is often used in a context where multiple dataset management operations are performed within a single PROC DATASETS execution.
Copied!
proc datasets;
contents data= company;
run;
1
PROC DATASETS;
2
contents DATA= company;
3
RUN;
4
11 Code Block
OPTIONS et PROC PRINT
Explanation : This block demonstrates the use of the global options 'firstobs' and 'obs'. 'options firstobs=3;' instructs SAS to start reading data from the third observation, and 'obs=9;' instructs it to stop after the ninth observation. Therefore, this PROC PRINT will display observations 3 through 9 (inclusive) of the 'employeeinfo' dataset. These options affect all subsequent data steps and procedures until they are reset or changed.
Copied!
options firstobs= 3 obs=9;
proc print data = employeeinfo;
run;
1
options firstobs= 3 obs=9;
2
PROC PRINTDATA = employeeinfo;
3
RUN;
4
12 Code Block
PROC PRINT
Explanation : This PROC PRINT uses a 'WHERE' clause to filter observations from the 'employeeinfo' dataset. Only observations where the 'Name' variable contains the substring 'Vi' (case-insensitive by default in SAS for 'contains' unless the option is changed) will be displayed. This allows targeting specific subsets of data for display.
Copied!
proc print data = employeeinfo;
where Name contains 'Vi';
run;
1
PROC PRINTDATA = employeeinfo;
2
where Name contains 'Vi';
3
RUN;
4
13 Code Block
PROC PRINT
Explanation : This PROC PRINT attempts to generate a report with subtotals. The 'by Name;' statement groups observations by name, and 'sum Gender;' is intended to calculate totals for the 'Gender' variable. However, 'Gender' is a character variable, and the 'SUM' statement in PROC PRINT is designed for numeric variables. This will result in a warning or error in the SAS log, and the sum for 'Gender' will not be displayed. The 'ID Gender;' statement means that the 'Gender' variable will be used as an identification variable in the report, but is not typically used in this context for meaningful grouping with 'BY' and 'SUM' on a categorical variable.
Copied!
proc print data = employeeinfo;
title 'Sub total';
sum Gender;
by Name;
ID Gender;
run;
1
PROC PRINTDATA = employeeinfo;
2
title 'Sub total';
3
sum Gender;
4
BY Name;
5
ID Gender;
6
RUN;
7
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.
This site uses technical and analytical cookies to improve your experience.
Read more.