Detailed functional analysis of different data preparation methods. Includes techniques for examining data (PROC CONTENTS, PROC PRINT), finding common variables (PROC SQL), creating unique BY values in case of duplicates (PROC FREQ, CATX function), managing data type differences (INPUT function), adjusting variable lengths (LENGTH or ATTRIB statement), and standardizing formats and labels (ATTRIB statement). The examples aim to ensure compatibility of common variable attributes for successful data combination operations.
Data Analysis
Type : CREATION_INTERNE
Examples use generated data (datalines) or SASHELP, as well as predefined datasets in SAS documentation.
1 Code Block
PROC CONTENTS / PROC SORT / PROC PRINT
Explanation : This example uses the PROC CONTENTS, PROC SORT, and PROC PRINT procedures to examine descriptive information, sort, and print the 'Inventory', 'Sales', and 'Sales2019' datasets before combining them. The objective is to verify the compatibility of common variables and identify relationships between datasets.
PROC SORTDATA=inventory; BY PartNumber; RUN; /* 2*/
6
PROC SORTDATA=Sales; BY PartNumber; RUN;
7
PROC SORTDATA=Sales2019; BY PartNumber; RUN;
8
9
PROC PRINTDATA=inventory; RUN/* 3*/;
10
PROC PRINTDATA=Sales; RUN;
11
PROC PRINTDATA=Sales2019; RUN;
2 Code Block
PROC SQL
Explanation : This example uses PROC SQL to find variables common to the 'Inventory', 'Sales', and 'Sales2019' datasets. It creates a temporary table 'commonvars' from 'dictionary.columns' and then selects the variable names that appear in all specified datasets.
Copied!
proc sql;
title "Variables Common to Inventory, Sales, and Sales2019";
create table commonvars as /* 1 */
select memname, upcase(name) as name
from dictionary.columns
where libname='WORK' and
memname in ('INVENTORY', 'SALES', 'SALES2019');
select name /* 2 */
from commonvars
group by name
having count(*)=(select count(distinct(memname)) from commonvars);
quit;
1
PROC SQL;
2
title "Variables Common to Inventory, Sales, and Sales2019";
3
create TABLE commonvars as /* 1 */
4
select memname, upcase(name) as name
5
from dictionary.columns
6
where LIBNAME='WORK' and
7
memname in ('INVENTORY', 'SALES', 'SALES2019');
8
select name /* 2 */
9
from commonvars
10
group BY name
11
having count(*)=(select count(distinct(memname)) from commonvars);
12
QUIT;
3 Code Block
PROC SORT / PROC FREQ / DATA STEP
Explanation : This example shows how to handle duplicate values in a BY variable. It uses PROC FREQ to identify duplicates of 'partNumber' in the 'Sales' dataset, then a DATA step with the CATX function and the automatic variable _N_ to create a unique identifier ('uniqueID') for each observation, including duplicates.
Copied!
proc sort data=sales; /* 1 */
by partNumber;
run;
proc print data=sales; run;
proc freq data = Sales noprint; /* 2 */
tables partNumber / out = SalesDupes
(keep = partNumber Count
where = (Count > 1));
run;
proc print data=SalesDupes;run;
data SalesUnique; /* 3 */
set Sales;
uniqueID = catx('.',partNumber,_n_);
run;
proc print data=SalesUnique; /* 4 */
var uniqueID partNumber partName salesPerson;
run;
1
PROC SORTDATA=sales; /* 1 */
2
BY partNumber;
3
RUN;
4
PROC PRINTDATA=sales; RUN;
5
6
PROC FREQDATA = Sales noprint; /* 2 */
7
tables partNumber / out = SalesDupes
8
(keep = partNumber Count
9
where = (Count > 1));
10
RUN;
11
12
PROC PRINTDATA=SalesDupes;RUN;
13
14
DATA SalesUnique; /* 3 */
15
SET Sales;
16
uniqueID = catx('.',partNumber,_n_);
17
RUN;
18
19
PROC PRINTDATA=SalesUnique; /* 4 */
20
var uniqueID partNumber partName salesPerson;
21
RUN;
4 Code Block
DATA STEP / PROC CONTENTS
Explanation : This example illustrates an incompatible data type issue when merging datasets. Initially, a merge attempt fails because the 'Weight' variable is of type CHAR in 'CarsSmall' and NUMERIC in 'Cars'. To resolve this, the example shows how to convert the 'Weight' variable from 'CarsSmall' to numeric using the INPUT function and rename it, before proceeding with the merge.
Copied!
data cars; /* 1 */
set sashelp.cars;
run;
proc contents data=cars; run; /* 2 */
proc contents data=CarsSmall; run;
data combineCars;
merge cars CarsSmallNum; /* 3*/
by make model;
keep Make DriveTrain Model MakeModelDrive Weight;
run;
Explanation : Correction of the previous example: This code resolves the data type incompatibility problem by creating a new dataset 'CarsSmallNum' where the 'Weight' variable is converted to numeric using the INPUT function. The original variable is dropped and the new one is renamed. Then, the datasets are sorted and successfully merged. PROC CONTENTS and PROC PRINT are used to verify the result.
Copied!
data CarsSmallNum; /* 1 */
set CarsSmall;
weightNum=input(weight, 8.);
drop weight;
rename WeightNum=weight;
run;
proc sort data=cars; by make model; run; /* 2 */
proc sort data=CarsSmallNum; by Make Model; run;
data combineCars;
merge cars CarsSmallNum; /* 3 */
by make model;
keep Make DriveTrain Model MakeModelDrive Weight;
run;
proc contents data=combineCars; run; /* 4*/
proc print data=combineCars; run;
1
DATA CarsSmallNum; /* 1 */
2
SET CarsSmall;
3
weightNum=INPUT(weight, 8.);
4
drop weight;
5
rename WeightNum=weight;
6
RUN;
7
8
PROC SORTDATA=cars; BY make model; RUN; /* 2 */
9
PROC SORTDATA=CarsSmallNum; BY Make Model; RUN;
10
11
DATA combineCars;
12
MERGE cars CarsSmallNum; /* 3 */
13
BY make model;
14
keep Make DriveTrain Model MakeModelDrive Weight;
15
RUN;
16
PROC CONTENTSDATA=combineCars; RUN; /* 4*/
17
PROC PRINTDATA=combineCars; RUN;
6 Code Block
PROC PRINT / DATA STEP / PROC CONTENTS
Explanation : This example demonstrates how to handle different lengths for a common variable ('Mileage') when merging datasets. It shows that SAS by default uses the length of the first table listed in the MERGE statement. To explicitly control the length, the LENGTH statement must be used before the MERGE statement in the DATA step.
Explanation : Correction of the previous example: This version shows how to use the ATTRIB statement to explicitly specify the length of the 'Mileage' variable to 6 bytes before merging the datasets, thus ensuring that the desired length is applied in the output dataset.
Copied!
data yearly;
merge quarter1 quarter2 quarter3 quarter4;
by Account;
attrib Mileage
length = 6;
run;
Explanation : This example handles differences in formats and labels for common variables ('Height' and 'Weight') when merging 'class' and 'classfit' datasets. It shows how the ATTRIB statement can be used in a DATA step to explicitly define the label and format of variables in the output dataset, overriding SAS's default behavior of taking attributes from the first dataset listed.
Explanation : This example addresses the case where variables have the same name but represent different data (and are therefore not truly common variables). The solution is to use the RENAME= dataset option in the DATA statement to rename the 'Weight' variable to 'weightLBS' in the 'vehicles' dataset (from CarsSmall), thus avoiding a conflict or incorrect merge when combining data.
Copied!
data vehicles(rename=(weight=weightLBS));
set vehicles;
run;
1
2
DATA vehicles(rename=(weight=weightLBS));
3
SET vehicles;
4
RUN;
5
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.
This site uses technical and analytical cookies to improve your experience.
Read more.