Published on :
ETL CREATION_INTERNE

Data Creation and Deduplication with PROC SORT

This code is also available in: Español Français
Awaiting validation
The script begins by creating a temporary dataset named 'amit' using a DATA STEP with embedded data (cards). It contains observations with identifiers (id) and names (name) that may be duplicated. Then, two PROC SORT calls are made. The first uses the NODUP option to remove observations that are exact duplicates across all variables. The second uses the NODUPKEY option to remove observations that have the same value for the variable specified in the BY statement (here, 'id'), while retaining the first occurrence.
Data Analysis

Type : CREATION_INTERNE


The data is created directly within the SAS script using the 'cards;' statement within a DATA STEP. The 'amit' dataset is generated in memory.

1 Code Block
DATA STEP Data
Explanation :
This code block uses a DATA STEP to create a SAS dataset named 'amit'. The variables 'id' (numeric), 'name' (character string), and 'amount' (numeric) are defined. The data is then read directly from the script via the 'cards;' statement.
Copied!
1DATA amit;
2INPUT id name $ amount;
3CARDS;
41 Amit 22
51 Amit 22
61 Amit 23
72 Amit 23
8;
9RUN;
2 Code Block
PROC SORT
Explanation :
This block uses the PROC SORT procedure to sort the 'amit' dataset. The 'nodup' option is specified to remove observations that are exact duplicates across all variables of the observation. The resulting dataset, 'nodup', contains the unique observations. Sorting is performed by the 'id' variable.
Copied!
1PROC SORT DATA=amit nodup out=nodup;
2BY id;
3RUN;
3 Code Block
PROC SORT
Explanation :
This block also uses the PROC SORT procedure. The 'nodupkey' option is used to remove observations with duplicate values for the variable(s) specified in the BY statement (here, 'id'). Unlike 'nodup', 'nodupkey' only considers the sort key for deduplication. The resulting dataset is 'nodupkey'.
Copied!
1 
2PROC SORT
3DATA=amit nodupkey out=nodupkey;
4BY id;
5RUN;
6 
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.