Published on :
Data Manipulation CREATION_INTERNE

Examples: Data Update

This code is also available in: Deutsch Español Français
Awaiting validation
The UPDATE statement is a powerful tool for dataset maintenance. It requires a BY statement and that the input datasets be sorted or indexed by the BY variables. The flexibility of the UPDATE statement allows adding, deleting, or renaming variables in the output dataset. By default, missing values in the transaction dataset do not replace existing values in the master dataset, but this behavior can be changed with the UPDATEMODE=NOMISSINGCHECK option. It is crucial to ensure that BY variable values are unique in the master dataset, as duplicates can lead to partial updates.
Data Analysis

Type : CREATION_INTERNE


Examples use generated data (datalines) or SASHELP.

1 Code Block
DATA STEP Data
Explanation :
This example illustrates updating a master dataset ('master') with new values from a transaction dataset ('plantNew') using the UPDATE statement and the 'common' variable as the BY variable. The value 'Eggplant' for 'plant' in line 5 of 'master' is replaced by 'Escarole' from 'plantNew'.
Copied!
1DATA master;
2 INPUT common $ animal $ plant $;
3 DATALINES;
4a Ant Apple
5b Bird Banana
6c Cat Coconut
7d Dog Dewberry
8e Eagle Eggplant
9f Frog Fig
10;
11RUN;
12 
13DATA plantNew;
14 INPUT common $ plant $;
15 DATALINES;
16a Apricot
17b Barley
18c Cactus
19d Date
20e Escarole
21f Fennel
22;
23RUN;
24 
25DATA master2;
26 update master plantNew;
27 BY common;
28RUN;
29PROC PRINT DATA=master2; RUN;
2 Code Block
DATA STEP Data
Explanation :
This example demonstrates the behavior of the UPDATE statement when the transaction dataset ('plantNewDupes') contains duplicate values for the BY variable ('common'). The value 'Dewberry' is replaced by 'Dill' because 'Dill' is the last value for 'plant' associated with 'd' in the transaction dataset. It is important to note that if the master dataset contains duplicates for the BY variable, only the first occurrence is updated, and a warning is issued in the SAS log.
Copied!
1DATA master;
2 INPUT common $ animal $ plant $;
3 DATALINES;
4a Ant Apple
5b Bird Banana
6c Cat Coconut
7d Dog Dewberry
8e Eagle Eggplant
9f Frog Fig
10;
11RUN;
12 
13DATA plantNewDupes;
14 INPUT common $ plant $;
15 DATALINES;
16a Apricot
17b Barley
18c Cactus
19d Date
20d Dill
21e Escarole
22f Fennel
23;
24RUN;
25 
26DATA master;
27 update master plantNewDupes;
28 BY common;
29RUN;
30PROC PRINT DATA=master; RUN;
3 Code Block
DATA STEP Data
Explanation :
This first code illustrates updating a master dataset ('master') with a transaction dataset ('minerals') that introduces a new variable ('mineral'), missing values, and non-matching BY variable values. The 'mineral' variable is added. Observations from the transaction dataset without a match in the master are added as new observations. Missing values in the transaction dataset do not replace existing values in the master dataset by default.
Copied!
1DATA master;
2 INPUT common $ animal $ plant $;
3 DATALINES;
4a Ant .
5c Cat Coconut
6d Dog Dewberry
7e Eagle Eggplant
8f Frog Fig
9;
10RUN;
11 
12DATA minerals;
13 INPUT common $ plant $ mineral $;
14 DATALINES;
15a Apricot Amethyst
16b Barley Beryl
17c Cactus .
18e . .
19f Fennel .
20g Grape Garnet
21;
22RUN;
23 
24DATA master;
25 update master minerals;
26 BY common;
27RUN;
28 
29PROC PRINT DATA=master; RUN;
4 Code Block
DATA STEP
Explanation :
This second code, based on the datasets from example 3.1, shows how the UPDATEMODE=NOMISSINGCHECK option allows missing values in the transaction dataset to replace corresponding values in the master dataset. Here, the 'plant' value in observation 5 will be set to missing because it is missing in the 'minerals' transaction dataset and the option is enabled.
Copied!
1DATA master;
2 update master minerals updatemode=nomissingcheck;
3 BY common;
4RUN;
5PROC PRINT DATA=master;
6 title "Updated Data Set master";
7 title2 "With Values Updated to Missing";
8RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Banner
Expert Advice
Expert
Michael
Responsable de l'infrastructure Viya.
« The UPDATE statement is a specialized transaction-processing tool in SAS, designed to merge a "Master" dataset with a "Transaction" dataset. Unlike the MERGE statement, which simply overwrites or combines columns, UPDATE is built with specific business logic for data persistence, ensuring that your master records are refreshed only with relevant, non-missing information. »