Published on :
Data Manipulation CREATION_INTERNE

Merging Data with a Hash Table

This code is also available in: Deutsch Español Français
Awaiting validation
A SAS© hash table contains rows (hash entries) and columns (hash variables). Each hash entry must have at least one key column and one data column. Values can be hardcoded or loaded from a SAS© dataset. A hash table resides entirely in memory, which makes its operations fast. Data does not need to be pre-sorted. A hash table is temporary: once the DATA step is complete, it ceases to exist. However, its contents can be saved to a SAS© dataset or an external database. The hash object is dynamically sized.
Data Analysis

Type : CREATION_INTERNE


The examples use generated data (DATA step with DATALINES) to simulate the `product_list` and `supplier` tables.

1 Code Block
DATA STEP Data
Explanation :
This DATA step code block creates the `work.product_list` dataset using in-line data (DATALINES). It defines the variables `Product_Id`, `Product_Name`, and `Supplier_ID` with their respective types and lengths. Spaces and special characters in `Product_Name` have been replaced by underscores for better handling via DATALINES.
Copied!
1DATA work.product_list;
2 LENGTH Product_Id $10 Product_Name $40 Supplier_ID 8;
3 INPUT Product_Id $ Product_Name :$40. Supplier_ID;
4 DATALINES;
5240200100101 Grandslam_Staff_Tour_Mhl_Golf_Gloves 3808
6210200100017 Sweatshirt_Children's_O-Neck 3298
7240400200022 Aftm_95_Vf_Long_Bg-65_White 1280
8230100100017 Men's_Jacket_Rem 50
9210200300006 Fleece_Cuff_Pant_Kid'S 1303
10210200500002 Children's_Mitten 772
11210200700016 Strap_Pants_BBO 798
12210201000050 Kid_Children's_T-Shirt 2963
13210200100009 Kids_Sweat_Round_Neck,Large_Logo 3298
14210201000067 Logo_Coord.Children's_Sweatshirt 2963
15220100100019 Fit_Racing_Cap 1303
16220100100025 Knit_Hat 1303
17220100300001 Fleece_Jacket_Compass 772
18220200200036 Soft_Astro_Men's_Running_Shoes 1747
19230100100015 Men's_Jacket_Caians 50
20230100500004 Backpack_Flag,_6,5x9_Cm. 316
21210200500006 Rain_Suit,_Plain_w/backpack_Jacket 772
22230100500006 Collapsible_Water_Can 316
23224040020000 Bat_5-Ply 3808
24220200200035 Soft_Alta_Plus_Women's_Indoor_Shoes 1747
25240400200066 Memhis_350,Yellow_Medium,_6-pack 1280
26240200100081 Extreme_Distance_90_3-pack 3808
27;
28RUN;
2 Code Block
DATA STEP Data
Explanation :
This DATA step code block creates the `work.supplier` dataset using in-line data (DATALINES). It defines the variables `Supplier_ID`, `Supplier_Name`, `Supplier_Address`, and `Country` with their respective types and lengths. Spaces and special characters have been replaced by underscores for better handling via DATALINES.
Copied!
1DATA work.supplier;
2 LENGTH Supplier_ID 8 Supplier_Name $40 Supplier_Address $45 Country $2;
3 INPUT Supplier_ID Supplier_Name :$40. Supplier_Address :$45. Country $;
4 DATALINES;
550 Scandinavian_Clothing_A/S Kr._Augusts_Gate_13 NO
6316 Prime_Sports_Ltd 9_Carlisle_Place GB
7755 Top_Sports Jernbanegade_45 DK
8772 AllSeasons_Outdoor_Clothing 553_Cliffview_Dr US
9798 Sportico C._Barquillo_1 ES
101280 British_Sports_Ltd 85_Station_Street GB
111303 Eclipse_Inc 1218_Carriole_Ct US
121684 Magnifico_Sports Rua_Costa_Pinto_2 PT
131747 Pro_Sportswear_Inc 2434_Edgebrook_Dr US
143298 A_Team_Sports 2687_Julie_Ann_Ct US
153808 Carolina_Sports 3860_Grand_Ave US
16;
17RUN;
3 Code Block
DATA STEP / PROC PRINT
Explanation :
1. Includes a LENGTH statement to ensure that the `Supplier_Name`, `Supplier_Address`, and `Country` variables are defined in the PDV (Program Data Vector).
2. During the first iteration of the DATA step, declares the hash object `S`. Assigns `Supplier_ID` as the key for the hash object. Includes the values of `Supplier_Name`, `Supplier_Address`, and `Country` from the `work.supplier` dataset.
3. Since `Supplier_Name`, `Supplier_Address`, and `Country` are not explicitly assigned initial values, SAS writes a NOTE to the log indicating that the variables are uninitialized. `CALL MISSING` suppresses this NOTE.
4. Reads an observation from the `product_list` dataset.
5. The `FIND` method is called to check if the `Supplier_ID` from `product_list` matches the `Supplier_ID` key of one of the hash object entries. If there is a match (`rc=0`), the observation is written to the `supplier_info` dataset.
PROC PRINT then displays the contents of the `supplier_info` dataset.
Copied!
1DATA supplier_info;
2 drop rc;
3 LENGTH Supplier_Name $40 Supplier_Address $ 45 Country $ 2; /* 1*/
4 IF _N_=1 THEN DO;
5 declare hash S(dataset:'work.supplier'); /* 2*/
6 S.definekey('Supplier_ID');
7 S.definedata('Supplier_Name',
8 'Supplier_Address','Country');
9 S.definedone();
10 call missing(Supplier_Name,
11 Supplier_Address,Country); /* 3*/
12 END;
13 SET work.product_list; /* 4*/
14 rc=S.find(); /* 5*/
15RUN;
16PROC PRINT DATA=supplier_info;
17 var Product_ID Supplier_ID Supplier_Name
18 Supplier_Address Country;
19 title "Product Information";
20RUN;
21title;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved