Published on :
Data Manipulation CREATION_INTERNE

Example: Merge Data Using a Hash Table

This code is also available in: Español Français
Awaiting validation
A SAS© hash table contains rows (hash entries) and columns (hash variables). Each hash entry must have at least one key column and one data column. Values can be hard-coded or loaded from a SAS© dataset. A hash table resides entirely in memory, which makes its operations fast. Data does not need to be pre-sorted. A hash table is temporary: once the DATA step execution stops, it ceases to exist. However, its content can be saved to a SAS© dataset or an external database. The hash object is dynamically sized. The example uses the DECLARE method to create the hash table, DEFINEKEY to identify the unique key variable (Supplier_ID), and DEFINEDATA to load additional variables (Supplier_Name, Supplier_Address, Country). The FIND method is used to search for matches between the base dataset (product_list) and the hash table.
Data Analysis

Type : CREATION_INTERNE


Examples use generated data (datalines) or SASHELP.

1 Code Block
DATA STEP Data
Explanation :
The SAS code first creates two temporary datasets, 'product_list' and 'supplier', using DATALINES blocks to make the example self-contained. Then, it uses a DATA step to create a new dataset 'supplier_info'. In this DATA step, a hash object 'S' is declared and initialized with data from the 'work.supplier' dataset. 'Supplier_ID' is defined as the hash key, and 'Supplier_Name', 'Supplier_Address', 'Country' are defined as the data to be stored in the hash table. The 'call missing' function is used to initialize these variables and prevent warnings. Next, the program iterates over the 'work.product_list' dataset. For each observation in 'product_list', the 'S.find()' method is called to search for a 'Supplier_ID' match in the hash table. If a match is found (rc=0), the observation is written to the 'supplier_info' dataset, combining information from both sources. Finally, a PROC PRINT is used to display the content of the 'supplier_info' dataset, presenting the products with their complete supplier information.
Copied!
1/* Création du jeu de données product_list */
2DATA work.product_list;
3 INFILE DATALINES dlm=',';
4 LENGTH Product_Id $12 Product_Name $50 Supplier_ID 8;
5 INPUT Product_Id $ Product_Name $ Supplier_ID;
6 DATALINES;
7240200100101,Grandslam Staff Tour Mhl Golf Gloves,3808
8210200100017,Sweatshirt Children's O-Neck,3298
9240400200022,Aftm 95 Vf Long Bg-65 White,1280
10230100100017,Men's Jacket Rem,50
11210200300006,Fleece Cuff Pant Kid'S,1303
12210200500002,Children's Mitten,772
13210200700016,Strap Pants BBO,798
14210201000050,Kid Children's T-Shirt,2963
15210200100009,Kids Sweat Round Neck,Large Logo,3298
16210201000067,Logo Coord.Children's Sweatshirt,2963
17220100100019,Fit Racing Cap,1303
18220100100025,Knit Hat,1303
19220100300001,Fleece Jacket Compass,772
20220200200036,Soft Astro Men's Running Shoes,1747
21230100100015,Men's Jacket Caians,50
22230100500004,Backpack Flag, 6,5x9 Cm.,316
23210200500006,Rain Suit, Plain w/backpack Jacket,772
24230100500006,Collapsible Water Can,316
25224040020000,Bat 5-Ply,3808
26220200200035,Soft Alta Plus Women's Indoor Shoes,1747
27240400200066,Memhis 350,Yellow Medium, 6-pack,1280
28240200100081,Extreme Distance 90 3-pack,3808
29;
30run;
31 
32/* Création du jeu de données supplier */
33data work.supplier;
34 infile datalines dlm=',';
35 length Supplier_ID 8 Supplier_Name $40 Supplier_Address $45 Country $2;
36 input Supplier_ID Supplier_Name $ Supplier_Address $ Country $;
37 datalines;
3850,Scandinavian Clothing A/S,Kr. Augusts Gate 13,NO
39316,Prime Sports Ltd,9 Carlisle Place,GB
40755,Top Sports,Jernbanegade 45,DK
41772,AllSeasons Outdoor Clothing,553 Cliffview Dr,US
42798,Sportico,C. Barquillo 1,ES
431280,British Sports Ltd,85 Station Street,GB
441303,Eclipse Inc,1218 Carriole Ct,US
451684,Magnifico Sports,Rua Costa Pinto 2,PT
461747,Pro Sportswear Inc,2434 Edgebrook Dr,US
473298,A Team Sports,2687 Julie Ann Ct,US
483808,Carolina Sports,3860 Grand Ave,US
49;
50run;
51 
52data supplier_info;
53 drop rc;
54 length Supplier_Name $40 Supplier_Address $ 45 Country $ 2;
55 if _N_=1 then do;
56 declare hash S(dataset:'work.supplier');
57 S.definekey('Supplier_ID');
58 S.definedata('Supplier_Name',
59 'Supplier_Address','Country');
60 S.definedone();
61 call missing(Supplier_Name,
62 Supplier_Address,Country);
63 end;
64 set work.product_list;
65 rc=S.find();
66run;
67 
68proc print data=supplier_info;
69 var Product_ID Supplier_ID Supplier_Name
70 Supplier_Address Country;
71 title "Product Information";
72RUN;
73title;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.