Published on :
ETL CREATION_INTERNE

Format Management and Data Transformation

This code is also available in: Deutsch Español Français
Awaiting validation
The script is divided into two main parts. The first defines a numeric format ('readdate') to interpret various date representations. This format is then applied in a DATA step to convert raw date character strings into usable numeric SAS© dates. A system option 'yearcutoff' is also defined to handle the ambiguity of two-digit years. A PROC PRINT displays the result of this transformation. The second part defines another numeric format ('typefmt') and several character input formats (INVALUE : '$groupx', '$groupy', '$groupz') for response categorization. These formats are used in a second DATA step to map raw text responses to standardized categorical terms.
Data Analysis

Type : CREATION_INTERNE


All data processed in this script is created internally via 'DATALINES' blocks directly included in the SAS code. No external data sources or SASHELP libraries are used for data input.

1 Code Block
PROC FORMAT
Explanation :
Defines a numeric format named 'readdate'. This format maps value 1 to the 'date7.' read format (day-month-two-digit-year) and value 2 to the 'mmddyy8.' format (month/day/two-digit-year). It is used to dynamically interpret character strings as SAS dates.
Copied!
1PROC FORMAT;
2 value readdate 1='date7.'
3 2='mmddyy8.';
4 RUN;
2 Code Block
OPTIONS
Explanation :
Sets the 'yearcutoff' system option to 1920. This option indicates that any two-digit year less than 'yearcutoff' (e.g., '00' to '19') will be considered to belong to the 21st century (20xx), while years equal to or greater (e.g., '20' to '99') will be considered to belong to the 20th century (19xx).
Copied!
1options yearcutoff=1920;
3 Code Block
DATA STEP Data
Explanation :
Creates the 'fixdates' dataset by reading raw in-line data. The 'dateinformat' variable is created by applying the 'readdate' format to the 'source' variable, which generates the appropriate date format string. Then, the 'INPUTN' function uses this dynamic format string to convert the 'start' character string into a numeric SAS date value ('newdate').
Copied!
1DATA fixdates;
2 LENGTH jobdesc $12;
3 INPUT SOURCE id lname $ jobdesc $ start $;
4 dateinformat=put(SOURCE, readdate.);
5 newdate = inputn(start, dateinformat);
6 DATALINES;
7 1 1604 Ziminski writer 09aug90
8 1 2010 Clavell editor 26jan95
9 2 1833 Rivera writer 10/25/92
10 2 2222 Barnes proofreader 3/26/98
11 ;
4 Code Block
PROC PRINT
Explanation :
Displays the content of the newly created 'fixdates' dataset, allowing verification of date conversion and data structure.
Copied!
1PROC PRINT;RUN;
5 Code Block
PROC FORMAT
Explanation :
Defines a numeric format 'typefmt' that maps values 1, 2, and 3 to specific character input format names ('$groupx', '$groupy', '$groupz'). In addition, it defines three character input formats ('INVALUE'): '$groupx', '$groupy', and '$groupz'. Each of these formats maps the strings 'positive', 'negative', and 'neutral' to different categorical terms (e.g., 'agree', 'accept', 'pass').
Copied!
1PROC FORMAT;
2 value typefmt 1='$groupx'
3 2='$groupy'
4 3='$groupz';
5 invalue $groupx 'positive'='agree'
6 'negative'='disagree'
7 'neutral'='notsure';
8 invalue $groupy 'positive'='accept'
9 'negative'='reject'
10 'neutral'='possible';
11 
12 invalue $groupz 'positive'='pass'
13 'negative'='fail'
14 'neutral'='retest';
15 RUN;
6 Code Block
DATA STEP Data
Explanation :
Creates the 'answers' dataset by reading raw in-line data. The 'respinformat' variable is created by applying the 'typefmt' format to the 'type' variable, which generates the appropriate character input format name (e.g., '$groupx'). Then, the 'INPUTC' function uses this dynamic format name to map the 'response' character string to a standardized categorical value ('word').
Copied!
1DATA answers;
2 INPUT type response $;
3 respinformat = put(type, typefmt.);
4 word = inputc(response, respinformat);
5 DATALINES;
6 1 positive
7 1 negative
8 1 neutral
9 2 positive
10 2 negative
11 2 neutral
12 3 positive
13 3 negative
14 3 neutral
15 ;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.