ETL CAS

Feature Engineering in CAS: How to Create Custom Categories from Continuous Data

This code is also available in: Deutsch Español
Difficulty Level
Beginner
Published on :
Michael

Expert Advice

Michael
Responsable de l'infrastructure Viya.

Writing out every integer in an IN list (e.g., if Age in (18, 19, 20... 25)) is error-prone and hard to read. A cleaner, more robust approach is to use compound comparison operators: if 18 <= Age <= 25. Or, better yet, use User-Defined Formats (PROC FORMAT). By applying a format, you can bin data for display without actually creating new hard-coded text variables, saving memory and keeping your DATA step code much shorter.

Replace explicit lists with range logic to create cleaner, faster code.
This example illustrates how to manipulate and enrich data on a CAS server using a DATA step. It covers the creation of new variables based on conditions (age groups and FICO scores) and the assignment of credit qualifications, thus transforming an initial 'creditscores' dataset into a new 'creditqualify' dataset with derived attributes. The process concludes by visualizing the result using the CAS action 'table.fetch'.
Data Analysis

Type : CREATION_INTERNE


The examples use generated data (datalines) for the 'creditscores' table which is then processed to create the 'CreditQualify' table.

1 Code Block
DATA STEP Data
Explanation :
The first DATA step creates an example 'creditscores' dataset on the CAS server with 'Age' and 'Credit_Score' variables. The second DATA step begins with the DATA statement which initializes a new DATA step and creates a CAS table named 'CreditQualify' in the 'Mycaslib' library of the 'MYSESSION' CAS session. The SET statement reads observations from the 'creditscores' table. The LENGTH statement defines the new character variables 'Age_Range', 'FICO_Rating', and 'Credit_Qualification' with specified lengths. The IF-THEN/ELSE statement assigns values to 'Age_Range' based on age. Similar IF-THEN/ELSE blocks are used to assign FICO ratings ('FICO_Rating') and credit qualifications ('Credit_Qualification') based on 'Credit_Score'.
Copied!
1DATA mycaslib.creditscores / sessref=mysession;
2 INFILE DATALINES;
3 INPUT Age Credit_Score;
4 DATALINES;
520 500
630 600
740 700
850 780
960 810
1070 400
1119 550
1233 690
1348 710
1462 820
15;
16RUN;
17 
18DATA mycaslib.creditqualify / sessref=mysession;
19 SET mycaslib.creditscores;
20 LENGTH Age_Range $8;
21 IF Age in (18, 19, 20, 21, 22, 23, 24, 25) THEN Age_Range="18-25";
22 ELSE IF Age in (26, 27, 28, 29, 30, 31, 32, 33, 34, 35) THEN Age_Range="26-35";
23 ELSE IF Age in (36, 37, 38, 39, 40, 41, 42, 43, 44, 45) THEN Age_Range="36-45";
24 ELSE IF Age in (46, 47, 48, 49, 50, 51, 52, 53, 54, 55) THEN Age_Range="46-55";
25 ELSE IF Age in (56, 57, 58, 59, 60, 61, 62, 63, 64) THEN Age_Range="56-64";
26 ELSE IF Age>=65 THEN Age_Range="65+";
27 LENGTH FICO_Rating $12;
28 IF 300<=Credit_Score<=570 THEN FICO_Rating="Very Poor";
29 ELSE IF 580<=Credit_Score<=669 THEN FICO_Rating="Fair";
30 ELSE IF 670<=Credit_Score<=739 THEN FICO_Rating="Good";
31 ELSE IF 740<=Credit_Score<=799 THEN FICO_Rating="Very Good";
32 ELSE IF Credit_Score>=800 THEN FICO_Rating="Exceptional";
33 LENGTH Credit_Qualification $12;
34 IF Credit_Score>=740 THEN Credit_Qualification="Platinum";
35 ELSE IF 650<=Credit_Score<=739 THEN Credit_Qualification="Gold";
36 ELSE IF 450<=Credit_Score<=649 THEN Credit_Qualification="Secured Card";
37 ELSE IF Credit_Score<=449 THEN Credit_Qualification="N/A";
38RUN;
2 Code Block
PROC CAS
Explanation :
This CAS procedure uses the 'table.fetch' action to retrieve and display a portion of the content of the newly created 'creditqualify' table, allowing verification of the generated variables and values.
Copied!
1PROC CAS;
2 TABLE.fetch/TABLE="creditqualify"
3 index=false;
4QUIT;
Pro Tip
The sessref=mysession option ensures this DATA step runs on the CAS Server, taking advantage of distributed parallel processing. This is critical for performance when applying these transformations to millions of rows.
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Related Documentation

Aucune documentation spécifique pour cette catégorie.