dataSciencePilot

featureMachine

Description

The featureMachine action is an automated feature transformation and generation engine. It supports automated data science workflows, with a focus on exploring, executing, and ranking machine learning pipelines. It analyzes the input data to screen variables, impute missing values, treat outliers, and generate new features using techniques such as polynomials, interactions, and entropy-based groupings. The action writes output tables containing the transformation recipes and the transformed data, and it can also save the feature engineering model as an analytic store (ASTORE).

Settings
Parameter             Description
table                 Specifies the input CAS table settings, including name, caslib, and filtering options.
target                Specifies the name of the target variable for the analysis.
featureOut            Specifies the output CAS table that will contain the feature transformation pipelines.
transformationOut     Specifies the output CAS table that will contain the transformed feature data.
saveState             Specifies the output CAS table that stores the model as an analytic store (ASTORE) for future scoring.
casout                Specifies the CAS table that stores the results and metadata of the analysis.
screenPolicy          Specifies the policy for screening variables, such as handling constant variables, leakage, and missing values.
transformationPolicy  Specifies the transformation techniques to apply, including polynomial generation, interaction detection, and outlier treatment.
explorationPolicy     Specifies the policies for automatic variable analysis and grouping (AVAPT), defining thresholds for cardinality, entropy, skewness, etc.
rankPolicy            Specifies the policy for ranking features, including which statistics to use and the number of features to keep.
inputs                Specifies the variables to use in the analysis.
misraGries            Specifies whether to use the Misra-Gries algorithm for frequency estimation when distinct count limits are reached.
copyVars              Specifies the variables to copy directly to the output tables.
distinctCountLimit    Specifies the limit for the distinct count of values.
event                 Specifies the target event level for classification problems.
seed                  Specifies the random number seed for reproducibility.
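
Several of the scalar settings in the table can be combined in one call. The following is a minimal sketch rather than validated code. It assumes the hmeq table from the Data Preparation section is already loaded and that BAD uses 1 as its event level; the seed and distinctCountLimit values are arbitrary illustrations, and the output table names feat_recipe_opts and feat_data_opts are hypothetical.

```sas
PROC CAS;
   dataSciencePilot.featureMachine /
      table={name="hmeq"}
      target="BAD"
      event="1"                  /* assumed event level of BAD */
      seed=12345                 /* arbitrary seed, for reproducibility */
      distinctCountLimit=10000   /* illustrative value only */
      misraGries=true            /* estimate frequencies once the limit is reached */
      featureOut={name="feat_recipe_opts", replace=true}           /* hypothetical name */
      transformationOut={name="feat_data_opts", replace=true};     /* hypothetical name */
RUN;
```

Fixing the seed is what makes two runs with the same settings comparable; the other values should be tuned to the data at hand.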
Data Preparation
Load Data

Load the HMEQ dataset for use in the examples.

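PROC CAS;
   /* Use the existing CAS session named mysess */
   SESSION mysess;
   /* Load the action set that provides featureMachine */
   LOADACTIONSET "dataSciencePilot";
   /* Upload the CSV file into an in-memory CAS table named hmeq */
   upload path="hmeq.csv" casout={name="hmeq", replace=true};
RUN;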
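
Before running featureMachine, it can help to confirm that the table loaded as expected. The sketch below relies on the general-purpose table.tableInfo, table.columnInfo, and simple.distinct actions, none of which appears on the original page; it assumes the upload succeeded and that the table and simple action sets are available in the session.

```sas
PROC CAS;
   /* Row and column counts for the loaded table */
   table.tableInfo / name="hmeq";
   /* Column names, types, and formats */
   table.columnInfo / table={name="hmeq"};
   /* Distinct-value and missing-value counts per column */
   simple.distinct / table={name="hmeq"};
RUN;
```

The distinct and missing counts give a rough preview of what screenPolicy and distinctCountLimit will encounter in the data.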

Examples

Perform feature generation on the HMEQ table with default settings, outputting the transformation recipe and the transformed data.

SAS® / CAS code (awaiting community validation)

PROC CAS;
   /* Generate features for target BAD using the default settings */
   dataSciencePilot.featureMachine /
      table={name="hmeq"}
      target="BAD"
      featureOut={name="feat_recipe", replace=true}
      transformationOut={name="feat_data", replace=true};
RUN;

Result:
Generates a feature recipe table 'feat_recipe' and a table 'feat_data' containing the original and transformed features.
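
To see what the action produced, the two output tables can be inspected with standard table actions. This is a sketch under the assumption that the call above completed and created feat_recipe and feat_data; the exact columns of each table are not documented on this page, so the code lists them rather than naming any.

```sas
PROC CAS;
   /* Columns of the recipe table */
   table.columnInfo / table={name="feat_recipe"};
   /* First ten rows of the recipe table */
   table.fetch / table={name="feat_recipe"} to=10;
   /* Columns of the transformed data */
   table.columnInfo / table={name="feat_data"};
RUN;
```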

Run featureMachine with explicit policies: enable interaction, polynomial, and missing-value transformations, screen out variables with more than 50% missing values, set the rank policy's topKSave to 20, and save the model to an analytic store.

SAS® / CAS code (awaiting community validation)

PROC CAS;
   /* Enable extra transformations, screen on missingness, and save an ASTORE */
   dataSciencePilot.featureMachine /
      table={name="hmeq"}
      target="BAD"
      featureOut={name="feat_recipe_adv", replace=true}
      transformationOut={name="feat_data_adv", replace=true}
      saveState={name="feat_model_astore", replace=true}
      transformationPolicy={interaction=true, polynomial=true, missing=true}
      screenPolicy={missingPercentThreshold=50}  /* threshold of 50% missing */
      rankPolicy={topKSave=20};
RUN;

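Result:
Produces an ASTORE 'feat_model_astore' for scoring, along with the recipe table 'feat_recipe_adv' and the data table 'feat_data_adv', after screening variables at the 50% missing-value threshold and enabling the interaction, polynomial, and missing-value transformations.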
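
The saved analytic store can later be applied to data with the same input columns. The sketch below uses the astore action set, which is not part of the original page. It assumes 'feat_model_astore' exists in the active caslib, reuses hmeq as a stand-in for new data, and writes to a hypothetical output table named hmeq_scored.

```sas
PROC CAS;
   LOADACTIONSET "astore";
   /* Describe the saved analytic store, including its input and output variables */
   astore.describe / rstore={name="feat_model_astore"};
   /* Apply the saved transformations to a table with the same input columns */
   astore.score /
      table={name="hmeq"}
      rstore={name="feat_model_astore"}
      out={name="hmeq_scored", replace=true};   /* hypothetical output name */
RUN;
```

Running astore.describe first is a cheap way to check which input variables the store expects before scoring.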

FAQ

What is the primary function of the featureMachine action?
Which parameters are mandatory to run the featureMachine action?
What does the explorationPolicy parameter control?
How can I configure variable screening within the featureMachine action?
What is the purpose of the transformationPolicy parameter?
How does the action handle distinct counts that exceed the limit?
Can the feature transformation model be saved for later use?
What statistics are available for ranking interval variables?