dataSciencePilot

featureMachine

Description

The featureMachine action, part of the dataSciencePilot action set, is an automated feature transformation and generation engine. The action set as a whole automates data science workflows, including automatic machine learning pipeline exploration, execution, and ranking. featureMachine itself analyzes the input data to screen variables, impute missing values, treat outliers, and generate new features using techniques such as polynomials, interactions, and entropy-based groupings. It produces output tables containing the transformation recipes and the transformed data, and can also save the feature engineering model as an analytic store.

Settings
Parameter: Description
table: Specifies the input CAS table settings, including name, caslib, and filtering options.
target: Specifies the name of the target variable for the analysis.
featureOut: Specifies the output CAS table that will contain the feature transformation pipelines.
transformationOut: Specifies the output CAS table that will contain the transformed feature data.
saveState: Specifies the output CAS table to store the model as an analytic store (ASTORE) for future scoring.
casout: Specifies the CAS table to store the results and metadata of the analysis.
screenPolicy: Specifies the policy for screening variables, such as handling constant variables, leakage, and missing values.
transformationPolicy: Specifies the transformation techniques to apply, including polynomial generation, interaction detection, and outlier treatment.
explorationPolicy: Specifies the policies for automatic variable analysis and grouping (AVAPT), defining thresholds for cardinality, entropy, skewness, etc.
rankPolicy: Specifies the policy for ranking features, including which statistics to use and the number of features to keep.
inputs: Specifies the specific variables to be used in the analysis.
misraGries: Specifies whether to use the Misra-Gries algorithm for frequency estimation when distinct count limits are reached.
copyVars: Specifies the variables to copy directly to the output tables.
distinctCountLimit: Specifies the limit for the distinct count of values.
event: Specifies the target event level for classification problems.
seed: Specifies the random number seed for reproducibility.
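
The scalar settings above are passed alongside the table parameters in a single action call. The following is a minimal sketch rather than a validated invocation: it assumes the hmeq table from the Data Preparation step is already loaded, and the values shown for event, seed, and distinctCountLimit, as well as the output table names, are illustrative assumptions, not documented defaults.

```sas
proc cas;
   dataSciencePilot.featureMachine /
      table={name="hmeq"}
      target="BAD"
      event="1"                 /* assumed event level of BAD */
      seed=12345                /* arbitrary seed, fixed for reproducibility */
      distinctCountLimit=10000  /* illustrative limit, not a documented default */
      misraGries=true           /* estimate frequencies once the limit is reached */
      featureOut={name="feat_recipe_opts", replace=true}          /* hypothetical name */
      transformationOut={name="feat_data_opts", replace=true};    /* hypothetical name */
run;
quit;
```

Fixing the seed makes repeated runs comparable, which matters when comparing recipes produced under different policies.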
Data Preparation
Load Data

Load the HMEQ dataset for use in the examples.

/* Load the action set and upload hmeq.csv into CAS as table "hmeq". */
proc cas;
   session mysess;
   loadactionset "dataSciencePilot";
   upload path="hmeq.csv" casout={name="hmeq", replace=true};
run;
quit;
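
Before running featureMachine, it can help to confirm that the upload succeeded and that the target column is present. A minimal sketch, assuming the table landed in the session's active caslib under the name hmeq as in the step above; table.tableInfo and table.columnInfo are standard actions from the table action set.

```sas
proc cas;
   session mysess;
   /* Row and column counts for the uploaded table */
   table.tableInfo / name="hmeq";
   /* Column names and types; BAD is the target used in the examples */
   table.columnInfo / table={name="hmeq"};
run;
quit;
```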

Examples

Perform feature generation on the HMEQ table with default settings, outputting the transformation recipe and the transformed data.

SAS® / CAS code (awaiting community validation)

proc cas;
   dataSciencePilot.featureMachine /
      table={name="hmeq"}
      target="BAD"
      /* Transformation recipe (feature pipelines) */
      featureOut={name="feat_recipe", replace=true}
      /* Transformed feature data */
      transformationOut={name="feat_data", replace=true};
run;
quit;

Result:
Generates a feature recipe table 'feat_recipe' and a table 'feat_data' containing the original and transformed features.
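
To see what the action produced, the two output tables can be inspected with standard table actions. This is a sketch that assumes the example above ran successfully and that feat_recipe and feat_data are in the active caslib; the column layout of both tables is not described here, so the code only lists and previews them.

```sas
proc cas;
   /* Preview the first 10 rows of the transformation recipe */
   table.fetch / table={name="feat_recipe"} to=10;
   /* List the columns of the transformed data table */
   table.columnInfo / table={name="feat_data"};
run;
quit;
```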

Run featureMachine with explicit policies: enable the interaction, polynomial, and missing-value transformations, screen out variables with more than 50% missing values, set rankPolicy topKSave to 20, and save the model as an analytic store.

SAS® / CAS code (awaiting community validation)

proc cas;
   dataSciencePilot.featureMachine /
      table={name="hmeq"}
      target="BAD"
      featureOut={name="feat_recipe_adv", replace=true}
      transformationOut={name="feat_data_adv", replace=true}
      /* Save the feature engineering model as an analytic store */
      saveState={name="feat_model_astore", replace=true}
      transformationPolicy={interaction=true, polynomial=true, missing=true}
      /* Threshold on the percentage of missing values */
      screenPolicy={missingPercentThreshold=50}
      rankPolicy={topKSave=20};
run;
quit;

Result:
Produces an ASTORE 'feat_model_astore' for scoring, along with the recipe and data tables. Variables above the 50% missing-value threshold are screened out, and the interaction, polynomial, and missing-value transformations are applied.
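
Once saved, the analytic store can be applied to a table with the astore action set. The sketch below is illustrative only: it scores the original hmeq table, where in practice the input would be new data with the same input columns, and the output table name hmeq_features is hypothetical.

```sas
proc cas;
   loadactionset "astore";
   /* Summarize the inputs and outputs recorded in the analytic store */
   astore.describe / rstore={name="feat_model_astore"};
   /* Apply the saved feature transformations to a table */
   astore.score /
      table={name="hmeq"}
      rstore={name="feat_model_astore"}
      out={name="hmeq_features", replace=true};   /* hypothetical output name */
run;
quit;
```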

FAQ

What is the primary function of the featureMachine action?
Which parameters are mandatory to run the featureMachine action?
What does the explorationPolicy parameter control?
How can I configure variable screening within the featureMachine action?
What is the purpose of the transformationPolicy parameter?
How does the action handle distinct counts that exceed the limit?
Can the feature transformation model be saved for later use?
What statistics are available for ranking interval variables?