featureMachine - WeAreCAS

Q: What is the primary function of the featureMachine action?

The featureMachine action serves as an automated feature transformation and generation engine. It is designed to automate data science workflows by exploring, executing, and ranking machine learning pipelines.

Q: Which parameters are mandatory to run the featureMachine action?

To run the action, you must specify the "table" parameter (input data), the "target" parameter (target variable), the "featureOut" parameter (to store feature pipelines), and the "transformationOut" parameter (to store transformation pipelines).

Q: What does the explorationPolicy parameter control?

The explorationPolicy parameter specifies the policy for automatic variable analysis and grouping (AVAPT). It contains sub-parameters to configure analysis based on cardinality, coefficient of variation (cv), entropy, index of qualitative variation (iqv), kurtosis, missing values, nominal variable handling, outliers, and skewness.

Q: How can I configure variable screening within the featureMachine action?

You can use the "screenPolicy" parameter to recommend which variables should be screened out, transformed, or copied. It supports policies for detecting constant variables, grouping rare levels, identifying leakage, and filtering based on low coefficient of variation or high missing rates.

Q: What is the purpose of the transformationPolicy parameter?

The transformationPolicy parameter defines the scope of feature transformations and generations the machine will perform. It allows you to enable or disable specific transformation types such as those for cardinality reduction, entropy, interactions, kurtosis, missing value treatment, outlier treatment, polynomial expansion, and skewness.

Q: How does the action handle distinct counts that exceed the limit?

The "distinctCountLimit" parameter sets a limit (default is 10,000). If this limit is exceeded and the "misraGries" parameter is set to TRUE (which is the default), the action uses the Misra-Gries frequency sketch algorithm to estimate the distribution. Otherwise, the operation may be aborted.
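
As a minimal sketch, both parameters can be written out explicitly on the call. It assumes the dataSciencePilot action set is loaded and that a table named hmeq with a target column BAD is already in memory. The output table names are hypothetical, and the values shown only restate the defaults described above.

```sas
PROC CAS;
   /* Assumption: table "hmeq" is loaded and dataSciencePilot is available. */
   dataSciencePilot.featureMachine /
      table={name="hmeq"}
      target="BAD"
      distinctCountLimit=10000   /* default limit, stated explicitly */
      misraGries=true            /* default: estimate with the Misra-Gries sketch when the limit is exceeded */
      featureOut={name="feat_recipe_mg", replace=true}        /* hypothetical table name */
      transformationOut={name="feat_data_mg", replace=true};  /* hypothetical table name */
RUN;
```

Setting misraGries=false instead would remove the fallback, so a variable that exceeds the limit could cause the operation to be aborted.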

Q: Can the feature transformation model be saved for later use?

Yes, you can use the "saveState" parameter to specify a CAS table where the feature transformation and generation model will be stored.
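
A stored model is typically applied with the score action of the astore action set. The sketch below assumes that this action set is available in your deployment and that an earlier featureMachine run wrote its model to a table named feat_model_astore. The input and output table names are illustrative only.

```sas
PROC CAS;
   /* Assumption: the astore action set is licensed and can be loaded. */
   loadactionset "astore";

   /* Apply the saved feature model to a table with the same input columns. */
   astore.score /
      table={name="hmeq"}                         /* data to transform */
      rstore={name="feat_model_astore"}           /* table written by saveState */
      out={name="hmeq_features", replace=true};   /* hypothetical output table */
RUN;
```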

Q: What statistics are available for ranking interval variables?

In the "rankPolicy", the "intervalStat" parameter offers several options including AVGQUANKURT, AVGQUANSKEW, CLASSICALKURT, CLASSICALSKEW, ENTROPY, MI (Mutual Information), NORMMI (Normalized Mutual Information), PEARSON, and SU (Symmetric Uncertainty).
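
For illustration, a ranking statistic could be selected as follows. This is a sketch that assumes the option is passed as a string value of intervalStat inside rankPolicy, and that the hmeq table and the dataSciencePilot action set are already loaded. The output table names are hypothetical.

```sas
PROC CAS;
   dataSciencePilot.featureMachine /
      table={name="hmeq"}
      target="BAD"
      rankPolicy={intervalStat="MI"}   /* rank interval features by mutual information */
      featureOut={name="feat_recipe_rank", replace=true}        /* hypothetical table name */
      transformationOut={name="feat_data_rank", replace=true};  /* hypothetical table name */
RUN;
```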

Description

The featureMachine action is an automated feature transformation and generation engine. It automates data science workflows by exploring, executing, and ranking machine learning pipelines. It analyzes the input data to screen variables, impute missing values, treat outliers, and generate new features using techniques such as polynomials, interactions, and entropy-based groupings. The action writes output tables containing the transformation recipes and the transformed data, and it can also save the feature engineering model as an analytic store (ASTORE).

Settings

Parameter	Description
table	Specifies the input CAS table settings, including name, caslib, and filtering options.
target	Specifies the name of the target variable for the analysis.
featureOut	Specifies the output CAS table that will contain the feature transformation pipelines.
transformationOut	Specifies the output CAS table that will contain the transformed feature data.
saveState	Specifies the output CAS table to store the model as an analytic store (ASTORE) for future scoring.
casout	Specifies the CAS table to store the results and metadata of the analysis.
screenPolicy	Specifies the policy for screening variables, such as handling constant variables, leakage, and missing values.
transformationPolicy	Specifies the transformation techniques to apply, including polynomial generation, interaction detection, and outlier treatment.
explorationPolicy	Specifies the policies for automatic variable analysis and grouping (AVAPT), defining thresholds for cardinality, entropy, skewness, etc.
rankPolicy	Specifies the policy for ranking features, including which statistics to use and the number of features to keep.
inputs	Specifies the input variables to use in the analysis.
misraGries	Specifies whether to use the Misra-Gries algorithm for frequency estimation when distinct count limits are reached.
copyVars	Specifies the variables to copy directly to the output tables.
distinctCountLimit	Specifies the limit for the distinct count of values.
event	Specifies the target event level for classification problems.
seed	Specifies the random number seed for reproducibility.
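
The sketch below shows how several of these settings might be combined in one call. It assumes the HMEQ table is loaded as hmeq, that inputs accepts a list of variable descriptors with a name field, and that the event level of BAD is the string "1". The seed value and output table names are arbitrary choices for illustration.

```sas
PROC CAS;
   dataSciencePilot.featureMachine /
      table={name="hmeq"}
      target="BAD"
      event="1"      /* assumed event level of the binary target */
      seed=12345     /* arbitrary seed, fixed for reproducibility */
      inputs={{name="LOAN"}, {name="DEBTINC"}, {name="JOB"}, {name="REASON"}}  /* restrict analysis to these columns */
      featureOut={name="feat_recipe_sel", replace=true}        /* hypothetical table name */
      transformationOut={name="feat_data_sel", replace=true};  /* hypothetical table name */
RUN;
```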

Data Preparation

Load Data

Load the HMEQ dataset for use in the examples.

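PROC CAS;
   /* Use the existing CAS session named mysess. */
   SESSION mysess;
   /* Load the action set that provides featureMachine. */
   LOADACTIONSET "dataSciencePilot";
   /* Upload the local CSV file into an in-memory table named hmeq. */
   upload path="hmeq.csv" casout={name="hmeq", replace=true};
RUN;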

Examples

Basic Feature Generation

Perform feature generation on the HMEQ table with default settings, outputting the transformation recipe and the transformed data.

SAS® / CAS code (awaiting community validation)

PROC CAS;
   dataSciencePilot.featureMachine /
      table={name="hmeq"}
      target="BAD"
      featureOut={name="feat_recipe", replace=true}
      transformationOut={name="feat_data", replace=true};
RUN;

Result:
Generates a feature recipe table 'feat_recipe' and a table 'feat_data' containing the original and transformed features.

Advanced Feature Engineering with Policies

Run featureMachine with specific policies: enable interaction and polynomial generation, screen out variables with more than 50% missing values, and save the model to an analytic store.

SAS® / CAS code (awaiting community validation)

PROC CAS;
   dataSciencePilot.featureMachine /
      table={name="hmeq"}
      target="BAD"
      featureOut={name="feat_recipe_adv", replace=true}
      transformationOut={name="feat_data_adv", replace=true}
      saveState={name="feat_model_astore", replace=true}
      transformationPolicy={interaction=true, polynomial=true, missing=true}
      screenPolicy={missingPercentThreshold=50}
      rankPolicy={topKSave=20};
RUN;

Result:
Produces an ASTORE 'feat_model_astore' for scoring, along with the recipe and data tables. Variables with more than 50% missing values are screened out, and interaction, polynomial, and missing-value transformations are enabled.

Related Actions

dataSciencePilot

analyzeMissingPatterns

The analyzeMissingPatterns action performs a missing pattern analysis. It is ...

dataSciencePilot

exploreCorrelation

The exploreCorrelation action explores linear and nonlinear correlations amon...

dataSciencePilot

exploreData

The exploreData action performs data exploration, automatic variable analysis...

dataSciencePilot

generateShadowFeatures

Generate shadow features.
