The featureMachine action is an automated feature transformation and generation engine. It belongs to the dataSciencePilot action set, which automates data science workflows such as machine learning pipeline exploration, execution, and ranking. The action analyzes the input data to screen variables, impute missing values, treat outliers, and generate new features using techniques such as polynomials, interactions, and entropy-based groupings. It produces output tables containing the transformation recipes and the transformed data, and can also save the feature engineering model as an analytic store.
| Parameter | Description |
|---|---|
| table | Specifies the input CAS table settings, including name, caslib, and filtering options. |
| target | Specifies the name of the target variable for the analysis. |
| featureOut | Specifies the output CAS table that will contain the feature transformation pipelines. |
| transformationOut | Specifies the output CAS table that will contain the transformed feature data. |
| saveState | Specifies the output CAS table to store the model as an analytic store (ASTORE) for future scoring. |
| casout | Specifies the CAS table to store the results and metadata of the analysis. |
| screenPolicy | Specifies the policy for screening variables, such as handling constant variables, leakage, and missing values. |
| transformationPolicy | Specifies the transformation techniques to apply, including polynomial generation, interaction detection, and outlier treatment. |
| explorationPolicy | Specifies the policies for automatic variable analysis and grouping (AVAPT), defining thresholds for cardinality, entropy, skewness, etc. |
| rankPolicy | Specifies the policy for ranking features, including which statistics to use and the number of features to keep. |
| inputs | Specifies the input variables to use in the analysis. |
| misraGries | Specifies whether to use the Misra-Gries algorithm for frequency estimation when distinct count limits are reached. |
| copyVars | Specifies the variables to copy directly to the output tables. |
| distinctCountLimit | Specifies the maximum number of distinct values counted for a variable; see misraGries for the behavior when this limit is reached. |
| event | Specifies the target event level for classification problems. |
| seed | Specifies the random number seed for reproducibility. |
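
As a rough illustration of how several of the parameters above combine, the sketch below restricts the analysis to a few input variables, names the event level, and fixes the seed. It assumes the HMEQ table has already been loaded as `hmeq` in the active session. The choice of input variables, the event level `"1"`, the seed value, and the output table names are illustrative assumptions, not recommended settings.

```sas
PROC CAS;
   /* Assumes the hmeq table is already loaded in the active CAS session */
   dataSciencePilot.featureMachine /
      TABLE={name="hmeq"}
      target="BAD"
      /* Assumed event level: BAD=1 */
      event="1"
      /* Illustrative subset of HMEQ columns */
      inputs={{name="LOAN"}, {name="VALUE"}, {name="DEBTINC"}, {name="JOB"}}
      /* Arbitrary seed, chosen only for reproducibility */
      seed=12345
      /* Hypothetical output table names */
      featureOut={name="feat_recipe_subset", replace=true}
      transformationOut={name="feat_data_subset", replace=true};
RUN;
```
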
Load the HMEQ dataset for use in the examples.

    PROC CAS;
       /* Use the existing CAS session named mysess */
       SESSION mysess;
       /* Load the action set that provides featureMachine */
       LOADACTIONSET "dataSciencePilot";
       /* Upload the local CSV file into the in-memory table hmeq */
       upload path="hmeq.csv" casout={name="hmeq", replace=true};
    RUN;
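
Before running the action, it can help to confirm that the upload succeeded. The following is a minimal sketch that uses the standard `table.tableInfo` and `table.columnInfo` actions and assumes the table was loaded under the name `hmeq`, as in the loading step.

```sas
PROC CAS;
   /* Report row and column counts for the loaded table */
   table.tableInfo / name="hmeq";
   /* List the column names and types, including the target BAD */
   table.columnInfo / table={name="hmeq"};
RUN;
```
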
Perform feature generation on the HMEQ table with default settings, outputting the transformation recipe and the transformed data.

    PROC CAS;
       dataSciencePilot.featureMachine /
          TABLE={name="hmeq"}
          target="BAD"
          featureOut={name="feat_recipe", replace=true}
          transformationOut={name="feat_data", replace=true};
    RUN;
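
One way to inspect the two output tables from this run is with the standard `table.fetch` and `table.columnInfo` actions. This is a sketch only: the row limit of 10 is an arbitrary choice, and the columns that appear depend on what the action generated.

```sas
PROC CAS;
   /* Show the first rows of the transformation recipe table */
   table.fetch / table={name="feat_recipe"} to=10;
   /* List the generated feature columns in the transformed data */
   table.columnInfo / table={name="feat_data"};
RUN;
```
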
Run featureMachine with specific policies: enable interaction and polynomial generation, screen out variables with more than 50% missing values, and save the model as an analytic store.

    PROC CAS;
       dataSciencePilot.featureMachine /
          TABLE={name="hmeq"}
          target="BAD"
          featureOut={name="feat_recipe_adv", replace=true}
          transformationOut={name="feat_data_adv", replace=true}
          saveState={name="feat_model_astore", replace=true}
          transformationPolicy={interaction=true, polynomial=true, missing=true}
          screenPolicy={missingPercentThreshold=50}
          rankPolicy={topKSave=20};
    RUN;
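
The saved analytic store can later be applied to new data. The sketch below assumes the `astore` action set is available on the server and, for illustration, scores the original `hmeq` table. The output table name `hmeq_scored` is hypothetical.

```sas
PROC CAS;
   LOADACTIONSET "astore";
   /* Summarize the inputs and outputs recorded in the analytic store */
   astore.describe / rstore={name="feat_model_astore"};
   /* Apply the stored feature transformations to a table */
   astore.score /
      table={name="hmeq"}
      rstore={name="feat_model_astore"}
      /* Hypothetical output table name */
      out={name="hmeq_scored", replace=true};
RUN;
```

In practice the `table` parameter would point at new data with the same input columns as the training table.
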