The crossValidate action performs cross-validation with specified machine learning actions. It divides the input data into a specified number of folds and, for each fold, trains a model on the remaining folds and assesses its performance on the held-out fold. This estimates how well a model generalizes to unseen data and helps detect overfitting.
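
To make the fold mechanics concrete, here is a minimal pure-Python sketch of generic k-fold cross-validation. It is not the action's implementation: the function names (`k_fold_indices`, `cross_validate`, `train_majority`) are hypothetical, the toy trainer stands in for a real model, and the seed handling is this sketch's own and may differ from how the action treats `seed`.

```python
import random


def k_fold_indices(n_rows, k_folds=5, seed=0):
    """Assign row indices to k folds after a seeded shuffle (illustrative only)."""
    if k_folds < 2 or k_folds > n_rows:
        raise ValueError("k_folds must be between 2 and the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    return [indices[i::k_folds] for i in range(k_folds)]


def cross_validate(rows, labels, train_fn, k_folds=5, seed=0):
    """Return one misclassification rate per held-out fold."""
    folds = k_fold_indices(len(rows), k_folds, seed)
    errors = []
    for held_out in folds:
        held = set(held_out)
        # Train on every row that is NOT in the held-out fold.
        train_idx = [i for i in range(len(rows)) if i not in held]
        model = train_fn([rows[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        # Assess only on the held-out fold.
        wrong = sum(model(rows[i]) != labels[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return errors


def train_majority(train_rows, train_labels):
    """Toy trainer (hypothetical): always predict the most common training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda row: majority


# Usage: 10 rows, 5 folds; the mean of the per-fold errors is the CV estimate.
rows = list(range(10))
labels = [0] * 7 + [1] * 3
fold_errors = cross_validate(rows, labels, train_majority, k_folds=5, seed=42)
print(fold_errors, sum(fold_errors) / len(fold_errors))
```

Each row is held out exactly once, so averaging the per-fold errors gives a single estimate of out-of-sample error that does not depend on one particular train/test split.
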
| Parameter | Description |
|---|---|
| casOut | Specifies the score output table name and details. |
| kFolds | Specifies the number of folds to use for cross validation. Default: 5. |
| logLevel | Specifies the level of log messages to write: 0 (no logs), 1 (initialization and completion logs), 2 (level 1 plus setup summary logs), 3 (level 2 plus fold begin and complete logs). Default: 3. |
| modelType | Specifies the model type to which cross validation is applied. Supported types include BNET, DECISIONTREE, FACTMAC, FOREST, GRADBOOST, NEURALNET, and SVM. Default: DECISIONTREE. |
| nSubsessionWorkers | Specifies the number of worker nodes for each subsession to use for parallel fold evaluation. Default: 0. |
| parallelFolds | When set to TRUE, evaluates folds in parallel. Default: TRUE. |
| seed | Specifies the seed to use for fold sampling for cross validation. Default: 0. |
| targetEvent | Specifies the name of the nominal target event to use for model assessment. |
| trainOptions | Specifies a list of parameters for the model training action to use in the cross validation process. This is a required parameter. |
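
The parameters in the table above can be gathered into a single structure before the action is invoked. The following Python sketch only assembles and checks such a structure against the defaults and allowed values listed in the table; it does not call any SAS API. The helper name `build_cross_validate_params` and the contents of `trainOptions` (`table`, `target`) are hypothetical placeholders, not documented values.

```python
# Values copied from the parameter table; nothing here talks to a server.
SUPPORTED_MODEL_TYPES = {
    "BNET", "DECISIONTREE", "FACTMAC", "FOREST",
    "GRADBOOST", "NEURALNET", "SVM",
}

DEFAULTS = {
    "kFolds": 5,
    "logLevel": 3,
    "modelType": "DECISIONTREE",
    "nSubsessionWorkers": 0,
    "parallelFolds": True,
    "seed": 0,
}

# Parameters listed in the table without a default value.
OPTIONAL_NO_DEFAULT = {"casOut", "targetEvent"}


def build_cross_validate_params(train_options, **overrides):
    """Merge overrides into the documented defaults and validate them (hypothetical helper)."""
    if not train_options:
        raise ValueError("trainOptions is a required parameter")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_NO_DEFAULT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    params = dict(DEFAULTS)
    params.update(overrides)

    params["modelType"] = str(params["modelType"]).upper()
    if params["modelType"] not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported modelType: {params['modelType']}")
    if params["logLevel"] not in (0, 1, 2, 3):
        raise ValueError("logLevel must be 0, 1, 2, or 3")

    params["trainOptions"] = dict(train_options)
    return params


# Usage: the trainOptions keys and values below are placeholders for illustration.
params = build_cross_validate_params(
    {"table": "churn_train", "target": "churned"},
    modelType="gradboost",
    kFolds=10,
    seed=12345,
)
print(params)
```

Validating `modelType` and `logLevel` locally catches typos before a potentially long cross-validation run starts; how the resulting structure is passed to the action depends on the client in use.
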
A retail bank wants to estimate the generalization error of a Gradient Boosting model designed to predict customer churn. They need to ensure the model performs consistently acr...
A credit card processor needs to validate a Random Forest model for fraud detection on a large dataset. Due to the data volume and the complexity of the forest, they want to uti...
A medical research facility is testing a Support Vector Machine (SVM) classifier for a rare disease. They need to validate the model using a low number of folds due to small sam...