mlTools

crossValidate

Description

The crossValidate action performs k-fold cross-validation for a specified machine learning model type. It divides the input data into a specified number of folds; for each fold, it trains a model on the remaining folds and assesses that model on the held-out fold. The results estimate how well the model generalizes to unseen data and help detect overfitting.
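
The fold procedure described above can be sketched in plain Python. This is only an illustration of generic k-fold cross-validation, not the action's internal implementation: the helper names (assign_folds, cross_validate), the toy majority-label model, and the way the seed drives the shuffle are all assumptions made for the example.

```python
import random
from statistics import mean


def assign_folds(n_rows, k_folds=5, seed=0):
    """Shuffle row indices with a fixed seed and deal them into k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k_folds] for i in range(k_folds)]


def cross_validate(rows, train, score, k_folds=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average the scores."""
    folds = assign_folds(len(rows), k_folds, seed)
    fold_scores = []
    for held_out in folds:
        held = set(held_out)
        train_rows = [row for i, row in enumerate(rows) if i not in held]
        test_rows = [rows[i] for i in held_out]
        model = train(train_rows)
        fold_scores.append(score(model, test_rows))
    return fold_scores, mean(fold_scores)


def train_majority(train_rows):
    """Toy 'model' (illustrative only): remember the most common training label."""
    labels = [label for _, label in train_rows]
    return max(set(labels), key=labels.count)


def accuracy(model, test_rows):
    """Fraction of held-out rows whose label matches the remembered label."""
    return sum(1 for _, label in test_rows if label == model) / len(test_rows)


# Synthetic data: label is 1 for multiples of 3, otherwise 0.
data = [(x, int(x % 3 == 0)) for x in range(30)]
scores, avg = cross_validate(data, train_majority, accuracy, k_folds=5, seed=42)
```

Reusing the same seed reproduces the same fold assignment in this sketch, which is the role the seed setting plays for fold sampling.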

Syntax

mlTools.crossValidate {
    casOut={caslib="string", compress=TRUE|FALSE, indexVars={"variable-name-1", ...}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR"|"INHERIT"|"STANDARD", name="table-name", promote=TRUE|FALSE, replace=TRUE|FALSE, replication=integer, tableRedistUpPolicy="DEFER"|"NOREDIST"|"REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1", ...}},
    kFolds=integer,
    logLevel=integer,
    modelType="BNET"|"DECISIONTREE"|"FACTMAC"|"FOREST"|"GRADBOOST"|"NEURALNET"|"SVM",
    nSubsessionWorkers=integer,
    parallelFolds=TRUE|FALSE,
    seed=integer,
    targetEvent="string",
    trainOptions={<key-1>=<any-list-or-data-type-1>, ...}
};

Settings
Parameter: Description
casOut: Specifies the score output table name and details.
kFolds: Specifies the number of folds to use for cross-validation. Default: 5.
logLevel: Specifies the level of log messages to be written: 0 = no logs; 1 = initialization and completion logs; 2 = adds setup summary logs; 3 = adds fold begin and complete logs. Default: 3.
modelType: Specifies the model type to which cross-validation is applied. Supported types are BNET, DECISIONTREE, FACTMAC, FOREST, GRADBOOST, NEURALNET, and SVM. Default: DECISIONTREE.
nSubsessionWorkers: Specifies the number of worker nodes for each subsession to use for parallel fold evaluation. Default: 0.
parallelFolds: When set to TRUE, evaluates folds in parallel. Default: TRUE.
seed: Specifies the seed to use for fold sampling for cross-validation. Default: 0.
targetEvent: Specifies the name of the nominal target event to use for model assessment.
trainOptions: Specifies a list of parameters for the model training action to use in the cross-validation process. This parameter is required.

Examples
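
As a rough illustration of how the settings fit together, the following Python sketch assembles a parameter dictionary that uses the documented defaults and checks the documented constraints (supported model types, log levels 0 through 3, required trainOptions). The helper build_cross_validate_params is hypothetical and is not part of any SAS client library. The keys inside train_options and the table and column names are placeholders, since the accepted keys depend on the training action for the chosen model type.

```python
SUPPORTED_MODEL_TYPES = {
    "BNET", "DECISIONTREE", "FACTMAC", "FOREST", "GRADBOOST", "NEURALNET", "SVM",
}


def build_cross_validate_params(train_options, model_type="DECISIONTREE",
                                k_folds=5, seed=0, parallel_folds=True,
                                n_subsession_workers=0, log_level=3,
                                target_event=None, cas_out=None):
    """Hypothetical helper: build a crossValidate parameter dict with documented defaults."""
    if not train_options:
        raise ValueError("trainOptions is required")
    model_type = model_type.upper()
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported modelType: {model_type}")
    if log_level not in (0, 1, 2, 3):
        raise ValueError("logLevel must be 0, 1, 2, or 3")
    params = {
        "modelType": model_type,
        "kFolds": k_folds,
        "seed": seed,
        "parallelFolds": parallel_folds,
        "nSubsessionWorkers": n_subsession_workers,
        "logLevel": log_level,
        "trainOptions": dict(train_options),
    }
    # Optional settings are included only when the caller supplies them.
    if target_event is not None:
        params["targetEvent"] = target_event
    if cas_out is not None:
        params["casOut"] = dict(cas_out)
    return params


# Placeholder training options: the keys and names below are assumptions.
params = build_cross_validate_params(
    train_options={"table": "CHURN_TRAIN", "target": "churned",
                   "inputs": ["tenure", "balance"]},
    model_type="gradboost",
    seed=12345,
    target_event="1",
    cas_out={"name": "CHURN_CV_SCORED", "replace": True},
)
```

The resulting dictionary mirrors the parameter names in the syntax listing, so it could be handed to whichever CAS client is in use; how it is submitted is outside the scope of this sketch.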

FAQ

What is the purpose of the crossValidate action?
Which model types are supported by the crossValidate action?
What is the function of the 'kFolds' parameter?
How can I control the training process within cross-validation?
Is it possible to run the cross-validation folds in parallel?
How can I manage the randomness of fold sampling?
How can I adjust the amount of log information displayed?

Associated Scenarios

Use Case
Standard Churn Prediction with Gradient Boosting

A retail bank wants to estimate the generalization error of a Gradient Boosting model designed to predict customer churn. They need to ensure the model performs consistently acr...

Use Case
High-Volume Fraud Detection with Parallel Forest Training

A credit card processor needs to validate a Random Forest model for fraud detection on a large dataset. Due to the data volume and the complexity of the forest, they want to uti...

Use Case
Rare Disease Diagnosis with SVM and Specific Target Event

A medical research facility is testing a Support Vector Machine (SVM) classifier for a rare disease. They need to validate the model using a low number of folds due to small sam...