The catTrans action groups and encodes categorical variables using various unsupervised and supervised techniques. It is useful for feature engineering, reducing cardinality, and preparing data for modeling by transforming categorical variables into more meaningful representations like Weight of ...
The exploreData action performs data exploration, automatic variable analysis, and grouping using comprehensive statistical profiling of variables. It calculates various statistics such as cardinality, entropy, kurtosis, missing values, and skewness to profile the data. This action is essential f...
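To make a few of the profiling statistics named above concrete, here is a minimal Python sketch that computes cardinality, missing-value count, and Shannon entropy for a single categorical column. It is illustrative only and is not the exploreData implementation; the function name `profile_categorical` and the use of `None` to mark missing values are assumptions made for this example.

```python
import math
from collections import Counter

def profile_categorical(values):
    """Return cardinality, missing count, and Shannon entropy (bits) of a column.

    Hypothetical helper for illustration; None is assumed to mark a missing value.
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    # Shannon entropy over the observed level frequencies (0.0 for an empty column).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {
        "cardinality": len(counts),      # number of distinct non-missing levels
        "missing": len(values) - n,      # number of missing entries
        "entropy": entropy,
    }

profile = profile_categorical(["a", "b", "a", None, "b", "a", "c", "c"])
print(profile)
```

A column with many levels of roughly equal frequency has entropy close to log2(cardinality); a column dominated by one level has entropy close to zero.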
Performs randomized cardinality estimation.
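The description above does not say which estimator is used. As one possible illustration of the general idea, the following Python sketch implements a K-minimum-values (KMV) estimator, a well-known hash-based technique for approximating the number of distinct values in a single pass. The function names, the `k` and `seed` parameters, and the choice of SHA-256 are all assumptions made for this sketch, not part of any documented API.

```python
import hashlib
import heapq

def _hash01(value, seed=0):
    """Map a value to a pseudo-random float in [0, 1) using a seeded hash."""
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def estimate_cardinality(values, k=256, seed=0):
    """K-minimum-values estimate of the number of distinct items (illustrative)."""
    heap = []        # max-heap (negated hashes) holding the k smallest distinct hashes
    members = set()  # hashes currently stored in the heap
    for v in values:
        h = _hash01(v, seed)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the largest stored hash with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return float(len(heap))   # fewer than k distinct hashes seen: count is exact
    return (k - 1) / (-heap[0])   # standard KMV estimate from the k-th smallest hash

urls = [f"site{i % 5000}" for i in range(20000)]
estimate = estimate_cardinality(urls, k=256)
print(round(estimate))
```

Memory use depends on `k` rather than on the number of distinct values, and the relative error shrinks roughly as 1/sqrt(k), which is the usual trade-off for sketches of this kind.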
An e-commerce platform is analyzing web server logs to understand traffic sources. The 'Referrer_URL' variable has extremely high cardinality (thousands of unique referring sites). To make this variable usable in a clustering model, the Data Science team wants to keep only the top 5% most frequen...
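The scenario text is truncated, so the following Python sketch rests on two assumptions: that "top 5%" refers to a share of the distinct levels ranked by frequency, and that all remaining levels are collapsed into a single placeholder level. The function name `keep_top_fraction` and the label `_OTHER_` are hypothetical, and the sketch uses plain Python rather than the action the scenario would call.

```python
import math
from collections import Counter

def keep_top_fraction(values, fraction=0.05, other_label="_OTHER_"):
    """Keep the most frequent `fraction` of distinct levels; relabel the rest.

    Assumption: the fraction applies to the number of distinct levels, and
    rare levels are merged into one placeholder level.
    """
    counts = Counter(values)
    n_keep = max(1, math.ceil(fraction * len(counts)))   # always keep at least one level
    kept = {level for level, _ in counts.most_common(n_keep)}
    return [v if v in kept else other_label for v in values]

# 30 distinct referrers: two dominant ones and 28 that each appear once.
referrers = ["google"] * 50 + ["bing"] * 30 + [f"site{i}" for i in range(28)]
reduced = keep_top_fraction(referrers, fraction=0.05)
print(sorted(set(reduced)))
```

With 30 distinct levels and a fraction of 0.05, the two most frequent referrers are kept and the other 28 are merged, which reduces the variable from 30 levels to 3.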
When set to TRUE within the 'cntl' parameter list, it instructs the FedSQL query planner to perform cardinality estimations of the input data.
The explorationPolicy parameter specifies the automatic variable analysis and grouping (AVAPT) policy, including sub-policies for cardinality, coefficient of variation (cv), entropy, index of qualitative variation (iqv), kurtosis, missing values, nominal variables, outliers, and skewness.
The explorationPolicy parameter specifies the policy for automatic variable analysis and grouping (AVAPT). It contains sub-parameters to configure analysis based on cardinality, coefficient of variation (cv), entropy, index of qualitative variation (iqv), kurtosis, missing values, nominal variabl...
The transformationPolicy parameter defines the scope of feature transformations and generations the machine will perform. It allows you to enable or disable specific transformation types such as those for cardinality reduction, entropy, interactions, kurtosis, missing value treatment, outlier tre...
By default, a greedy search or an exhaustive search is used to determine the best split for each variable at each tree node. When set to False, a faster, clustering-based algorithm is applied instead. Setting this parameter to False is recommended for variables with high cardinality. D...
Specifies the method for finding a split on a nominal input. Alias: nomSearch; handling: CLASSIC | ENHANCED; maxCategories: specifies the maximum number of levels for a splitting rule to include. Aliases: maxCats, maxLevels, maxValues, cluster, minCardCluster; Default: 128; Minimum value: 0; shrinkag...