Search - WeAreCAS

CAS Action

catTrans

The catTrans action groups and encodes categorical variables using various unsupervised and supervised techniques. It is useful for feature engineering, reducing cardinality, and preparing data for modeling by transforming categorical variables into more meaningful representations like Weight of ...

CAS Action

exploreData

The exploreData action performs data exploration, automatic variable analysis, and grouping using comprehensive statistical profiling of variables. It calculates various statistics such as cardinality, entropy, kurtosis, missing values, and skewness to profile the data. This action is essential f...
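As a rough illustration of the kind of statistics named above, the sketch below computes cardinality, missing count, and Shannon entropy for a nominal column, and skewness and excess kurtosis for a numeric column. It is plain Python, not the exploreData action itself; the function names `profile_nominal` and `profile_interval` are hypothetical, and the moment formulas are the simple population versions, which may differ from what the action reports.

```python
import math
from collections import Counter


def profile_nominal(values):
    """Cardinality, missing count, and Shannon entropy (bits) of a nominal column.

    Missing values are represented as None (an assumption of this sketch).
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    entropy = 0.0
    if n:
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "cardinality": len(counts),
        "missing": len(values) - n,
        "entropy": entropy,
    }


def profile_interval(values):
    """Missing count, skewness, and excess kurtosis from population moments."""
    present = [v for v in values if v is not None]
    n = len(present)
    if n == 0:
        return {"missing": len(values), "skewness": 0.0, "kurtosis": 0.0}
    mean = sum(present) / n
    m2 = sum((x - mean) ** 2 for x in present) / n
    m3 = sum((x - mean) ** 3 for x in present) / n
    m4 = sum((x - mean) ** 4 for x in present) / n
    # A constant column has zero variance; report 0.0 instead of dividing by zero.
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0
    return {"missing": len(values) - n, "skewness": skewness, "kurtosis": kurtosis}


if __name__ == "__main__":
    print(profile_nominal(["a", "b", "a", "b", None]))
    print(profile_interval([1, 2, 3, 4, 5, None]))
```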

CAS Action

highCardinality

Performs randomized cardinality estimation.
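The snippet does not say which estimator highCardinality uses. As a generic illustration of hash-based cardinality estimation, here is a minimal K-minimum-values (KMV) sketch in plain Python. The function name `estimate_cardinality` and the choice of SHA-1 as the hash are assumptions made for this example, not part of the action.

```python
import hashlib
import heapq


def estimate_cardinality(values, k=256):
    """Estimate the number of distinct values from the k smallest hash values.

    If fewer than k distinct hashes are seen, the count is returned exactly.
    """

    def unit_hash(value):
        # Map each value to a pseudo-uniform point in [0, 1).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2 ** 64

    heap = []        # max-heap (negated values) holding the k smallest hashes
    members = set()  # hashes currently in the heap, used to skip duplicates
    for value in values:
        h = unit_hash(value)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Push the new hash and evict the largest hash currently kept.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)

    if len(heap) < k:
        return len(heap)
    kth_smallest = -heap[0]
    # Standard KMV estimator: (k - 1) divided by the k-th smallest hash.
    return round((k - 1) / kth_smallest)


if __name__ == "__main__":
    stream = [f"site{i % 10000}" for i in range(50000)]
    print(estimate_cardinality(stream, k=1024))
```

Memory use is bounded by `k` regardless of stream length, and the relative error shrinks roughly as one over the square root of `k`.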

Scenario

Web Analytics: Reducing High Cardinality with GroupRare

An e-commerce platform is analyzing web server logs to understand traffic sources. The 'Referrer_URL' variable has extremely high cardinality (thousands of unique referring sites). To make this variable usable in a clustering model, the Data Science team wants to keep only the top 5% most frequen...
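The scenario text is cut off, so the exact rule it applies is not visible here. As one plausible reading, the sketch below keeps the most frequent levels (a fixed fraction of the distinct levels) and collapses everything else into a single catch-all level. It is plain Python rather than a CAS action call, and the function name `group_rare`, the `keep_fraction` parameter, and the `_OTHER_` label are all hypothetical.

```python
import math
from collections import Counter


def group_rare(values, keep_fraction=0.05, other_label="_OTHER_"):
    """Keep the most frequent levels and replace all others with one label.

    keep_fraction is the share of distinct levels to keep (at least one level
    is always kept). This interpretation is an assumption of the sketch.
    """
    counts = Counter(values)
    n_keep = max(1, math.ceil(len(counts) * keep_fraction))
    kept = {level for level, _ in counts.most_common(n_keep)}
    return [v if v in kept else other_label for v in values]


if __name__ == "__main__":
    referrers = ["google"] * 50 + ["bing"] * 30 + [f"site{i}" for i in range(38)]
    grouped = group_rare(referrers, keep_fraction=0.05)
    print(Counter(grouped).most_common())
```

Collapsing the long tail this way trades detail about rare referrers for a variable with few enough levels to be usable in a clustering model.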

FAQ

What does the 'dynamicCardinality' control parameter do?

When set to TRUE within the 'cntl' parameter list, it instructs the FedSQL query planner to perform cardinality estimations of the input data.

FAQ

What is defined by the explorationPolicy parameter?

The explorationPolicy parameter specifies the automatic variable analysis and grouping (AVAPT) policy, including sub-policies for cardinality, coefficient of variation (cv), entropy, index of qualitative variation (iqv), kurtosis, missing values, nominal variables, outliers, and skewness.

FAQ

What does the explorationPolicy parameter control?

The explorationPolicy parameter specifies the policy for automatic variable analysis and grouping (AVAPT). It contains sub-parameters to configure analysis based on cardinality, coefficient of variation (cv), entropy, index of qualitative variation (iqv), kurtosis, missing values, nominal variabl...

FAQ

What is the purpose of the transformationPolicy parameter?

The transformationPolicy parameter defines the scope of feature transformations and generations the machine will perform. It allows you to enable or disable specific transformation types such as those for cardinality reduction, entropy, interactions, kurtosis, missing value treatment, outlier tre...

FAQ

greedy

By default, a greedy or exhaustive search is used to determine the best split for each variable at each tree node. When set to False, a fast, clustering-based algorithm is applied instead. Setting this parameter to False is recommended for variables with high cardinality. D...

FAQ

nominalSearch

Specifies the method for finding a split on a nominal input. Alias: nomSearch
handling: CLASSIC | ENHANCED
maxCategories: specifies the maximum number of levels for a splitting rule to include. Aliases: maxCats, maxLevels, maxValues, cluster, minCardCluster Default: 128 Minimum value: 0
shrinkag...