The exploreData action profiles every variable in a table and groups the variables by their statistical characteristics. For each variable it calculates statistics such as cardinality, entropy, kurtosis, skewness, and the number of missing values. Run it to assess data structure and quality before moving on to feature engineering or modeling.

| Parameter | Description |
|---|---|
| casOut | Specifies the CAS table to store the analysis results. |
| distinctCountLimit | Specifies the maximum number of distinct values to count for a variable. Default is 10000. If a variable exceeds this limit, the Misra-Gries algorithm may be used (see misraGries). |
| ecdfTolerance | Specifies the tolerance value for the empirical cumulative distribution function (ECDF). |
| event | Specifies the target variable level that you want to model for classification problems. |
| explorationPolicy | Specifies the automatic variable analysis and grouping (AVAPT) policy settings for cardinality, entropy, outliers, etc. |
| freq | Specifies the variable used for frequency counts. |
| inputs | Specifies the variables to use for the analysis. |
| misraGries | Specifies whether to use the Misra-Gries algorithm for frequency estimation if the distinct count limit is exceeded. Default is TRUE. |
| table | Specifies the input table name, caslib, and other data access options. |
| target | Specifies the target variable for the analysis. |
| weight | Specifies the variable to use for weighting observations. |

Load the SASHelp 'Cars' dataset into the active CAS session for analysis.

    /* Copy SASHELP.CARS into the active caslib of the casauto session */
    PROC CASUTIL SESSREF=casauto;
       LOAD DATA=sashelp.cars CASOUT="cars" REPLACE;
    QUIT;

Performs a basic exploration of the 'cars' table to profile all variables and stores the result in 'explore_results'.

    PROC CAS;
       dataSciencePilot.exploreData /
          TABLE={name="cars"}
          casOut={name="explore_results", replace=true};
    RUN;
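
To check what the action wrote, the output table can be listed and sampled with the standard `table` action set. The following is a minimal sketch, assuming the 'explore_results' table from the example above exists in the active caslib. The row limit of 20 is an arbitrary choice, and the columns present in the output are not specified here.

```sas
PROC CAS;
   /* List the columns of the output table */
   table.columnInfo / table={name="explore_results"};

   /* Display the first 20 rows (arbitrary limit) */
   table.fetch / table={name="explore_results"} to=20;
RUN;
```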

Explores the 'cars' table with 'Origin' as the target variable. It customizes the exploration policy for skewness and missing values.

    PROC CAS;
       dataSciencePilot.exploreData /
          TABLE={name="cars"}
          target="Origin"
          explorationPolicy={
             skewness={momentMediumHighCutoff=5},
             missing={lowMediumCutoff=10}
          }
          casOut={name="detailed_explore", replace=true};
    RUN;
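
The inputs, distinctCountLimit, and misraGries parameters from the table above can be combined to restrict the analysis to a few variables and to control how high-cardinality variables are counted. The following is a sketch under stated assumptions rather than a verified call: the variable names come from SASHELP.CARS, the limit of 5000 is an arbitrary illustration, the output table name 'subset_explore' is hypothetical, and writing inputs as a list of `{name=...}` entries is an assumption about the accepted form.

```sas
PROC CAS;
   dataSciencePilot.exploreData /
      table={name="cars"}
      target="Origin"
      /* Assumed form: one {name=...} entry per variable to profile */
      inputs={{name="MSRP"}, {name="Horsepower"}, {name="Make"}}
      /* Arbitrary illustrative limit, lower than the default of 10000 */
      distinctCountLimit=5000
      /* TRUE is already the default; shown here for clarity */
      misraGries=true
      /* Hypothetical output table name */
      casOut={name="subset_explore", replace=true};
RUN;
```

Setting misraGries explicitly is redundant with its default, but it makes the intended behavior visible when distinctCountLimit is lowered.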