The boxPlot action calculates quantiles, high and low whiskers, and outliers for numeric variables. This action is essential for exploratory data analysis, allowing for a quick understanding of the distribution of data, its central tendency, variability, and the presence of outliers. It is widely used in statistics and data analysis to create box-and-whisker plots.
| Parameter | Description |
|---|---|
| attributes | Specifies temporary attributes, such as a format, to apply to input variables. |
| binNum | Specifies the number of bins to use for the analysis. The default is 64. |
| casOut | Specifies the output table. |
| freq | Specifies the frequency variable for the analysis. |
| groupByLimit | Specifies the maximum number of levels in a group-by set. If the server determines that the group-by set reaches this number of levels, it stops and does not return a result. |
| includeMissingGroup | When set to True, missing values are allowed as group-by keys. |
| inputs | Specifies the input variables to use in the analysis. |
| method | Specifies the algorithm for the percentile analysis. Supported algorithms include the iterative method and the exact method. |
| nOutBins | Specifies the number of bins to use for reporting outliers. Specifying a value for this parameter implies a request to calculate outliers. |
| nOutLimit | Specifies the largest number of outliers to return. The actual outliers are returned rather than the binned values. Up to the specified number of outliers are returned on the high and low ends of the distribution. |
| outliers | When set to True, outliers are calculated. Check the binLo and binHi columns in the results. These values indicate whether the values displayed in the outlier columns are actual data values, or counts in bins. |
| partition | When set to True and the table is partitioned, the results are calculated for each partition efficiently. |
| partKey | When the table is partitioned and you specify the partition parameter, you can specify a partition key so that the results are computed for the single partition with the specified partition key. |
| pctlDef | Specifies one of five definitions for computing quantile statistics (percentiles), as described in the UNIVARIATE procedure documentation. The default value, 6, uses an iterative process instead of one of those five definitions. |
| table | Specifies the input table for the analysis. |
| whiskerPercentile | Specifies the percentile for the low and high whiskers. For example, if you specify 10, the whiskers are set at the 10th and 90th percentiles. Observations that lie beyond the whiskers are outliers. |
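As a rough sketch of how several of these parameters might be combined in one call, the following assumes a CAS table like the `wines` table used in the examples on this page, with a numeric column `Alcohol` and a grouping column `Type`. The value 5 for `nOutLimit` is an arbitrary illustration, and the behavior noted in the comments is taken from the parameter descriptions above rather than verified against a server.

```sas
proc cas;
   percentile.boxPlot /
      table={name='wines', groupBy={'Type'}},   /* assumed table and group-by column */
      inputs={{name='Alcohol'}},                /* assumed numeric column */
      includeMissingGroup=true,                 /* allow missing values as group-by keys */
      outliers=true,                            /* request outlier calculation */
      nOutLimit=5;                              /* illustrative cap on outliers per end */
run;
```

Per the `outliers` description, the `binLo` and `binHi` columns in the results indicate whether the outlier columns hold actual data values or bin counts.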
This example creates the 'wines' table in your default caslib. This table contains information about different types of wines, including their alcohol content, sugar level, and pH. This dataset will be used in the following examples to demonstrate the boxPlot action.

    DATA casuser.wines;
       SET sashelp.wines;
    RUN;

This example demonstrates a basic use of the boxPlot action. It calculates the box plot statistics (quartiles, median, whiskers) for the 'Alcohol' variable in the 'wines' table.

    PROC CAS;
       percentile.boxPlot /
          TABLE={name='wines'},
          inputs={{name='Alcohol'}};
    RUN;

This example shows a more detailed use of the boxPlot action. It calculates box plot statistics for the 'Sugar' variable, grouped by 'Type' of wine. It also demonstrates how to detect and output outliers to a separate CAS table named 'wine_outliers'.

    PROC CAS;
       percentile.boxPlot /
          TABLE={name='wines', groupBy={'Type'}},
          inputs={{name='Sugar'}},
          outliers=true,
          casOut={name='wine_outliers', replace=true};
    RUN;

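To inspect what was written to the output table, one option is the `table.fetch` action. This is a minimal sketch that assumes the `wine_outliers` table from the previous call exists in the active caslib of the same session; the row limit of 20 is an arbitrary choice.

```sas
proc cas;
   /* Retrieve the first rows of the output table (assumed to exist) */
   table.fetch /
      table={name='wine_outliers'},
      to=20;                         /* illustrative row limit */
run;
```

When reading the result, the `binLo` and `binHi` columns described under the `outliers` parameter indicate whether the outlier columns contain actual data values or counts in bins.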
This example illustrates how to customize the whiskers of the box plot. Instead of the default interquartile range method, the whiskers are set to the 10th and 90th percentiles using the 'whiskerPercentile' parameter. This is applied to the 'pH' variable.

    PROC CAS;
       percentile.boxPlot /
          TABLE={name='wines'},
          inputs={{name='pH'}},
          whiskerPercentile=10;
    RUN;

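Because observations beyond the whiskers are treated as outliers, percentile-based whiskers can be paired with outlier reporting. The sketch below combines `whiskerPercentile` with `outliers` and `casOut`, all taken from the parameter table above. The output table name `ph_outliers` is hypothetical, and the interaction between these parameters is assumed from their descriptions rather than confirmed.

```sas
proc cas;
   percentile.boxPlot /
      table={name='wines'},
      inputs={{name='pH'}},
      whiskerPercentile=10,                        /* whiskers at the 10th and 90th percentiles */
      outliers=true,                               /* report observations beyond the whiskers */
      casOut={name='ph_outliers', replace=true};   /* hypothetical output table name */
run;
```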
A financial investment firm needs to assess and compare the risk profile of different market sectors. The goal is to analyze the daily return volatility of representative stocks...
A smart factory uses thousands of IoT sensors to monitor machine temperatures in real-time. To prevent overheating, the system needs to efficiently calculate the baseline temper...
A clinical research organization is validating patient data. The dataset contains missing values for treatment groups and some erroneous, extreme blood pressure readings. The go...