
Data Exploration with Base SAS Procedures

This script helps you understand and validate data by examining column attributes and their values. It uses four SAS© procedures for quick and easy data exploration: PROC PRINT to view observations, PROC MEANS for summary descriptive statistics, PROC UNIVARIATE for more in-depth statistics, and PROC FREQ for frequency tables. Frequency tables are particularly useful for identifying unexpected or inconsistent values and validating data quality.
Data Analysis

Type: SASHELP


The examples use the `sashelp.class` and `sashelp.cars` datasets from SASHELP, the library that ships with SAS and contains its sample datasets.
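Column attributes can also be inspected directly, before any values are printed. As a sketch (PROC CONTENTS is an addition here, not one of the four procedures in the script), the following step lists each column's name, type, length, format, and label:

```sas
/* Addition for illustration, not part of the original script:  */
/* list variable attributes (type, length, format, label).       */
/* VARNUM orders the variables by their position in the dataset. */
PROC CONTENTS DATA=sashelp.cars varnum;
RUN;
```

Running this first tells you which columns are numeric (candidates for PROC MEANS and PROC UNIVARIATE) and which are character (candidates for PROC FREQ).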

1 Code Block
PROC PRINT
Explanation:
With no options or additional statements, PROC PRINT lists every column and every observation of the `sashelp.class` dataset (19 students). This is useful for an initial overview of a small table.
PROC PRINT DATA=sashelp.class;
RUN;
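PROC PRINT also accepts options and statements that trim the listing without changing the data. A minimal variation, assuming you only want the female students and do not need the Obs column (the filter is purely illustrative):

```sas
/* Variation for illustration: NOOBS drops the Obs column, */
/* WHERE keeps only the rows where Sex is 'F'.             */
PROC PRINT DATA=sashelp.class noobs;
   where sex = 'F';
RUN;
```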
2 Code Block
PROC PRINT
Explanation:
Displays only the first 10 observations (rows) of the `sashelp.class` dataset. The `obs=10` data set option tells SAS to stop reading after observation 10, which gives a quick overview without printing all the data.
PROC PRINT DATA=sashelp.class (obs=10);
RUN;
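Because OBS= gives the number of the last observation to read rather than a row count, it can be paired with FIRSTOBS= to look at a window in the middle of a dataset. A sketch, with the bounds 5 and 10 chosen only for illustration:

```sas
/* Illustrative variation: start at observation 5 and stop  */
/* after observation 10, so six rows are printed (5 to 10). */
PROC PRINT DATA=sashelp.class (firstobs=5 obs=10);
RUN;
```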
3 Code Block
PROC PRINT
Explanation:
Displays the first 10 observations of the `sashelp.cars` dataset, restricted to the 'Make', 'Model', 'Type', and 'MSRP' variables listed in the `VAR` statement. The columns appear in the order in which they are listed in `VAR`.
PROC PRINT DATA=sashelp.cars (obs=10);
   var make model type msrp;
RUN;
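A `VAR` statement can be combined with a `WHERE` statement to look at one slice of the data. In the sketch below, the filter `type = 'SUV'` is only an example. With WHERE processing, OBS= counts only the observations that satisfy the condition, so the listing shows the first 10 SUVs.

```sas
/* Illustrative filter: keep only SUVs, then show the same */
/* four columns for the first 10 matching observations.    */
PROC PRINT DATA=sashelp.cars (obs=10);
   where type = 'SUV';
   var make model type msrp;
RUN;
```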
4 Code Block
PROC MEANS
Explanation:
Calculates summary descriptive statistics (by default N, mean, standard deviation, minimum, and maximum) for the numeric variables 'enginesize', 'horsepower', 'mpg_city', and 'mpg_highway' from the `sashelp.cars` dataset. This gives a quick view of each variable's central tendency and range, and an implausible minimum or maximum can point to a data error.
PROC MEANS DATA=sashelp.cars;
   var enginesize horsepower mpg_city mpg_highway;
RUN;
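The default statistics can be replaced by an explicit list, and a `CLASS` statement splits the results by group. The sketch below uses an illustrative choice of statistics and of `origin` as the grouping variable: NMISS counts missing values, which is handy for validation, and MEDIAN is less sensitive to extreme values than the mean.

```sas
/* Request specific statistics instead of the defaults;   */
/* MAXDEC=1 rounds the displayed values to one decimal.   */
/* CLASS produces one set of statistics per Origin value. */
PROC MEANS DATA=sashelp.cars n nmiss mean median min max maxdec=1;
   class origin;
   var enginesize horsepower mpg_city mpg_highway;
RUN;
```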
5 Code Block
PROC UNIVARIATE
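Explanation:
Generates more detailed descriptive statistics for the numeric variable 'mpg_highway' from the `sashelp.cars` dataset. By default, the output includes moments (N, mean, standard deviation, skewness, kurtosis, and others), basic measures of location and variability, tests for location, quantiles, and the five lowest and five highest extreme observations. Tests for normality require the NORMAL option on the PROC UNIVARIATE statement, and graphs require additional statements such as HISTOGRAM or QQPLOT.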
PROC UNIVARIATE DATA=sashelp.cars;
   var mpg_highway;
RUN;
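Normality tests and graphs are not produced by the step above on its own. A sketch that adds them, assuming ODS Graphics is available in your session: NORMAL on the PROC statement requests the normality tests, HISTOGRAM with its own NORMAL option overlays a fitted normal curve, and ID labels the extreme observations with each car's make and model.

```sas
ods graphics on;

/* NORMAL on the PROC statement adds the Tests for Normality table. */
PROC UNIVARIATE DATA=sashelp.cars normal;
   var mpg_highway;
   /* Identify extreme observations by make and model. */
   id make model;
   /* Histogram with a fitted normal density curve. */
   histogram mpg_highway / normal;
RUN;
```

Labeling extreme observations with ID variables makes it quicker to check whether an outlier is a genuine vehicle or a data-entry error.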
6 Code Block
PROC FREQ
Explanation:
Creates one-way frequency tables for the categorical variables 'origin', 'type', and 'drivetrain' from the `sashelp.cars` dataset. Each table lists the distinct values of the variable with their frequency, percentage, cumulative frequency, and cumulative percentage. This makes it easy to spot misspelled categories, unexpected codes, or inconsistent values during data validation.
PROC FREQ DATA=sashelp.cars;
   tables origin type drivetrain;
RUN;
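A few PROC FREQ options can make anomalies stand out further. In this sketch the option choices are illustrative: NLEVELS reports how many distinct values each variable has, ORDER=FREQ lists the most frequent values first, MISSING counts missing values as their own level, and the `origin*type` request builds a two-way table for checking combinations of values.

```sas
/* NLEVELS adds a table with the number of distinct values per variable; */
/* ORDER=FREQ sorts each table by descending frequency.                  */
PROC FREQ DATA=sashelp.cars nlevels order=freq;
   /* MISSING treats missing values as a valid level in the tables. */
   tables origin type drivetrain / missing;
   /* Two-way table; NOCOL and NOPERCENT keep only counts and row percentages. */
   tables origin*type / nocol nopercent;
RUN;
```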
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.