The script begins by examining the structure and metadata of the `sashelp.cars` dataset with `PROC CONTENTS`. It then computes summary statistics: first a global summary, displayed with `PROC PRINT`, then `MSRP` averages grouped by `origin` and `make`, and finally the overall means of `WheelBase` and `Weight`. The next section attempts to calculate the Pearson correlation coefficient between `WheelBase` and `Weight` by hand, from each variable's deviations from its mean and the products of those deviations. Note that the `DATA` step contains a logic error rather than a syntax error: `xy_dev = x_dev = y_dev;` is valid SAS, but it assigns the result of the comparison `x_dev = y_dev` (1 or 0) instead of the product `x_dev * y_dev`, so the manual Pearson value is incorrect. The script then obtains the correlation directly with `PROC CORR`, which serves as a check on the manual calculation. Finally, `PROC REG` fits a simple linear regression of `weight` on `wheelbase`.
Data Analysis
Type : MIXED
The script uses the built-in `sashelp.cars` dataset as the primary source. Several intermediate datasets (`cars_summary`, `msrp`, `WW_means`, `cars`, `dev`) are created dynamically during execution to store procedure results and transformed data, which are then used in subsequent steps.
1 Code Block
PROC CONTENTS
Explanation : This procedure displays the data dictionary (metadata) for the `sashelp.cars` dataset. It provides information on variables, their types, formats, and lengths, which is essential for understanding data structure.
proc contents data=sashelp.cars;
run;
2 Code Block
PROC SUMMARY Data
Explanation : The first `PROC SUMMARY` calculates basic descriptive statistics for all numeric variables in the `sashelp.cars` dataset and stores these statistics in a new dataset named `cars_summary`. The subsequent `PROC PRINT` displays the contents of the `cars_summary` dataset.
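The statements for this block are not shown here. A minimal sketch of what they might look like, assuming a `VAR _numeric_` statement and an `OUTPUT` statement with no statistic keywords (both are assumptions, not the original code):

```sas
/* Assumed reconstruction: the original statements are not shown in this document. */
proc summary data = sashelp.cars;
    var _numeric_;               /* analyze every numeric variable */
    output out = cars_summary;   /* no keywords given: N, MIN, MAX, MEAN, STD */
run;

/* Display the stored statistics. */
proc print data = cars_summary;
run;
```

Unlike `PROC MEANS`, `PROC SUMMARY` without a `VAR` statement produces only a count of observations, which is why the sketch lists the analysis variables explicitly.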
3 Code Block
PROC SUMMARY Data
Explanation : This `PROC SUMMARY` calculates the average retail price (`MSRP`) of cars, grouped by the classification variables `origin` and `make`. The `NWAY` option restricts the output to one row per `origin` and `make` combination, omitting the overall total and the subtotals for each variable on its own. The result, with the `average_msrp` variable, is stored in the `msrp` dataset.
proc summary data = sashelp.cars nway;
class origin make;
var msrp;
output out = msrp mean(msrp) = average_msrp;
run;
4 Code Block
PROC SUMMARY Data
Explanation : This `PROC SUMMARY` calculates the means of the `wheelbase` and `weight` variables from the `sashelp.cars` dataset. The calculated means are stored in the `WW_means` dataset under the names `mean_wheelbase` and `mean_weight`.
proc summary data = sashelp.cars;
var wheelbase weight;
output out = WW_means mean(WheelBase Weight) = mean_wheelbase mean_weight;
run;
5 Code Block
DATA STEP Data
Explanation : This `DATA` step creates a new `cars` dataset. It attaches the means (`mean_wheelbase`, `mean_weight`) from `WW_means` to every row of `sashelp.cars` with a conditional `SET`: the means are read only once, on the first iteration (`_n_ eq 1`), and because variables read by `SET` are retained across iterations, their values stay available for every subsequent row. The step then calculates the deviations of `wheelbase` (`x_dev`) and `weight` (`y_dev`) from their means. The line `xy_dev = x_dev = y_dev;` performs a logical comparison, assigning 1 to `xy_dev` if `x_dev` equals `y_dev` and 0 otherwise. For a Pearson correlation calculation, this line should be `xy_dev = x_dev * y_dev;`.
data cars;
set sashelp.cars;
if (_n_ eq 1) then set ww_means;
x_dev = wheelbase - mean_wheelbase;
y_dev = weight - mean_weight;
xy_dev = x_dev = y_dev;
output;
run;
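A corrected version of this step could look like the sketch below. It is not part of the original script: the output name `cars_fixed` is hypothetical, chosen so the original `cars` dataset is left untouched, and the `PROC SUMMARY` for `WW_means` is repeated only so the sketch runs on its own.

```sas
/* Repeated from the earlier step so this sketch is self-contained. */
proc summary data = sashelp.cars;
    var wheelbase weight;
    output out = WW_means mean(WheelBase Weight) = mean_wheelbase mean_weight;
run;

/* cars_fixed is a hypothetical name, not used in the original script. */
data cars_fixed;
    set sashelp.cars;
    if (_n_ eq 1) then set ww_means;      /* read the means once; SET variables are retained */
    x_dev  = wheelbase - mean_wheelbase;  /* deviation of WheelBase from its mean */
    y_dev  = weight - mean_weight;        /* deviation of Weight from its mean */
    xy_dev = x_dev * y_dev;               /* product of deviations, not a comparison */
run;
```

To use it, the later `PROC SUMMARY` would need to read `cars_fixed` instead of `cars`.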
6 Code Block
PROC SUMMARY Data
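Explanation : This `PROC SUMMARY` takes the `cars` dataset as input. It calculates the uncorrected sum of squares (`USS`) for `x_dev` and `y_dev` (stored in `x_ss` and `y_ss`), and the sum (`SUM`) of `xy_dev` (stored in `xy_ss`). Because `x_dev` and `y_dev` are already deviations from the mean, their uncorrected sums of squares are the sums of squared deviations of `wheelbase` and `weight`. These are the intermediate quantities for the manual calculation of the Pearson correlation coefficient.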
proc summary data = cars;
var x_dev y_dev xy_dev;
output out = dev uss(x_dev y_dev) = x_ss y_ss sum(xy_dev) = xy_ss;
run;
7 Code Block
DATA STEP Data
Explanation : This `DATA` step takes the `dev` dataset and adds a new `PearsonCorrelation` variable to it. It uses the standard formula to calculate the Pearson correlation coefficient from the sums of squares (`x_ss`, `y_ss`) and the sum of products of deviations (`xy_ss`). However, due to the error in the `xy_dev` calculation in a previous `DATA` step, the `PearsonCorrelation` calculated here will not represent the true correlation.
data dev;
set dev;
PearsonCorrelation = xy_ss/(sqrt(x_ss) *sqrt(y_ss));
run;
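For reference, the formula this step is meant to implement can be written as follows, taking x as `WheelBase` and y as `Weight` (the symbols are introduced here for illustration and do not appear in the script):

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

In the script's names, the numerator corresponds to `xy_ss` and the two sums under the square roots to `x_ss` and `y_ss`. The correspondence for the numerator holds only when `xy_dev` is the product `x_dev * y_dev`.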
8 Code Block
PROC CORR
Explanation : This `PROC CORR` directly calculates the Pearson correlation coefficient between the `WheelBase` and `Weight` variables from the `sashelp.cars` dataset. This is the standard and recommended method for obtaining correlations.
proc corr data = sashelp.cars;
var WheelBase Weight;
run;
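To compare the `PROC CORR` value with the manually computed one, the correlation can also be written to a dataset. The sketch below is an optional addition, not part of the original script, and `corr_out` is a hypothetical dataset name:

```sas
/* Sketch: store the Pearson statistics in a dataset instead of the listing. */
/* corr_out is a hypothetical dataset name.                                  */
proc corr data = sashelp.cars outp = corr_out noprint;
    var WheelBase Weight;
run;

/* Keep only the correlation rows; MEAN, STD and N rows are also present. */
proc print data = corr_out;
    where _type_ = 'CORR';
run;
```

The off-diagonal entry of the printed matrix is the value that `PearsonCorrelation` in `dev` should match once `xy_dev` is computed as a product.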
9 Code Block
PROC REG
Explanation : This `PROC REG` performs a linear regression analysis. It models the `weight` (dependent) variable as a function of the `wheelbase` (independent) variable using the `sashelp.cars` dataset, providing statistics on model fit, regression coefficients, and ANOVA.
proc reg data=sashelp.cars;
model weight=wheelbase;
run;
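As a point of reference, the least-squares estimates for this one-predictor model can be expressed with the same deviation sums used in the manual correlation steps, with x as `wheelbase` and y as `weight` (the notation is added here for illustration):

```latex
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x},
\qquad
R^2 = r^2
```

With a correctly computed `xy_ss`, the slope is `xy_ss / x_ss`, and the R-square reported by `PROC REG` should equal the square of the Pearson correlation from `PROC CORR`.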