
Multiple Data Analysis and Statistics

The script is structured into several independent sections. The first section creates a 'scoredata' dataset and derives a 'subsetscoredata' subset that keeps only the observations whose 'Scorevalues' exceed 78. The second section creates a 'demography' dataset, then runs a frequency analysis of 'Gender' with `PROC FREQ` and computes descriptive statistics for 'Age', 'Weight', and 'Height' with `PROC MEANS`. The main section creates a 'biology' dataset and applies a series of analyses: the means of 'Age', 'Height', and 'Weight', then descriptive statistics grouped by 'Sex', and then by 'Year' and 'Sex'. A further `PROC MEANS` step saves descriptive statistics for 'Height' and 'Weight' (means, standard deviations, skewness, and medians) into a new dataset named 'Stats_biology'. Finally, the script uses `PROC UNIVARIATE` to analyze the distribution of 'Height', and `PROC MEANS` with the `maxdec=2` option to display statistics with at most two decimal places.
Data Analysis

Type : CREATION_INTERNE


All datasets ('scoredata', 'subsetscoredata', 'demography', 'biology', 'Stats_biology') are created directly within the script using `DATA STEP` blocks with embedded `datalines` or are derived from these internal datasets. No external data sources (files, databases) are referenced or required for script execution.

1 Code Block
DATA STEP Data
Explanation :
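Creates a dataset named 'scoredata' with two variables, 'A' (character) and 'Scorevalues' (numeric), using data provided directly via `datalines`. Each data line holds five group/score pairs, so the `INPUT` statement ends with the double trailing at sign (`@@`), which holds the data line across iterations of the DATA step so that every pair becomes its own observation (ten in total). Without `@@`, SAS would read only the first pair on each line.
DATA scoredata;
INPUT A $ Scorevalues @@;  /* @@ keeps reading pairs from the same data line */
DATALINES;
P 77 P 76 P 74 P 72 P 78
D 80 D 84 D 88 D 87 D 90
;
RUN;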
2 Code Block
PROC PRINT
Explanation :
Displays the content of the 'scoredata' dataset in the standard SAS output.
PROC PRINT DATA=scoredata;
RUN;
3 Code Block
DATA STEP Data
Explanation :
Creates a new dataset named 'subsetscoredata' from 'scoredata', including only observations where the value of 'Scorevalues' is strictly greater than 78.
DATA subsetscoredata;
SET scoredata;
IF scorevalues>78;  /* subsetting IF: only observations meeting the condition are written */
RUN;
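The same subset can also be built with a `WHERE=` data set option, which filters observations as they are read rather than after they enter the DATA step, as the subsetting `IF` does. A minimal sketch, assuming 'scoredata' was created as in code block 1 and using the hypothetical output name 'subset_where':

```sas
/* Assumes 'scoredata' (code block 1) exists in the WORK library.          */
/* WHERE= filters rows before they reach the DATA step, unlike the         */
/* subsetting IF, which discards them afterward. Same result here.         */
/* subset_where is a hypothetical dataset name.                            */
DATA subset_where;
SET scoredata(WHERE=(Scorevalues > 78));
RUN;

PROC PRINT DATA=subset_where;
RUN;
```

For this data both approaches keep the five 'D' scores above 78; `WHERE` tends to be preferred for large inputs because unwanted rows are never loaded.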
4 Code Block
PROC PRINT
Explanation :
Displays the content of the 'subsetscoredata' dataset in the standard SAS output.
PROC PRINT DATA=subsetscoredata;
RUN;
5 Code Block
DATA STEP Data
Explanation :
Creates a dataset named 'demography' with 'Gender' (character), 'Age', 'Weight', and 'Height' (numeric) variables, using data provided via `datalines`. The `title Demography;` statement sets a title for subsequent procedure outputs.
*Q4;
DATA demography;
INPUT Gender $ Age Weight Height;
DATALINES;
M 50 68 155
F 23 60 165
M 65 72 180
F 35 55 154
M 15 35 158
;
RUN;
title Demography;
6 Code Block
PROC FREQ
Explanation :
Calculates and displays the frequency distribution of the 'Gender' variable in the 'demography' dataset: for each gender category, the frequency and percentage, plus the cumulative frequency and cumulative percent.
PROC FREQ DATA=demography;
TABLES Gender;
RUN;
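By default `PROC FREQ` also prints cumulative columns. A minimal variant, assuming 'demography' from code block 5 is in the WORK library, suppresses them with the `NOCUM` option and writes the counts to an output dataset (the name 'gender_counts' is hypothetical):

```sas
/* Assumes 'demography' (code block 5) exists in the WORK library.          */
/* NOCUM drops the cumulative frequency and cumulative percent columns.     */
/* OUT= saves Gender, COUNT, and PERCENT to a dataset (hypothetical name).  */
PROC FREQ DATA=demography;
TABLES Gender / NOCUM OUT=gender_counts;
RUN;

PROC PRINT DATA=gender_counts;
RUN;
```

With the five records above, 'M' appears three times (60%) and 'F' twice (40%).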
7 Code Block
PROC MEANS
Explanation :
Calculates basic descriptive statistics (N, mean, standard deviation, minimum, maximum) for the 'Age', 'Weight', and 'Height' variables of the 'demography' dataset.
PROC MEANS DATA=demography;
VAR Age Weight Height;
RUN;
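The five statistics listed above are the `PROC MEANS` defaults. A minimal sketch, assuming 'demography' from code block 5 is in the WORK library, requests them explicitly as statistic keywords on the `PROC MEANS` statement; appending another keyword such as `MEDIAN` adds that column to the table.

```sas
/* Assumes 'demography' (code block 5) exists in the WORK library.        */
/* N MEAN STD MIN MAX reproduces the default statistics explicitly;       */
/* append further keywords (for example MEDIAN) to extend the table.      */
PROC MEANS DATA=demography N MEAN STD MIN MAX;
VAR Age Weight Height;
RUN;
```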
8 Code Block
DATA STEP Data
Explanation :
Creates a dataset named 'biology' with 'Id' (numeric), 'sex' (character), 'Age', 'Year', 'Height', and 'Weight' (numeric) variables, using data provided directly via `datalines`.
*------------------------------------;
DATA biology;
INPUT Id sex $ Age Year Height Weight;
DATALINES;
7389 M 24 4 69.2 132.5
3945 F 19 2 58.5 112.8
4721 F 20 2 65.3 98.6
1835 F 24 4 62.8 102.5
9541 M 21 3 72.5 152.3
2957 M 22 3 67.3 145.8
2158 F 21 2 59.8 104.5
4296 F 25 3 62.5 132.5
4824 M 23 4 74.5 184.4
5736 M 22 3 69.1 149.5
8765 F 19 1 67.3 130.5
5734 F 18 1 64.3 110.2
;
RUN;
9 Code Block
PROC PRINT
Explanation :
Displays the complete content of the 'biology' dataset in the standard SAS output.
PROC PRINT DATA=biology;
RUN;
10 Code Block
PROC MEANS
Explanation :
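Calculates the default descriptive statistics (N, mean, standard deviation, minimum, maximum) for the 'Age', 'Height', and 'Weight' variables of the 'biology' dataset; the means requested by Q1 appear in the Mean column.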
*Q1) Obtain the means of Age, Height, and Weight.;
PROC MEANS DATA=biology;
var Age Height Weight;
RUN;
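Since Q1 asks only for means, the table can be limited to that statistic. A minimal sketch, assuming 'biology' from code block 8 is in the WORK library, puts the `MEAN` keyword on the `PROC MEANS` statement so that no other statistics are printed:

```sas
/* Assumes 'biology' (code block 8) exists in the WORK library.    */
/* The MEAN keyword restricts the printed table to the means only. */
PROC MEANS DATA=biology MEAN;
VAR Age Height Weight;
RUN;
```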
11 Code Block
PROC MEANS
Explanation :
Calculates descriptive statistics for the 'Age', 'Height', and 'Weight' variables of the 'biology' dataset, grouped by each category of the 'Sex' variable.
*Q2) Obtain descriptive statistics of Age, Height, and Weight by gender.;
PROC MEANS DATA=biology;
var Age Height Weight;
class Sex;
RUN;
12 Code Block
PROC MEANS
Explanation :
Calculates descriptive statistics for the 'Age', 'Height', and 'Weight' variables of the 'biology' dataset, grouped jointly by the 'Year' and 'Sex' variables.
*Q3) Obtain descriptive statistics of Age, Height, and Weight by gender and year.;
PROC MEANS DATA=biology;
var Age Height Weight;
class year sex;
RUN;
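The same grouping can be done with a `BY` statement instead of `CLASS`. `BY` requires the data to be sorted by the grouping variables and prints a separate table for each group, whereas `CLASS` needs no sorting and produces a single combined table. A sketch, assuming 'biology' from code block 8 is in the WORK library and using the hypothetical name 'biology_sorted' for the sorted copy:

```sas
/* Assumes 'biology' (code block 8) exists in the WORK library.          */
/* BY-group processing needs sorted input, so sort into a new dataset   */
/* (biology_sorted is a hypothetical name) to leave 'biology' unchanged. */
PROC SORT DATA=biology OUT=biology_sorted;
BY year sex;
RUN;

/* One separate statistics table per Year/Sex group */
PROC MEANS DATA=biology_sorted;
BY year sex;
VAR Age Height Weight;
RUN;
```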
13 Code Block
PROC MEANS Data
Explanation :
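Calculates descriptive statistics (means, standard deviations, skewness, and medians) for the 'Height' and 'Weight' variables of the 'biology' dataset, grouped by 'Year' and 'Sex', and stores them in a new dataset named 'Stats_biology'. The `VAR` statement is required here: without it, PROC MEANS analyzes every numeric variable that is not a class variable (Id, Age, Height, Weight), and the names in the `OUTPUT` statement would be assigned to the first variables in that list, so `av_height` would hold the mean of Id. With `var Height Weight;`, the first name after each statistic keyword receives the Height statistic and the second receives the Weight statistic.
*Q4) Store descriptive statistics in named variables of an output dataset.;
PROC MEANS DATA=biology;
class year sex;
var Height Weight;  /* OUTPUT names are matched to these variables in order */
OUTPUT out=Stats_biology mean=av_height av_weight std=sd_height sd_weight skewness=sk_height sk_weight median=md_height md_weight;
RUN;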
14 Code Block
PROC PRINT
Explanation :
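Displays the content of the 'Stats_biology' dataset, which contains the descriptive statistics calculated and stored by the previous `PROC MEANS`. Besides the named statistics, the dataset holds the automatic variables `_TYPE_` and `_FREQ_` (the number of observations in each group). Because the `NWAY` option is not used, it contains one overall row (`_TYPE_`=0), one row per Sex (`_TYPE_`=1), one row per Year (`_TYPE_`=2), and one row per Year and Sex combination (`_TYPE_`=3).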
PROC PRINT DATA=Stats_biology;
RUN;
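When only the Year-by-Sex rows are wanted, the `NWAY` option keeps just the highest `_TYPE_` value, and `NOPRINT` suppresses the printed table because only the output dataset is needed. A sketch, assuming 'biology' from code block 8 is in the WORK library; 'Stats_biology_nway' is a hypothetical name:

```sas
/* Assumes 'biology' (code block 8) exists in the WORK library.         */
/* NWAY: keep only the rows for the full Year x Sex crossing.           */
/* NOPRINT: skip the printed table; only the output dataset is wanted.  */
/* Stats_biology_nway is a hypothetical dataset name.                   */
PROC MEANS DATA=biology NWAY NOPRINT;
CLASS year sex;
VAR Height Weight;
OUTPUT OUT=Stats_biology_nway
  MEAN=av_height av_weight
  STD=sd_height sd_weight
  SKEWNESS=sk_height sk_weight
  MEDIAN=md_height md_weight;
RUN;

PROC PRINT DATA=Stats_biology_nway;
RUN;
```

With these data there are six Year and Sex combinations, several with only one or two students, so the standard deviation (groups of one) or skewness (groups of fewer than three) is missing for those rows.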
15 Code Block
PROC UNIVARIATE
Explanation :
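Generates detailed univariate statistics for the 'Height' variable of the 'biology' dataset to examine its distribution. By default the output includes moments (including skewness and kurtosis), basic measures of location and variability, tests for location, quantiles, and extreme observations. Tests for normality and plots such as histograms are produced only when requested, for example with the `NORMAL` option or a `HISTOGRAM` statement.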
*Q5) Use univariate command to check the distribution of data.;
PROC UNIVARIATE DATA=biology;
var Height;
RUN;
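To actually test whether 'Height' looks normally distributed, the `NORMAL` option adds a "Tests for Normality" table, and a `HISTOGRAM` statement with its own `NORMAL` option overlays a fitted normal curve. A sketch, assuming 'biology' from code block 8 is in the WORK library:

```sas
/* Assumes 'biology' (code block 8) exists in the WORK library.           */
/* NORMAL (PROC statement): adds the Tests for Normality table            */
/* (Shapiro-Wilk, Kolmogorov-Smirnov, Cramer-von Mises, Anderson-Darling). */
/* HISTOGRAM ... / NORMAL: histogram with a fitted normal density curve.  */
PROC UNIVARIATE DATA=biology NORMAL;
VAR Height;
HISTOGRAM Height / NORMAL;
RUN;
```

With only twelve observations these tests have little power, so the histogram and the skewness value are worth reading alongside the p-values.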
16 Code Block
PROC MEANS
Explanation :
Calculates the default descriptive statistics for all numeric variables of the 'biology' dataset (including 'Id' and 'Year', since no `VAR` statement is given), displaying them with at most two decimal places through the `maxdec=2` option.
*Q6) Use PROC MEANS and display the output to two decimal places.;
PROC MEANS DATA=biology maxdec=2;
RUN;
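Because the step above has no `VAR` statement, it also summarizes 'Id' and 'Year', whose means carry little meaning. A sketch, assuming 'biology' from code block 8 is in the WORK library, restricts the two-decimal output to the measurement variables:

```sas
/* Assumes 'biology' (code block 8) exists in the WORK library.          */
/* VAR limits the table to the measurements; MAXDEC=2 affects only the   */
/* displayed output, not values written to an OUTPUT dataset.            */
PROC MEANS DATA=biology MAXDEC=2;
VAR Age Height Weight;
RUN;
```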
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.