Analysis of Eye and Hair Color Frequency by Region

The script creates a dataset named 'Color' in a DATA step, reading the data inline with 'datalines'. It defines four variables: 'Region' (numeric), 'Eyes' (eye color, character), 'Hair' (hair color, character) and 'Count' (numeric), and assigns descriptive labels to 'Eyes', 'Hair' and 'Region' so that the output is easier to read.
Three PROC FREQ steps then run on the 'Color' dataset. The first two estimate the binomial proportion of one level of 'Region' and request Agresti-Coull, Wilson and exact confidence limits at alpha = 0.1 (90% confidence), for the first level ('level=1') and the second level ('level=2') respectively. Both use 'Count' as a weight and share a common title. The third step requests the default binomial statistics for 'Region' and has no WEIGHT statement, so it counts dataset rows rather than the weighted totals.
Although the dataset carries eye and hair color, every analysis in the script is on 'Region': the script estimates the proportion of observations in each geographic region, not the distribution of eye or hair color.
Data Analysis

Type : INTERNAL_CREATION


The 'Color' dataset is created and populated directly within the script via a DATA step and the DATALINES statement. All data required for the analysis is provided internally.

1 Code Block
DATA STEP Data
Explanation :
This DATA step creates the 'Color' dataset from the raw data that follows DATALINES. It defines four variables: 'Region' (numeric), 'Eyes' (character), 'Hair' (character) and 'Count' (numeric). Each data line holds three records, so the INPUT statement ends with a trailing '@@' to hold the line across iterations; without it, only the first record of each line would be read. Labels are assigned to 'Eyes', 'Hair' and 'Region' to make the output easier to read.
DATA Color;
   /* Trailing @@ holds the input line so all three records per line are read */
   INPUT Region Eyes $ Hair $ Count @@;
   label Eyes  ='Eye Color'
         Hair  ='Hair Color'
         Region='Geographic Region';
   DATALINES;
1 blue fair 23 1 blue red 7 1 blue medium 24
1 blue dark 11 1 green fair 19 1 green red 7
1 green medium 18 1 green dark 14 1 brown fair 34
1 brown red 5 1 brown medium 41 1 brown dark 40
1 brown black 3 0 blue fair 46 0 blue red 21
0 blue medium 44 0 blue dark 40 0 blue black 6
0 green fair 50 0 green red 31 0 green medium 37
0 green dark 23 0 brown fair 56 0 brown red 42
0 brown medium 53 0 brown dark 54 0 brown black 13
;
RUN;
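A quick way to confirm that the DATA step read every record is to total 'Count' by region. This is a minimal sketch, not part of the original script; it assumes the 'Color' dataset from the step above. If all 27 records were read, Region 0 should show 14 rows summing to 516 and Region 1 should show 13 rows summing to 246. Smaller figures suggest that INPUT read only the first record on each data line.

```sas
/* Sanity check (not in the original script): row count and total Count per region */
PROC MEANS DATA=Color n sum;
   class Region;
   var Count;
RUN;
```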
2 Code Block
PROC FREQ
Explanation :
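This step runs PROC FREQ on the 'Color' dataset and produces a one-way frequency table for 'Region'. The `binomial(ac wilson exact level=1)` option requests Agresti-Coull, Wilson and exact (Clopper-Pearson) confidence limits for the proportion of the first level of 'Region', and `alpha=.1` sets them to 90% confidence. Because of `order=freq`, levels are ordered by descending frequency, so the first level is the more frequent region, which for this data is Region 0. The `exact binomial` statement requests an exact test for the binomial proportion, the WEIGHT statement makes 'Count' the number of observations each row represents, and the TITLE statement sets the output title.
PROC FREQ DATA=Color order=freq;
   /* level=1 is the first level in the table, i.e. the most frequent one under order=freq */
   tables region / binomial(ac wilson exact level=1) alpha=.1;
   exact binomial;
   weight Count;
   title 'Hair and Eye Color of European Children';
RUN;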
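The three interval types requested above can be reproduced by hand in a DATA step, which makes the formulas explicit. The sketch below is an illustration, not part of the original script. It assumes the weighted counts x = 516 (Region 0) and n = 762, which are the totals of 'Count' when all 27 records are read, and it uses the beta-quantile form of the exact (Clopper-Pearson) limits. All variable names are hypothetical.

```sas
/* Illustration only: 90% limits for p = x/n computed from the textbook formulas */
data _null_;
   x = 516;                 /* assumed weighted count for Region 0 */
   n = 762;                 /* assumed weighted total              */
   alpha = 0.1;
   p = x / n;
   z = quantile('NORMAL', 1 - alpha/2);

   /* Wilson (score) interval */
   w_center = (p + z**2 / (2*n)) / (1 + z**2 / n);
   w_half   = (z / (1 + z**2 / n)) * sqrt(p*(1 - p)/n + z**2 / (4*n**2));
   wilson_l = w_center - w_half;
   wilson_u = w_center + w_half;

   /* Agresti-Coull interval: Wald form around an adjusted proportion */
   n_adj = n + z**2;
   p_adj = (x + z**2 / 2) / n_adj;
   ac_l  = p_adj - z * sqrt(p_adj*(1 - p_adj)/n_adj);
   ac_u  = p_adj + z * sqrt(p_adj*(1 - p_adj)/n_adj);

   /* Exact (Clopper-Pearson) interval via beta quantiles */
   cp_l = quantile('BETA', alpha/2,     x,     n - x + 1);
   cp_u = quantile('BETA', 1 - alpha/2, x + 1, n - x);

   put 'Wilson:          ' wilson_l 6.4 ' to ' wilson_u 6.4;
   put 'Agresti-Coull:   ' ac_l 6.4 ' to ' ac_u 6.4;
   put 'Clopper-Pearson: ' cp_l 6.4 ' to ' cp_u 6.4;
run;
```

The values written to the log can be compared with the confidence limits table that PROC FREQ prints for the same level.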
3 Code Block
PROC FREQ
Explanation :
This step repeats the previous analysis of 'Region' with one change: `level=2` in `binomial(ac wilson exact level=2)` requests the Agresti-Coull, Wilson and exact confidence limits for the second level of 'Region'. Under `order=freq` the second level is the less frequent region, which for this data is Region 1. The confidence level (alpha = 0.1, 90% limits), the `exact binomial` test, the 'Count' weight and the title are unchanged.
PROC FREQ DATA=Color order=freq;
   /* level=2 is the second level in the table, i.e. the less frequent one under order=freq */
   tables region / binomial(ac wilson exact level=2) alpha=.1;
   exact binomial;
   weight Count;
   title 'Hair and Eye Color of European Children';
RUN;
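Selecting the level by position depends on the ordering in effect. PROC FREQ also accepts the level's formatted value in quotes, which keeps the target fixed if ORDER= changes. A minimal sketch, assuming 'Region' has no user-defined format so that its formatted value is '1'; it is not part of the original script:

```sas
PROC FREQ DATA=Color order=freq;
   /* level='1' names the level by its value (Region = 1) rather than its position */
   tables region / binomial(ac wilson exact level='1') alpha=.1;
   exact binomial;
   weight Count;
RUN;
```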
4 Code Block
PROC FREQ
Explanation :
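This step runs PROC FREQ on the 'Color' dataset for 'Region' with the `binomial` option alone. With no LEVEL= option, the binomial statistics are computed for the first level only, which under `order=freq` is the more frequent one. The defaults apply: Wald (asymptotic) and exact confidence limits at 95% confidence, plus an asymptotic test that the proportion equals 0.5. There is no WEIGHT statement, so each row of 'Color' counts as one observation and 'Count' is ignored. The title set in the earlier steps stays in effect, because TITLE statements persist until they are changed or cleared.
PROC FREQ DATA=Color order=freq;
   /* No WEIGHT statement: proportions are based on row counts, not on Count */
   tables region / binomial;
RUN;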
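Because the step above has no WEIGHT statement, its proportion is based on the number of rows per region rather than on the counts they represent. The sketch below shows a weighted counterpart with the same default binomial statistics. It is an assumption about what a reader might want to compare against, not part of the original script, and the title text is hypothetical.

```sas
/* Hypothetical weighted variant of the default binomial analysis */
title 'Region Proportion Weighted by Count';
PROC FREQ DATA=Color order=freq;
   tables region / binomial;
   weight Count;          /* each row now represents Count observations */
RUN;
title;                    /* clear the title so it does not carry over */
```

Running both versions side by side shows how much the estimated proportion shifts once 'Count' is taken into account.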
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.