Non-parametric analysis of candidate incomes

Difficulty level: Beginner
The script first deletes any existing version of the 'repincome' table in the WORK library. It then uses a FILENAME statement to point to an external CSV file containing income data, and PROC IMPORT loads that file into a temporary SAS dataset named WORK.repincome. PROC CONTENTS then displays the dataset's metadata. PROC SGPLOT produces two graphs: a box plot of income by candidate and a scatter plot of income against candidate. Next, the script creates three filtered datasets (TrumpCarson, TrumpCruz, CruzCarson), each excluding one of the three candidates. For each pair of candidates, PROC NPAR1WAY runs a non-parametric Wilcoxon rank-sum test, with exact p-values and a Hodges-Lehmann estimate of the location shift, to check whether the two income distributions differ in location; ALPHA=.05 sets the confidence limits to 95%. Each comparison gets its own title.
Data Analysis

Type: External


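The data come from an external CSV file ('/home/gsturrock0/STAT1/RepIncome..csv') imported with PROC IMPORT; the code below points to '/home/myFolder/STAT1/RepIncome..csv', so adjust the path to wherever the file is stored. The script contains no inline data (DATALINES/CARDS) and does not use SASHELP data.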

1 Code Block
PROC IMPORT Data
Explanation:
This block initializes the environment and imports the data. %web_drop_table, a utility macro available in SAS Studio, deletes WORK.repincome if it already exists; this matters because PROC IMPORT does not overwrite an existing dataset unless the REPLACE option is specified. The FILENAME statement assigns the fileref REFFILE to the external CSV file. PROC IMPORT reads the CSV file into the SAS dataset WORK.repincome, taking the variable names from the first row (GETNAMES=YES). PROC CONTENTS displays the imported dataset's metadata (variable names, types and lengths). Finally, %web_open_table opens the table for viewing in SAS Studio.
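/* Delete any existing copy of the table (SAS Studio utility macro) */
%web_drop_table(WORK.repincome);

/* Fileref pointing to the external CSV file */
FILENAME REFFILE '/home/myFolder/STAT1/RepIncome..csv';

/* Read the CSV into WORK.repincome, taking variable names from row 1 */
PROC IMPORT DATAFILE=REFFILE
    DBMS=CSV
    OUT=WORK.repincome;
    GETNAMES=YES;
RUN;

/* Show variable names, types and lengths */
PROC CONTENTS DATA=WORK.repincome; RUN;

/* Open the table in the SAS Studio viewer */
%web_open_table(WORK.repincome);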
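A minimal sketch of the same import without the SAS Studio-specific %web_ macros, for example when the code runs in a batch SAS session. It assumes the same file path as above. The REPLACE option overwrites an existing WORK.repincome, so the separate delete step is no longer needed; GUESSINGROWS=MAX (accepted in SAS 9.4; on older releases use a large number instead) makes PROC IMPORT scan every row before choosing variable types and lengths; and PROC PRINT with OBS=10 stands in for %web_open_table.

```sas
/* Same fileref as in the original block; adjust the path as needed */
FILENAME REFFILE '/home/myFolder/STAT1/RepIncome..csv';

PROC IMPORT DATAFILE=REFFILE
    DBMS=CSV
    OUT=WORK.repincome
    REPLACE;              /* overwrite WORK.repincome if it already exists */
    GETNAMES=YES;         /* first row holds the variable names            */
    GUESSINGROWS=MAX;     /* scan all rows before choosing types/lengths   */
RUN;

/* Quick look at the first 10 rows instead of %web_open_table */
PROC PRINT DATA=WORK.repincome(OBS=10);
RUN;
```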
2 Code Block
PROC SGPLOT
Explanation:
This block explores the data visually. The first PROC SGPLOT call draws a box plot (VBOX) of income for each candidate (CATEGORY=candidate), showing each group's median, quartiles and outliers. The second draws a scatter plot (SCATTER) of income (Y=income) against candidate (X=candidate), showing every individual observation. Together, the graphs show how incomes are distributed within each candidate's group and whether the groups differ in location or spread, which is useful context for the rank-based tests that follow.
/* Box plot of income for each candidate */
PROC SGPLOT DATA=work.repincome;
    vbox income / category=candidate;
RUN;

/* Individual incomes plotted against candidate */
PROC SGPLOT DATA=work.repincome;
    scatter y=income x=candidate;
RUN;
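As a numeric companion to the box plot, a sketch such as the following prints each candidate's sample size, median and quartiles, the same quantities the box plot draws. The JITTER option of the SCATTER statement (available in SAS 9.4) spreads overlapping points horizontally, which can help when many observations share a candidate. Both steps assume the variable names income and candidate used above.

```sas
/* Per-candidate summary: count, median, quartiles and range of income */
PROC MEANS DATA=work.repincome N MEDIAN Q1 Q3 MIN MAX;
    CLASS candidate;
    VAR income;
RUN;

/* Scatter plot with jittered points to reduce overplotting */
PROC SGPLOT DATA=work.repincome;
    SCATTER X=candidate Y=income / JITTER;
RUN;
```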
3 Code Block
DATA STEP / PROC NPAR1WAY Data
Explanation:
This block compares Trump and Carson. A DATA step creates the dataset TrumpCarson from work.repincome, keeping only observations whose numeric variable code is not 3, which drops the third candidate (Cruz, as the title implies). PROC NPAR1WAY then runs a Wilcoxon rank-sum test on income, with candidate as the CLASS variable; after the filter, candidate has exactly two levels, as the two-sample test requires. The EXACT statement requests exact p-values for the Wilcoxon test (WILCOXON) and exact confidence limits for the Hodges-Lehmann estimate of the location shift between the two groups (HL). ALPHA=.05 sets the level of those confidence limits (95%); the test itself reports p-values, which the analyst compares with 0.05. The TITLE statement labels the output 'Trump Carson', and the closing TITLE; statement clears it so it does not carry over to later output.
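*Trump Carson comparison;
/* Keep only Trump and Carson (drop code 3) */
DATA TrumpCarson; SET work.repincome;
    IF code NE 3;
RUN;

/* Wilcoxon rank-sum test, exact p-values, exact Hodges-Lehmann limits */
PROC NPAR1WAY DATA=TrumpCarson wilcoxon alpha=.05;
    var income;
    class candidate;
    exact wilcoxon HL;
    title 'Trump Carson';
RUN;
title;   /* clear the title */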
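The intermediate TrumpCarson dataset is not strictly necessary. As a sketch, the WHERE= data set option can filter the rows as PROC NPAR1WAY reads them, giving the same comparison without writing a copy to the WORK library; the procedure options are unchanged from the block above.

```sas
/* Filter on the fly instead of creating TrumpCarson */
PROC NPAR1WAY DATA=work.repincome(WHERE=(code NE 3)) WILCOXON ALPHA=.05;
    VAR income;
    CLASS candidate;
    EXACT WILCOXON HL;
    TITLE 'Trump Carson';
RUN;
TITLE;   /* clear the title */
```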
4 Code Block
DATA STEP / PROC NPAR1WAY Data
Explanation:
This block repeats the analysis for Trump versus Cruz. A DATA step creates TrumpCruz from work.repincome, excluding observations with code 2 (Carson). PROC NPAR1WAY then runs the Wilcoxon rank-sum test on income by candidate with the same settings as before (ALPHA=.05, exact Wilcoxon test and exact Hodges-Lehmann confidence limits). The output is titled 'Trump Cruz'.
*Trump Cruz Comparison;
/* Keep only Trump and Cruz (drop code 2) */
DATA TrumpCruz; SET work.repincome;
    IF code NE 2;
RUN;

PROC NPAR1WAY DATA=TrumpCruz wilcoxon alpha=.05;
    var income;
    class candidate;
    exact wilcoxon HL;
    title 'Trump Cruz';
RUN;
title;   /* clear the title */
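If the p-values and Hodges-Lehmann estimates are needed in later steps (for example, to tabulate all three comparisons together), ODS OUTPUT can save the result tables as datasets. This sketch assumes the ODS table names WilcoxonTest and HodgesLehmann; the ODS TRACE ON statement writes the actual table names to the log, so they can be checked against your SAS/STAT release. The output dataset names wt_TrumpCruz and hl_TrumpCruz are hypothetical.

```sas
/* Write ODS table names to the log so they can be verified */
ODS TRACE ON;

/* Save the Wilcoxon test and Hodges-Lehmann tables as datasets.   */
/* Table names are assumed; confirm them in the ODS TRACE output.  */
ODS OUTPUT WilcoxonTest=wt_TrumpCruz HodgesLehmann=hl_TrumpCruz;
PROC NPAR1WAY DATA=TrumpCruz WILCOXON ALPHA=.05;
    VAR income;
    CLASS candidate;
    EXACT WILCOXON HL;
RUN;

ODS TRACE OFF;

PROC PRINT DATA=wt_TrumpCruz; RUN;
PROC PRINT DATA=hl_TrumpCruz; RUN;
```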
5 Code Block
DATA STEP / PROC NPAR1WAY Data
Explanation:
This final block compares Cruz and Carson. The dataset CruzCarson is created from work.repincome by excluding observations with code 1 (Trump). PROC NPAR1WAY then runs the Wilcoxon rank-sum test on income for the two remaining candidates, with the same settings as the previous analyses (ALPHA=.05, exact Wilcoxon test and exact Hodges-Lehmann confidence limits). The output is titled 'Cruz Carson'.
*Cruz Carson comparison;
/* Keep only Cruz and Carson (drop code 1) */
DATA CruzCarson; SET work.repincome;
    IF code NE 1;
RUN;

PROC NPAR1WAY DATA=CruzCarson wilcoxon alpha=.05;
    var income;
    class candidate;
    exact wilcoxon HL;
    title 'Cruz Carson';
RUN;
title;   /* clear the title */
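Because the three comparison blocks differ only in the excluded code and the title, they can be collapsed into one macro. In this sketch, the macro name %pair_test and its parameters are hypothetical, and the mapping 1 = Trump, 2 = Carson, 3 = Cruz is inferred from the filters and titles above. The TITLE statement uses double quotes so that the macro variable &label resolves.

```sas
/* Hypothetical macro: one Wilcoxon comparison per call.            */
/* drop_code = candidate code to exclude, label = output title.     */
%MACRO pair_test(drop_code=, label=);
    PROC NPAR1WAY DATA=work.repincome(WHERE=(code NE &drop_code))
                  WILCOXON ALPHA=.05;
        VAR income;
        CLASS candidate;
        EXACT WILCOXON HL;
        TITLE "&label";
    RUN;
    TITLE;   /* clear the title after each comparison */
%MEND pair_test;

%pair_test(drop_code=3, label=Trump Carson);
%pair_test(drop_code=2, label=Trump Cruz);
%pair_test(drop_code=1, label=Cruz Carson);
```

Each call still evaluates its test separately at 0.05; since three pairwise comparisons are made, a multiplicity adjustment (for example, Bonferroni: comparing each p-value with 0.05/3) may be worth considering.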
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
