Cancer Data Analysis and Sorting

Difficulty Level
Beginner
The script begins with a DATA step that reads cancer statistics (cause, year, number of male and female cases, number of male and female deaths) from the DATALINES section. It then calculates 'deaths' (total deaths) and converts 'mcases' and 'mdeaths' to negative values, presumably so that male and female figures can be displayed on opposite sides of a zero axis. Next, PROC SORT sorts 'work.cancer' into 'work.cancer_sorted', ordering records by 'Ano' (year) and then by 'deaths', both in descending order. Finally, PROC FORMAT defines a custom picture format called 'positive' that displays numbers with a thousands separator and without a sign.
Data Analysis

Type : Internal creation


Raw data is integrated directly into the SAS script via the DATALINES section of the DATA STEP, meaning it is internally created and does not depend on external files or pre-existing SAS libraries (with the exception of standard work libraries like WORK).

1 Code Block
DATA STEP Data
Explanation :
This DATA step creates the 'work.cancer' dataset. It reads the variables 'cause', 'Ano', 'mcases', 'fcases', 'mdeaths' and 'fdeaths' from the data lines. 'cause' is read as a character variable from columns 1-20, 'Ano' is also read as a character variable (although it contains only digits), and the others are numeric. A new variable 'deaths' is calculated as 'mdeaths' + 'fdeaths'; because this happens before the sign change, 'deaths' holds the true positive total. 'mcases' and 'mdeaths' are then multiplied by -1, making them negative. The script does not say why; a common reason is a back-to-back bar chart in which male values extend to the left of zero and female values to the right.
DATA work.cancer;
  INFILE DATALINES;
  /* cause uses column input (columns 1-20), so each data line below
     pads the cause name with blanks before the year */
  INPUT cause $ 1-20 Ano $ mcases fcases mdeaths fdeaths;
  /* total deaths, computed before the male values are negated */
  deaths=mdeaths + fdeaths;
  /* male values become negative */
  mcases= -1 * mcases;
  mdeaths= -1 * mdeaths;
  DATALINES;
Câncer de Pulmão     2007 114760 98620 89510 70880
Câncer Colorretal    2007 55290 57050 26000 26180
Câncer de Mama       2007 2030 178480 450 40460
Câncer de Pâncreas   2007 18830 18340 16840 16530
Câncer de Próstata   2007 218890 0 27050 0
Leucemia             2007 24800 19440 12320 9470
Linfoma              2007 38670 32710 10370 9360
Câncer de Fígado     2007 13650 5510 11280 5500
Câncer de Ovário     2007 0 22430 0 15280
Câncer de Esôfago    2007 12130 3430 10900 3040
Câncer de Bexiga     2007 50040 17120 9630 4120
Câncer de Rim        2007 31590 19600 8080 4810
Câncer de Pulmão     1997 98300 79800 94400 66000
Câncer Colorretal    1997 45500 48600 22600 24000
Câncer de Mama       1997 1400 180200 290 43900
Câncer de Pâncreas   1997 13400 14200 13500 14600
Câncer de Próstata   1997 334500 0 41800 0
Leucemia             1997 15900 12400 11770 9540
Linfoma              1997 34200 26900 13220 12060
Câncer de Fígado     1997 9100 4500 7500 4900
Câncer de Ovário     1997 0 26800 0 14200
Câncer de Esôfago    1997 9400 3100 8700 2800
Câncer de Bexiga     1997 39500 15000 7800 3900
Câncer de Rim        1997 17100 11700 7000 4300
;
RUN;
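To check that the DATA step produced the intended values, a quick listing of the first few rows can help. The following is only a minimal sketch; it assumes the DATA step above has already run in the same session and that the data lines were read as intended (cause in columns 1-20).

```sas
/* Sketch: list the first three observations of work.cancer
   to inspect the derived and negated variables. */
proc print data=work.cancer(obs=3) noobs;
  var cause Ano mcases fcases mdeaths fdeaths deaths;
run;
```

For the first data line (Câncer de Pulmão, 2007), 'deaths' should be 89510 + 70880 = 160390, while 'mcases' and 'mdeaths' should appear as -114760 and -89510.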
2 Code Block
PROC SORT
Explanation :
This PROC SORT takes 'work.cancer' as input and writes the sorted result to a new dataset, 'work.cancer_sorted', leaving the original unchanged. A single sort uses two keys: 'Ano' (year) in descending order, then, within each year, 'deaths' (total number of deaths) in descending order. The DESCENDING keyword applies only to the variable that immediately follows it, which is why it appears twice. The result lists the most recent year first and, within each year, the causes with the most deaths first.
PROC SORT DATA=cancer OUT=cancer_sorted;
  BY descending Ano descending deaths;
RUN;
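Because the sorted dataset puts the cause with the most deaths first within each year, BY-group processing can pick out that row. This is a minimal sketch rather than part of the original script; the output dataset name 'work.top_cause' is a hypothetical name chosen for illustration.

```sas
/* Sketch: keep the first row of each year from the sorted data.
   work.top_cause is a hypothetical output name. */
data work.top_cause;
  set work.cancer_sorted;
  by descending Ano;   /* must match the order used in PROC SORT */
  if first.Ano;        /* first row per year = most deaths that year */
run;

proc print data=work.top_cause noobs;
  var Ano cause deaths;
run;
```

With the data shown above, this should keep Câncer de Pulmão for both years (160390 deaths in 2007 and 160400 in 1997).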
3 Code Block
PROC FORMAT
Explanation :
This PROC FORMAT defines a custom picture format called 'positive'. It covers two ranges with the same picture, '000,000': negative values ('low' up to but not including 0) and positive values (above 0 up to 'high'). In both cases the number is written with a thousands separator, for example '123,456'. Because a picture format prints the absolute value unless a prefix such as '-' is specified, negative numbers appear without a minus sign, so the negated 'mcases' and 'mdeaths' values display as ordinary positive counts. The value 0 is excluded from both ranges and is therefore not handled by this picture. The picture has six digit positions, so it suits values below one million.
PROC FORMAT;
  picture positive low-<0='000,000'
                   0<-high='000,000';
RUN;
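The script defines the format but does not apply it. One way to see its effect, sketched here under the assumption that all three steps above have run in the same session, is to attach it to the count variables in a listing:

```sas
/* Sketch: apply the 'positive' picture format when printing.
   The format changes only how values are displayed; the stored
   mcases and mdeaths values remain negative. */
proc print data=work.cancer_sorted(obs=5) noobs;
  var cause Ano mcases fcases mdeaths fdeaths;
  format mcases fcases mdeaths fdeaths positive.;
run;
```

In this listing, a stored value such as -114760 should be displayed as 114,760.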
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
