ROC (Receiver Operating Characteristic) Information: Analysis of the model's ability to separate events from non-events at different probability thresholds.
Lift Information: Measures the model's effectiveness by comparing the proportion of events captured in the top-scored observations with the proportion a random selection of the same size would capture.
Fit Statistics: Metrics that quantify the model's overall performance, such as average squared error or logistic loss.
The examples below show how to create the data, use common options (such as the number of cutoffs and bins), handle advanced scenarios with custom formats, and integrate with the distributed CAS environment to process large volumes of data.
Data Analysis
Type : INTERNAL_CREATION
The examples use synthetic data generated by a Data Step to create prediction variables (p_good, p_bad) and a binary target variable (good_bad). A large dataset is also generated directly in CAS for the advanced example to demonstrate the processing capability for large volumes of data.
1 Code Block
PROC ASSESS Data
Explanation : This example illustrates the simplest use of the ASSESS procedure for model evaluation. After establishing a CAS connection and creating synthetic score data (with `p_good` as the probability of a positive event and `good_bad` as the target), the data is loaded into a CAS table named `score_data`. The ASSESS procedure is then called by specifying the prediction variable (`p_good`) and the binary target variable (`good_bad`). By default, the procedure calculates basic ROC and lift metrics.
/* CAS configuration */
cas;
caslib _all_ assign;
/* Data preparation: create a synthetic score data set */
data work.score_data;
   length good_bad $4;
   input _PartInd_ good_bad $ p_good p_bad;
   datalines;
0 good 0.6675 0.3325
0 good 0.5189 0.4811
0 good 0.6852 0.3148
0 bad 0.0615 0.9385
0 bad 0.3053 0.6947
0 bad 0.6684 0.3316
0 good 0.6422 0.3578
0 good 0.6752 0.3248
0 good 0.5396 0.4604
0 good 0.4983 0.5017
0 bad 0.1916 0.8084
0 good 0.5722 0.4278
0 good 0.7099 0.2901
0 good 0.4642 0.5358
0 good 0.4863 0.5137
1 bad 0.4942 0.5058
1 bad 0.4863 0.5137
1 bad 0.4942 0.5058
1 good 0.6118 0.3882
1 good 0.5375 0.4625
1 good 0.8132 0.1868
1 good 0.6914 0.3086
1 good 0.5700 0.4300
1 good 0.8189 0.1811
1 good 0.2614 0.7386
1 good 0.1910 0.8090
1 good 0.5129 0.4871
1 good 0.8417 0.1583
1 good 0.5500 0.4500
;
run;
/* Load the data into the CAS session (WORK is a SAS library, not a caslib) */
proc casutil;
   load data=work.score_data outcaslib="casuser" casout="score_data" replace;
quit;
/* Example 1: basic use of PROC ASSESS */
proc assess data=casuser.score_data;
   var p_good;
   target good_bad;
run;
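To make the ROC output more concrete, the following sketch computes a single ROC point by hand from `work.score_data`. It is only an illustration under stated assumptions: the 0.5 cutoff, the `>=` comparison, and the output table name `work.roc_point` are arbitrary choices made here, and PROC ASSESS may treat ties at the cutoff differently.

```sas
/* Hand-computed ROC point at one cutoff (illustrative sketch).       */
/* Assumes work.score_data from the DATA step above. The cutoff 0.5   */
/* and the table name work.roc_point are hypothetical choices.        */
proc sql;
   create table work.roc_point as
   select 0.5 as cutoff,
          /* a comparison evaluates to 1 (true) or 0 (false) in PROC SQL */
          sum(good_bad = 'good' and p_good >= 0.5) as tp,
          sum(good_bad = 'bad'  and p_good >= 0.5) as fp,
          sum(good_bad = 'good' and p_good <  0.5) as fn,
          sum(good_bad = 'bad'  and p_good <  0.5) as tn,
          calculated tp / (calculated tp + calculated fn) as sensitivity,
          calculated fp / (calculated fp + calculated tn) as fpr
   from work.score_data;
quit;

proc print data=work.roc_point noobs;
run;
```

Repeating this calculation over a grid of cutoffs and plotting `sensitivity` against `fpr` traces out a ROC curve, which is what the procedure reports for each of its cutoffs.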
2 Code Block
PROC ASSESS Data
Explanation : This example extends basic usage by including common options for more detailed analysis. `NCUTS=5` defines 5 cutoff thresholds for ROC analysis, and `NBINS=5` specifies 5 bins for lift analysis. `EVENT="good" LEVEL=NOMINAL` indicates that 'good' is the event class of interest for the nominal target variable. The `FITSTAT` statement is added to calculate fit statistics using `p_bad` as the probability of the reference event ('bad').
/* CAS configuration (if not already configured) */
cas;
caslib _all_ assign;
/* Data preparation: create a synthetic score data set */
data work.score_data;
   length good_bad $4;
   input _PartInd_ good_bad $ p_good p_bad;
   datalines;
0 good 0.6675 0.3325
0 good 0.5189 0.4811
0 good 0.6852 0.3148
0 bad 0.0615 0.9385
0 bad 0.3053 0.6947
0 bad 0.6684 0.3316
0 good 0.6422 0.3578
0 good 0.6752 0.3248
0 good 0.5396 0.4604
0 good 0.4983 0.5017
0 bad 0.1916 0.8084
0 good 0.5722 0.4278
0 good 0.7099 0.2901
0 good 0.4642 0.5358
0 good 0.4863 0.5137
1 bad 0.4942 0.5058
1 bad 0.4863 0.5137
1 bad 0.4942 0.5058
1 good 0.6118 0.3882
1 good 0.5375 0.4625
1 good 0.8132 0.1868
1 good 0.6914 0.3086
1 good 0.5700 0.4300
1 good 0.8189 0.1811
1 good 0.2614 0.7386
1 good 0.1910 0.8090
1 good 0.5129 0.4871
1 good 0.8417 0.1583
1 good 0.5500 0.4500
;
run;
/* Load the data into the CAS session (WORK is a SAS library, not a caslib) */
proc casutil;
   load data=work.score_data outcaslib="casuser" casout="score_data" replace;
quit;
/* Example 2: PROC ASSESS with common options */
proc assess data=casuser.score_data ncuts=5 nbins=5;
   var p_good;
   target good_bad / event="good" level=nominal;
   fitstat pvar=p_bad / pevent="bad";
run;
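As a rough illustration of what fit statistics measure, the sketch below computes an average squared error and a logistic loss by hand on `work.score_data`. The formulas are the common textbook definitions for a binary target and are an assumption here; the values PROC ASSESS reports may be defined or scaled differently, so treat this as a sanity check rather than a reproduction of its output.

```sas
/* Hand-computed fit statistics (illustrative sketch, textbook formulas). */
/* Assumes work.score_data from the DATA step above.                      */
proc sql;
   select count(*) as n_obs,
          /* squared gap between the 0/1 event indicator and p_good */
          mean(((good_bad = 'good') - p_good)**2) as avg_squared_error,
          /* negative log of the probability assigned to the observed class */
          mean(-log(case when good_bad = 'good' then p_good
                         else p_bad end)) as log_loss
   from work.score_data;
quit;
```

Lower values of both statistics indicate predicted probabilities that sit closer to the observed outcomes.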
3 Code Block
PROC ASSESS Data
Explanation : This advanced example shows how to customize and deepen the analysis. A `PROC FORMAT` step creates a custom format for the `_PartInd_` variable, which makes outputs more readable when analyzing by groups. The formatted data is then written to a new CAS table. The `NBINS=10` option increases the granularity of the lift analysis. The `ROC` statement uses the `CUTOFF` option to specify custom cutoff thresholds (from 0.1 to 0.9 in steps of 0.1) and the `PLOTS` option to generate graphical plots (such as the ROC curve). The `BY _PartInd_` statement runs a separate analysis for each data partition.
/* CAS configuration (if not already configured) */
cas;
caslib _all_ assign;
/* Data preparation: create a synthetic score data set */
data work.score_data;
   length good_bad $4;
   input _PartInd_ good_bad $ p_good p_bad;
   datalines;
0 good 0.6675 0.3325
0 good 0.5189 0.4811
0 good 0.6852 0.3148
0 bad 0.0615 0.9385
0 bad 0.3053 0.6947
0 bad 0.6684 0.3316
0 good 0.6422 0.3578
0 good 0.6752 0.3248
0 good 0.5396 0.4604
0 good 0.4983 0.5017
0 bad 0.1916 0.8084
0 good 0.5722 0.4278
0 good 0.7099 0.2901
0 good 0.4642 0.5358
0 good 0.4863 0.5137
1 bad 0.4942 0.5058
1 bad 0.4863 0.5137
1 bad 0.4942 0.5058
1 good 0.6118 0.3882
1 good 0.5375 0.4625
1 good 0.8132 0.1868
1 good 0.6914 0.3086
1 good 0.5700 0.4300
1 good 0.8189 0.1811
1 good 0.2614 0.7386
1 good 0.1910 0.8090
1 good 0.5129 0.4871
1 good 0.8417 0.1583
1 good 0.5500 0.4500
;
run;
/* Load the data into the CAS session (WORK is a SAS library, not a caslib) */
proc casutil;
   load data=work.score_data outcaslib="casuser" casout="score_data" replace;
quit;
/* Create a custom format for _PartInd_. The variable is numeric, so the  */
/* format must be numeric too (no $ prefix). CASFMTLIB= also adds the     */
/* format to the CAS session so that CAS can apply it.                    */
proc format casfmtlib="casformats";
   value partfmt 0 = 'Partition A'
                 1 = 'Partition B';
run;
data casuser.score_data_fmt;
   set casuser.score_data;
   format _PartInd_ partfmt.;
run;
/* Example 3: advanced PROC ASSESS case */
proc assess data=casuser.score_data_fmt nbins=10;
   var p_good;
   target good_bad / event="good" level=nominal;
   fitstat pvar=p_bad / pevent="bad";
   roc / cutoff=0.1 to 0.9 by 0.1 plots; /* Custom cutoff thresholds and ROC plots */
   by _PartInd_;
run;
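When the results need to be reused (for example to draw a custom plot), the displayed tables can be captured with ODS. The sketch below is an assumption-laden illustration: the ODS table names `ROCInfo`, `LIFTInfo`, and `FitStat` are the names this procedure is commonly reported to produce, but they should be confirmed with `ods trace on` in your release, and the output data set names in `work` are hypothetical.

```sas
/* Capture PROC ASSESS results as SAS data sets (illustrative sketch).   */
/* ODS table names are assumed; run with ODS TRACE ON to confirm them.   */
ods trace on;
ods output ROCInfo=work.roc_info
           LIFTInfo=work.lift_info
           FitStat=work.fit_stat;

proc assess data=casuser.score_data ncuts=5 nbins=5;
   var p_good;
   target good_bad / event="good" level=nominal;
   fitstat pvar=p_bad / pevent="bad";
run;

ods trace off;

/* Inspect the captured ROC table */
proc print data=work.roc_info(obs=10);
run;
```

The log lists the actual table names after `ods trace on`, which makes it easy to correct the `ods output` statement if a name differs.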
4 Code Block
PROC ASSESS Data
Explanation : This example emphasizes integration with SAS Viya for processing large volumes of data. A dataset of 20,000 observations is generated directly in CAS, highlighting the platform's ability to handle massive in-memory data. The `NBINS=20` option is used for a more detailed lift analysis. The `ROC` statement includes `ADJUSTFOR=good_bad(event="good")` to adjust ROC metrics based on the actual distribution of the target variable, which is crucial for imbalanced datasets. The temporary CAS table is then dropped to clean up the environment.
/* CAS configuration (if not already configured) */
cas;
caslib _all_ assign;
/* Example 4: Viya integration / large data volume */
/* Create a large synthetic data set as a CAS table */
data casuser.large_score_data;
   length good_bad $4;
   do _PartInd_ = 0 to 1;
      do i = 1 to 10000; /* 2 partitions x 10,000 = 20,000 observations */
         good_bad = ifc(ranuni(0) > 0.7, 'bad', 'good');
         p_good = ranuni(0);  /* Probability of 'good' */
         p_bad = 1 - p_good;  /* Probability of 'bad' */
         output;
      end;
   end;
   drop i;
run;
proc assess data=casuser.large_score_data nbins=20;
   var p_good;
   target good_bad / event="good" level=nominal;
   fitstat pvar=p_bad / pevent="bad";
   roc / adjustfor=good_bad(event="good") plots; /* Adjust for the target distribution */
   by _PartInd_;
run;
/* Clean up the temporary CAS table */
proc cas;
   table.dropTable / caslib="CASUSER" name="large_score_data" quiet=true;
quit;
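Note that in the data above `p_good` is drawn independently of `good_bad`, so the scores carry no information about the target and the resulting ROC curve should sit close to the diagonal. A minimal sketch of one way to generate scores that do separate the classes follows; the beta distributions, the seed, and the table name `work.informative_scores` are hypothetical choices for illustration only.

```sas
/* Synthetic scores that depend on the target (illustrative sketch).  */
/* Distribution parameters, seed, and table name are assumptions.     */
data work.informative_scores;
   length good_bad $4;
   call streaminit(12345);                 /* reproducible random stream */
   do _PartInd_ = 0 to 1;
      do i = 1 to 10000;
         if rand('uniform') > 0.7 then do;
            good_bad = 'bad';
            p_good = rand('beta', 2, 4);   /* 'bad' rows skew toward low scores */
         end;
         else do;
            good_bad = 'good';
            p_good = rand('beta', 4, 2);   /* 'good' rows skew toward high scores */
         end;
         p_bad = 1 - p_good;
         output;
      end;
   end;
   drop i;
run;
```

Loading this table into CAS and assessing it in the same way as the large table above should give a ROC curve that bows away from the diagonal, which makes the effect of options such as `NBINS=` easier to see.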
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.