The examples use data generated inline with DATA steps and DATALINES, so they need no external files and are reproducible.
1 Code Block
DATA / PROC DATAMETRICS Data
Explanation : This example creates a 'my_data' dataset with fictional records (name, address, city, state), including one repeated row (John Doe). PROC DATAMETRICS is then run with the minimum required parameters to calculate data quality metrics for the 'name' and 'address' variables, and the results are saved in the 'basic_metrics' table.
data work.my_data;
  length name $30 address $50 city $20 state $2;
  /* DSD with a blank delimiter keeps each quoted value together and strips the quotes */
  infile datalines dsd dlm=' ' truncover;
  input name $ address $ city $ state $;
  datalines;
"John Doe" "123 Main St" "Anytown" "NY"
"Jane Smith" "456 Oak Ave" "Anycity" "CA"
"John Doe" "123 Main St" "Anytown" "NY"
"Peter Jones" "789 Pine Ln" "Otherville" "TX"
"Alice Brown" "101 Maple Dr" "Anytown" "NY"
"Bob White" "202 Elm St" "Otherville" "TX"
"Charlie Green" "303 Cedar Rd" "Anycity" "CA"
"David Black" "404 Birch Ct" "Anytown" "NY"
;
run;

/* Compute data quality metrics for name and address */
proc datametrics data=work.my_data out=work.basic_metrics;
  variables name address;
run;

proc print data=work.basic_metrics;
  title "Basic Metrics for Name and Address";
run;
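The column layout of 'basic_metrics' is not described here, so it can help to inspect the table before relying on specific columns. A minimal sketch, assuming Example 1 has already run and work.basic_metrics exists; PROC CONTENTS is a general-purpose Base SAS procedure and is not part of PROC DATAMETRICS:

```sas
/* List the variables of the metrics table created in Example 1, */
/* in the order in which they are stored in the table            */
proc contents data=work.basic_metrics varnum;
run;
```

The VARNUM option lists the variables in their stored order rather than alphabetically, which makes the listing easier to compare with the PROC PRINT output.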
2 Code Block
PROC DATAMETRICS
Explanation : Building on the data from the previous example, this example uses common options: 'frequencies=10' for the 10 most frequent values, 'minmax=5' for 5 minimum and maximum values, and 'median' to calculate the median. The 'identities' statement points to a Quality Knowledge Base (QKB) location, the 'ENUSA' locale, and the 'Field Content' definition to enhance identity analysis. The QKB path shown must match the location of the QKB in your own installation. The 'city' variable is analyzed in addition to 'name' and 'address'.
/* Make sure work.my_data has already been created by Example 1 */
proc datametrics data=work.my_data out=work.common_metrics
    frequencies=10 minmax=5 median;
  identities qkb='/sas/dqc/QKBLoc' locale='ENUSA' def='Field Content';
  variables name address city;
run;

proc print data=work.common_metrics;
  title "Metrics with Frequencies, Min/Max, Median, and QKB";
run;
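As an independent cross-check of what "most frequent values" refers to, the value counts of the same variables can be listed with PROC FREQ. This is only a sketch using a standard Base SAS procedure on work.my_data from Example 1; it does not reproduce the layout of the PROC DATAMETRICS output:

```sas
/* Count the distinct values of each variable, most frequent first */
proc freq data=work.my_data order=freq;
  tables name address city / nocum;   /* NOCUM hides cumulative columns */
run;
```

With the sample data, 'John Doe' and '123 Main St' each appear twice because of the repeated row, so they sort to the top of their tables.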
3 Code Block
PROC FORMAT / DATA / PROC DATAMETRICS
Explanation : This example defines a custom format for the 'state' variable and assigns it in a new 'formatted_data' dataset. PROC DATAMETRICS is then run on this formatted table with the 'frequencies' and 'minmax' options, the 'format' option, and 'threads=4' for parallel processing. The 'multiidentity' option in the 'identities' statement allows for the analysis of multiple data quality identities for the specified variables.
/* Make sure work.my_data has already been created by Example 1 */
proc format;
  value $statefmt
    'NY'='New York'
    'CA'='California'
    'TX'='Texas'
    other='Other';
run;

/* Attach the custom format to the state variable */
data work.formatted_data;
  set work.my_data;
  format state $statefmt.;
run;

proc datametrics data=work.formatted_data out=work.advanced_metrics
    frequencies=20 minmax=10 threads=4 format;
  identities qkb='/sas/dqc/QKBLoc' locale='ENUSA'
             def='Field Content' multiidentity;
  variables name address city state;
run;

proc print data=work.advanced_metrics;
  title "Advanced Metrics with Formats, Threads, and Multi-Identities";
run;
4 Code Block
CASLIB / PROC CASUTIL / PROC DATAMETRICS
Explanation : This example demonstrates integration with the SAS Viya Cloud Analytic Services (CAS) environment. The 'my_data' dataset is first loaded into a CAS library ('casuser.my_cas_data') using PROC CASUTIL. Subsequently, PROC DATAMETRICS is executed directly on the in-memory CAS table. Options like 'frequencies', 'minmax', and 'threads' are applied to optimize metric analysis in a distributed environment. The results are also stored in a CAS table.
/* Make sure work.my_data has already been created by Example 1 */
/* Assumes a CAS session is already active in this SAS session */
caslib _all_ assign;

/* Load the WORK table into the casuser caslib as an in-memory table */
proc casutil;
  load data=work.my_data outcaslib='casuser' casout='my_cas_data' replace;
run;

proc datametrics data=casuser.my_cas_data out=casuser.cas_metrics
    frequencies=5 minmax=3 threads=2;
  identities qkb='/sas/dqc/QKBLoc' locale='ENUSA' def='Field Content';
  variables name address city;
run;

proc print data=casuser.cas_metrics;
  title "Metrics from DATAMETRICS on CAS";
run;
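Because both the input and the results of this example live in CAS memory, it may be worth dropping them once they are no longer needed. The following is a minimal sketch of an optional cleanup step, assuming the table and caslib names used in this example and an active CAS session:

```sas
/* Optional cleanup: remove the in-memory tables created in Example 4. */
/* QUIET suppresses the error if a table does not exist.                */
proc casutil;
  droptable casdata='my_cas_data' incaslib='casuser' quiet;
  droptable casdata='cas_metrics' incaslib='casuser' quiet;
quit;
```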
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.