The DATAMETRICS procedure provides fast, efficient data profiling. You specify an input table and an output table, and it generates metrics for every variable in the input table. These include potential identities, missing values, data formats, and other indicators used to assess data quality. When the 'IDENTITIES' statement is omitted, no identity analysis is performed and default options are used for all calculations.
Data Analysis
Type: internal creation
Examples use generated data (datalines) to ensure autonomy and reproducibility.
1 Code Block
PROC DATAMETRICS Data
Explanation: This example shows the simplest use of PROC DATAMETRICS. An input table, 'my_data', is created from inline data. The procedure is then run with only an input table and an output table ('my_results'), so by default it generates quality metrics for every variable in 'my_data'. A PROC PRINT step then displays the results for review.
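/* Sample data: Age and Score each contain one missing value, and rows
   001/006 and 002/004 repeat the same Name, Age, and Score */
data my_data;
input ID $ Name $ Age Score;
datalines;
001 John 30 95
002 Jane 24 88
003 Mike . 72
004 Jane 24 88
005 Chris 45 60
006 John 30 95
007 Sarah 29 .
;
run;
/* Profile every variable in my_data with default options */
proc datametrics data=my_data out=my_results;
run;
/* Display the generated metrics */
proc print data=my_results;
title 'Basic PROC DATAMETRICS Results';
run;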
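The missing-value and distinct-value figures for 'my_data' can be cross-checked with standard Base SAS procedures. The sketch below is a sanity check that assumes DATAMETRICS reports comparable counts; it does not reproduce the layout of 'my_results'.

```sas
/* Cross-check 1: non-missing (N) and missing (NMISS) counts for the numeric columns */
proc means data=my_data n nmiss;
var Age Score;
run;

/* Cross-check 2: number of distinct values (levels) per column.
   NOPRINT suppresses the one-way frequency tables; the NLEVELS summary is still shown */
proc freq data=my_data nlevels;
tables _all_ / noprint;
run;
```

With this sample, Age and Score each show one missing value. ID has seven levels for seven rows while Name has only five, which is the pattern that makes ID, not Name, a plausible identifier.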
2 Code Block
PROC DATAMETRICS Data
Explanation: This example uses the 'VARIABLES' statement to restrict the metrics to specific columns (EmployeeID, Name, Department, and Salary), and the 'OUTPUT METRIC=ALL' option to request all available metrics. This focuses the analysis on the data quality aspects that matter to the user.
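data employees;
/* Colon modifiers with explicit lengths keep values such as "Marketing"
   (9 characters) from being truncated to the default length of 8 */
input EmployeeID Name :$10. Department :$12. Salary DateOfHire :yymmdd10.;
format DateOfHire yymmdd10.;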
datalines;
101 Alice Sales 60000 2020-01-15
102 Bob Marketing 75000 2019-03-22
103 Alice Sales 60000 2020-01-15
104 Charlie IT 80000 2021-07-01
105 David Sales 62000 2020-01-15
;
run;
proc datametrics data=employees out=employee_metrics;
variables EmployeeID Name Department Salary;
output metric=ALL;
run;
proc print data=employee_metrics;
title 'Quality Metrics for Specific Variables';
run;
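Rows 101 and 103 in 'employees' differ only by EmployeeID, which is the kind of duplication a quality profile is meant to surface. As a cross-check outside DATAMETRICS, PROC SORT with NODUPKEY and DUPOUT= separates such rows. The output table names employees_dedup and employees_dups are illustrative.

```sas
/* Keep the first row of each Name/Department/Salary/DateOfHire combination;
   any later repeats are written to the DUPOUT= table */
proc sort data=employees out=employees_dedup nodupkey dupout=employees_dups;
by Name Department Salary DateOfHire;
run;

/* List the rows that repeat another row on every column except EmployeeID */
proc print data=employees_dups;
title 'Rows Duplicated on Every Column Except EmployeeID';
run;
```

With this sample, employees_dups contains a single row, because only the two "Alice Sales 60000 2020-01-15" records share all four BY values.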
3 Code Block
PROC DATAMETRICS Data
Explanation: This advanced scenario combines identity detection with metrics for specific variables. The 'IDENTITIES' statement names the columns (TransactionID, CustomerID, ProductID) to check for unique values or significant duplicates, and the 'VARIABLES' statement restricts the metrics to 'Quantity' and 'Price'. 'OUTPUT METRIC=ALL OUTALL' requests all metrics and adds them to the output table, enriching the original data with quality information.
title 'Advanced Analysis of Transaction Metrics and Identities';
RUN;
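The uniqueness idea behind candidate identifiers can be illustrated on the 'employees' table created in the previous example: a column qualifies when it has no missing values and as many distinct values as there are rows. This PROC SQL query is a sketch of that rule, not the logic DATAMETRICS applies internally.

```sas
/* Candidate-identifier check: compare distinct and missing counts with the row count */
title 'Candidate Identifier Check';
proc sql;
select count(*)                   as n_rows,
       count(distinct EmployeeID) as n_distinct_id,
       nmiss(EmployeeID)          as n_missing_id,
       count(distinct Name)       as n_distinct_name
from employees;
quit;
```

EmployeeID passes the rule (5 distinct values for 5 rows, none missing); Name does not (4 distinct values, because Alice appears twice).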
4 Code Block
PROC DATAMETRICS (CAS) Data
Explanation: This example adapts PROC DATAMETRICS to a SAS Viya environment that uses the Cloud Analytic Services (CAS) engine. It starts a CAS session and loads the 'sales_data' table into CAS distributed memory through the 'mycas' library. PROC DATAMETRICS then reads the CAS table and writes its output to a new CAS table. Running in CAS distributes the processing, which speeds up profiling of large data volumes. Finally, the results are printed and the CAS session is terminated.
/* Start a CAS session (named CASAUTO by default) and load the data */
cas;
libname mycas cas;
data mycas.sales_data;
input SaleID $ Region $ Amount Date:yymmdd10.;
format Date yymmdd10.;
datalines;
S001 East 1200.50 2024-01-10
S002 West 850.25 2024-01-11
S003 North 1500.00 2024-01-10
S004 South 980.75 2024-01-12
S005 East . 2024-01-13
;
run;
/* Run PROC DATAMETRICS on the CAS table */
proc datametrics data=mycas.sales_data out=mycas.sales_metrics;
variables SaleID Region Amount Date;
output metric=ALL;
run;
/* Display the results stored in CAS */
proc print data=mycas.sales_metrics;
title 'Sales Data Quality Metrics (CAS)';
run;
/* End the CAS session */
cas casauto terminate;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.