
Basic Usage of PROC DATAMETRICS

Awaiting validation
The DATAMETRICS procedure profiles data quickly. Given only an input table and an output table, it generates metrics for every variable in the input, such as missing values, data formats, and other indicators used to evaluate data quality. Because this basic form omits the 'IDENTITIES' statement, no identity analysis is performed, and default options are used for all calculations.
Data Analysis

Type : CREATION_INTERNE


Examples use generated data (datalines) to ensure autonomy and reproducibility.

1 Code Block
PROC DATAMETRICS Data
Explanation :
This example shows the simplest use of PROC DATAMETRICS. An input table 'my_data' is created with inline data, including two missing values and some repeated names. The procedure is then run with only an input table and an output table ('my_results'). By default, it generates quality metrics for all variables in 'my_data'. The PROC PRINT step displays the results for review.
/* Build a small input table; '.' marks a missing numeric value */
DATA my_data;
  INPUT ID $ Name $ Age Score;
  DATALINES;
001 John 30 95
002 Jane 24 88
003 Mike . 72
004 Jane 24 88
005 Chris 45 60
006 John 30 95
007 Sarah 29 .
;
RUN;

/* Profile every variable with default options */
PROC DATAMETRICS DATA=my_data out=my_results;
RUN;

PROC PRINT DATA=my_results;
  title 'Basic PROC DATAMETRICS Results';
RUN;
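As an independent sanity check on the profile, the missing-value and distinct-value counts for this small table can also be computed with standard Base SAS steps. The following is a minimal sketch that does not use PROC DATAMETRICS; the column aliases (n_rows, distinct_id, distinct_name) are illustrative names that are not part of the original example.

```sas
/* Recreate the small input table from the example above */
data my_data;
  input ID $ Name $ Age Score;
  datalines;
001 John 30 95
002 Jane 24 88
003 Mike . 72
004 Jane 24 88
005 Chris 45 60
006 John 30 95
007 Sarah 29 .
;
run;

/* Non-missing (N) and missing (NMISS) counts for the numeric variables */
proc means data=my_data n nmiss min max;
  var Age Score;
run;

/* Row count and distinct-value counts for the character variables */
proc sql;
  select count(*)             as n_rows,
         count(distinct ID)   as distinct_id,
         count(distinct Name) as distinct_name
  from my_data;
quit;
```

With the seven rows above, this should report one missing value each for Age and Score, seven distinct IDs, and five distinct names, which gives a baseline to compare against the contents of 'my_results'.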
2 Code Block
PROC DATAMETRICS Data
Explanation :
This example uses the 'VARIABLES' statement to restrict the metrics to four columns (EmployeeID, Name, Department, Salary); DateOfHire is read into the table but left out of the profile. 'OUTPUT METRIC=ALL' is used to request all available metrics. This allows a more targeted analysis of the data quality aspects relevant to the user.
DATA employees;
  /* :$12. widens Department; the default length of 8 would truncate 'Marketing' */
  INPUT EmployeeID Name $ Department :$12. Salary DateOfHire :yymmdd10.;
  FORMAT DateOfHire yymmdd10.;
  DATALINES;
101 Alice Sales 60000 2020-01-15
102 Bob Marketing 75000 2019-03-22
103 Alice Sales 60000 2020-01-15
104 Charlie IT 80000 2021-07-01
105 David Sales 62000 2020-01-15
;
RUN;

/* Profile only the listed columns */
PROC DATAMETRICS DATA=employees out=employee_metrics;
  variables EmployeeID Name Department Salary;
  OUTPUT metric=ALL;
RUN;

PROC PRINT DATA=employee_metrics;
  title 'Quality Metrics for Specific Variables';
RUN;
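Before choosing which columns to profile, it can help to see how many distinct levels each column has. The following is a minimal sketch using the NLEVELS option of PROC FREQ, a Base SAS feature that is independent of PROC DATAMETRICS. The ':$12.' informat width for Department is an assumption, chosen so that 'Marketing' is not truncated to eight characters.

```sas
/* Recreate the employees table from the example above */
data employees;
  input EmployeeID Name $ Department :$12. Salary DateOfHire :yymmdd10.;
  format DateOfHire yymmdd10.;
  datalines;
101 Alice Sales 60000 2020-01-15
102 Bob Marketing 75000 2019-03-22
103 Alice Sales 60000 2020-01-15
104 Charlie IT 80000 2021-07-01
105 David Sales 62000 2020-01-15
;
run;

/* NLEVELS prints the number of distinct levels per variable;
   NOPRINT suppresses the individual frequency tables */
proc freq data=employees nlevels;
  tables EmployeeID Name Department Salary / noprint;
run;
```

For this data the levels table should show five levels for EmployeeID, four for Name and Salary, and three for Department.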
3 Code Block
PROC DATAMETRICS Data
Explanation :
This advanced scenario combines identity detection with metrics for specific variables. The 'IDENTITIES' statement is used to identify columns (TransactionID, CustomerID, ProductID) that may contain unique values or significant duplicates. The 'VARIABLES' statement focuses the metrics on 'Quantity' and 'Price'. 'OUTPUT METRIC=ALL OUTALL' requests all metrics and adds the calculated metrics to the output table, thereby enriching the original dataset with quality information.
DATA transactions;
  /* The three ID columns hold values such as T001 and C001, so they must be read as character ($) */
  INPUT TransactionID $ CustomerID $ ProductID $ Quantity Price Date :yymmdd10.;
  FORMAT Date yymmdd10.;
  DATALINES;
T001 C001 P001 2 15.50 2023-01-05
T002 C002 P002 1 10.00 2023-01-05
T003 C001 P001 2 15.50 2023-01-05
T004 C003 P003 3 25.75 2023-01-06
T005 C001 P001 2 15.50 2023-01-05
;
RUN;

PROC DATAMETRICS DATA=transactions out=transaction_summary;
  identities TransactionID CustomerID ProductID;
  variables Quantity Price;
  OUTPUT metric=ALL outall;
RUN;

PROC PRINT DATA=transaction_summary;
  title 'Advanced Analysis of Transaction Metrics and Identities';
RUN;
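The idea of columns holding "unique values or significant duplicates" can be checked by hand with PROC SQL. The following is a minimal sketch that is independent of PROC DATAMETRICS; the aliases n_rows and distinct_txn are illustrative names, and the ID columns are read as character on the assumption that values such as T001 are meant to be text.

```sas
/* Recreate the transactions table from the example above */
data transactions;
  input TransactionID $ CustomerID $ ProductID $ Quantity Price Date :yymmdd10.;
  format Date yymmdd10.;
  datalines;
T001 C001 P001 2 15.50 2023-01-05
T002 C002 P002 1 10.00 2023-01-05
T003 C001 P001 2 15.50 2023-01-05
T004 C003 P003 3 25.75 2023-01-06
T005 C001 P001 2 15.50 2023-01-05
;
run;

proc sql;
  /* A column behaves like a unique key when its distinct count equals the row count */
  select count(*)                      as n_rows,
         count(distinct TransactionID) as distinct_txn
  from transactions;

  /* Customers that appear on more than one row */
  select CustomerID, count(*) as n_rows
  from transactions
  group by CustomerID
  having count(*) > 1;
quit;
```

Here TransactionID is unique across all five rows, while the second query should list only C001, which appears three times.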
4 Code Block
PROC DATAMETRICS (CAS) Data
Explanation :
This example adapts the use of PROC DATAMETRICS for a SAS Viya environment with the Cloud Analytic Services (CAS) engine. It begins by establishing a CAS session and creating the 'sales_data' table in CAS memory through the 'mycas' library. PROC DATAMETRICS is then run with the CAS table as input, and the output is directed to a new CAS table. Using CAS allows large volumes of data to be processed in a distributed, faster way. The results are then displayed, and the CAS session is terminated at the end.
/* Connect to a CAS session and load the data */
cas;
LIBNAME mycas cas;

DATA mycas.sales_data;
  INPUT SaleID $ Region $ Amount Date :yymmdd10.;
  FORMAT Date yymmdd10.;
  DATALINES;
S001 East 1200.50 2024-01-10
S002 West 850.25 2024-01-11
S003 North 1500.00 2024-01-10
S004 South 980.75 2024-01-12
S005 East . 2024-01-13
;
RUN;

/* Run PROC DATAMETRICS on CAS */
PROC DATAMETRICS DATA=mycas.sales_data out=mycas.sales_metrics;
  variables SaleID Region Amount Date;
  OUTPUT metric=ALL;
RUN;

/* Display the results from CAS */
PROC PRINT DATA=mycas.sales_metrics;
  title 'Sales Data Quality Metrics (CAS)';
RUN;

/* Terminate the CAS session (CASAUTO is the default name when 'cas;' is issued without one) */
cas casauto terminate;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.