
Examples: Using a SAS Engine to Process SAS Data

This page walks through the key concepts of each SAS engine and where it applies. The V9 engine is the default Base SAS engine; the SPD engine is optimized for large tables and can store data in distributed environments such as Hadoop; the CVP engine helps avoid truncation of character variables during transcoding; and the CAS engine loads data into memory on a Cloud Analytic Services server for processing.


Examples use generated data (datalines) or SASHELP.

1 Code Block
DATA STEP Data
Explanation :
The LIBNAME statement assigns the myfiles libref and the V9 engine to a library location. Replace 'library-path' with your library's location. The location must exist and be accessible by the SAS compute server. The DATA step creates the myclass dataset in the myfiles library by copying the class dataset from the sashelp library.
LIBNAME myfiles v9 'library-path';
DATA myfiles.myclass;
  SET sashelp.class;
RUN;
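To check that the library was assigned with the expected engine, a short follow-up along these lines may help. It is a minimal sketch that assumes the myfiles libref from the example above is already assigned and that myfiles.myclass was created; nothing else is introduced.

```sas
/* Assumption: the myfiles libref from the example above is assigned. */

/* Write the library attributes (engine, physical path) to the SAS log. */
LIBNAME myfiles LIST;

/* Show the table's metadata, including its engine and variable lengths. */
PROC CONTENTS DATA=myfiles.myclass;
RUN;
```

The LIST argument only reports on an existing libref; it does not reassign the library.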
2 Code Block
LIBNAME
Explanation :
The LIBNAME statement assigns the mylib libref and the SPD engine to a primary path name. The first (and usually the only) metadata file for a data set is always stored in the primary path for the library. You can optionally assign one or more path names in the DATAPATH= option to store data partitions; otherwise, the data partition files are stored in the primary path. Likewise, you can optionally assign one or more path names in the INDEXPATH= option to store index files; otherwise, the index files are stored in the primary path.
LIBNAME mylib spde 'library-path'
  datapath=('path-for-data-partitions')
  indexpath=('path-for-indexes');
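To see where the component files end up, you could write an indexed table into the library and then inspect the paths. The following is a sketch only: it assumes the mylib libref above is assigned with real paths, and the table name cars and the index on Make are hypothetical choices for illustration.

```sas
/* Assumption: the mylib libref above is assigned to existing paths. */
/* The table name (cars) and the indexed column (Make) are illustrative. */
DATA mylib.cars (index=(make));
  SET sashelp.cars;
RUN;

/* Report the table's attributes, including the engine and its index. */
PROC CONTENTS DATA=mylib.cars;
RUN;
```

After the step runs, listing the three directories should show the metadata file in the primary path, the data partition files under the DATAPATH= location, and the index files under the INDEXPATH= location.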
3 Code Block
LIBNAME
Explanation :
The SET= system option sets environment variables for Hadoop. If these environment variables are already set (for example, during configuration), do not submit these lines of code. If these environment variables are not correctly set, the LIBNAME statement produces errors in the SAS log. The LIBNAME statement assigns the mydata libref to the SPD engine and to a directory in the Hadoop cluster. The HDFS=YES argument specifies the connection to the Hadoop cluster defined in the Hadoop cluster configuration files. The ACCELWHERE=YES option requests that data subsetting be performed by a MapReduce program in the Hadoop cluster.
options SET=SAS_HADOOP_CONFIG_PATH='/myconfigpath';
options SET=SAS_HADOOP_JAR_PATH='/myjarpath';

LIBNAME mydata spde '/data/abcdef' hdfs=yes accelwhere=yes;
4 Code Block
PROC COPY
Explanation :
The first LIBNAME statement assigns the srclib libref to the CVP engine and to the location of the data that you want to copy. The CVPENGINE= option specifies the V9 engine as the underlying engine for processing the data. The CVPMULT= option specifies a multiplication factor of 2.5 to expand all character variables. If this option is not specified, the CVP engine automatically chooses a multiplying value. The second LIBNAME statement assigns the target libref to the library that holds the copied data. The COPY procedure with the SELECT statement copies the myclass dataset to the target library. During copying, the CVP engine expands the lengths of the character variables by a factor of 2.5: 'Name' grows from 8 to 8 × 2.5 = 20, and 'Sex' grows from 1 to 1 × 2.5 = 2.5, which rounds up to 3. The CONTENTS procedure lets you confirm the new lengths in the target library.
LIBNAME srclib cvp 'library-path-1' cvpengine=v9 cvpmult=2.5;
LIBNAME target v9 'library-path-2';
PROC COPY in=srclib out=target;
  select myclass;
RUN;

PROC CONTENTS DATA=target.myclass;
RUN;
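The length arithmetic described above can be reproduced in a small DATA step. This is only an illustration of the calculation as the text states it: the use of CEIL for the round-up and the variable names are assumptions made here, not a description of how the CVP engine is implemented internally.

```sas
/* Illustration of the arithmetic only; CEIL models the "rounds up" */
/* behavior described in the text and is an assumption, not the     */
/* engine's documented algorithm.                                    */
DATA _null_;
  cvpmult = 2.5;                     /* multiplier from CVPMULT=      */
  DO len = 8, 1;                     /* original lengths of Name, Sex */
    expanded = CEIL(len * cvpmult);  /* 8 -> 20, 1 -> 3               */
    PUT len= expanded=;              /* write both values to the log  */
  END;
RUN;
```

The log shows `len=8 expanded=20` and `len=1 expanded=3`, matching the lengths quoted for 'Name' and 'Sex'.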
5 Code Block
DATA STEP / PROC CONTENTS Data
Explanation :
The CAS statement starts a CAS session named casauto. Supply your own connection information in the HOST= and PORT= options. The LIBNAME statement assigns the mycas libref to the CAS engine. Because the SESSREF= option is not specified, the engine uses the casauto session. The DATA step copies the SAS dataset sashelp.cars to the CAS server. The PROMOTE=YES dataset option promotes the table to global scope, which makes it visible to other CAS sessions and not only to the session that loaded it. The CONTENTS procedure confirms that the mycas.cars table is available on the CAS server. Once the data is loaded into memory, subsequent steps can process the in-memory data. Loading and processing are done in separate steps.
cas casauto host="cloud.example.com" port=5570;

LIBNAME mycas cas;
DATA mycas.cars (promote=yes);
  SET sashelp.cars;
RUN;
PROC CONTENTS DATA=mycas.cars;
RUN;
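As an example of a separate processing step on the loaded table, a summary such as the following could come next. It is a sketch that assumes the casauto session and the mycas libref above are still active; the choice of PROC MEANS and of the MSRP and Origin columns is illustrative.

```sas
/* Assumption: the casauto session and mycas libref above are active. */
/* Summarize the in-memory table; the columns come from sashelp.cars. */
PROC MEANS DATA=mycas.cars mean min max;
  CLASS origin;   /* one group per region of origin       */
  VAR msrp;       /* suggested retail price, in dollars   */
RUN;
```

Whether the summarization runs inside CAS or on the compute server can depend on the SAS release and the statistics requested, so check the log notes if the distinction matters.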
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved