Go Cloud-Native: How to Convert and Upload SAS Data to Parquet on Google Cloud

Difficulty Level
Beginner
Published on :
Simon

Expert Advice

Simon
SAS expert and founder.

When using the storage_gcs_key_file option, make sure the JSON key file is physically located on the server where the SAS Compute session runs (or is reachable through a shared network path), not just on your local client machine. In addition, the Service Account associated with that key must have the "Storage Object Admin" role (or at least write permissions) on the target bucket; otherwise new Parquet files cannot be created.
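The first half of this advice can be checked from inside the session before the library is assigned. The following is a minimal sketch, assuming a SAS release that allows %IF/%THEN/%ELSE in open code (SAS 9.4M5 or later, including SAS Viya); the macro variable name gcs_key is hypothetical, and the path is the placeholder used in the example on this page. It only confirms that the compute server can see the file; it does not validate the key or the Service Account's permissions.

```sas
/* gcs_key is a hypothetical macro variable; point it at your own key file */
%let gcs_key = /user/myfiles/my-project-5123b3a258a1.json;

/* FILEEXIST is evaluated on the server running the SAS session, */
/* so a file that exists only on your client machine is reported as missing */
%if %sysfunc(fileexist(&gcs_key)) %then %do;
    %put NOTE: Key file found on the compute server: &gcs_key;
%end;
%else %do;
    %put ERROR: Key file not found on the compute server: &gcs_key;
%end;
```

If the ERROR line appears in the log, copy the key file to the server (or to a shared path the server can read) before running the LIBNAME statement.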

Treat Google Cloud Storage (GCS) as just another local folder.
The LIBNAME statement is used to establish a connection to a Google Cloud Storage bucket by specifying the 'parquet' engine, the 'GCS' storage platform, the bucket name, and the GCS key file path. The COPY procedure is then used to copy the 'sashelp.baseball' dataset to the defined LIBNAME library, thus creating a Parquet table in GCS. Finally, PROC PRINT is used to display the first three observations of the new Parquet table, demonstrating successful access to data stored on Google Cloud Storage.
Data Analysis

Type : CREATION_INTERNE (created internally)


The example uses the internal SASHELP.BASEBALL dataset. The code provides the complete context for creating the Parquet table and displaying the data.

1 Code Block
LIBNAME / PROC COPY / PROC PRINT Data
Explanation :
This code block first defines a SAS libname, 'mylib', using the 'parquet' engine to connect to a Google Cloud Storage bucket ('my-bucket'). It also specifies the path to the GCS key file for authentication. Then, PROC COPY is used to copy the internal SAS dataset 'sashelp.baseball' to this 'mylib' library, creating a Parquet table in GCS. Finally, PROC PRINT displays the first three observations of the newly created 'baseball' table in 'mylib', thereby verifying the creation and accessibility of the Parquet table.
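/* Assign a libref that points at the GCS bucket through the Parquet engine */
LIBNAME mylib parquet ""
    storage_platform = "GCS"
    storage_bucket_name = "my-bucket"
    storage_gcs_key_file = "/user/myfiles/my-project-5123b3a258a1.json"
;

/* Copy SASHELP.BASEBALL into the library, creating a Parquet table in GCS */
PROC COPY in=sashelp out=mylib;
    select baseball;
RUN;

/* Print the first three observations to confirm the table is readable */
PROC PRINT DATA=mylib.baseball (obs=3);
    var name team;
RUN;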
Pro Tip
When defining the LIBNAME, the physical path argument (the empty quotes "" in the example) is required by the syntax but ignored when storage_platform="GCS" is used, so you can safely leave it empty.
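Beyond printing three rows, a slightly fuller check can be useful after the copy. The sketch below is an illustration rather than part of the original example: it assumes the mylib libref from the code block above is still assigned, and that PROC CONTENTS and PROC SQL can read through the Parquet engine in your environment. The column aliases n_source and n_parquet are hypothetical names chosen for this illustration.

```sas
/* Show the column names and types of the Parquet table as SAS sees them */
proc contents data=mylib.baseball;
run;

/* Compare row counts between the source table and the Parquet copy */
proc sql;
    select count(*) as n_source  from sashelp.baseball;
    select count(*) as n_parquet from mylib.baseball;
quit;

/* Release the libref once the check is done */
libname mylib clear;
```

The two counts should match; if they do not, review the SAS log from the PROC COPY step for warnings.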
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved