Published on :
ETL CREATION_INTERNE

Upload a Local CSV File and Apply SAS Format

This code is also available in: Deutsch Español Français
Awaiting validation
The 'upload' action of the 'table action set' in PROC CAS is used to transfer a CSV file from the client's local file system (or the Compute server if executed there) to an in-memory table in CAS. The 'importOptions' option is crucial, as it allows specifying the file type (CSV), indicating whether the first row contains column names ('getNames=True'), and especially applying specific informats and formats to variables via the 'vars' sub-parameter. For dates in 'MM/JJ/AAAA' format, the 'mmddyy10.' informat is used to convert the string to a SAS© numeric value (number of days since January 1, 1960), and the 'nldate20.' format is then applied to display these numeric values in a readable and local manner. Each example includes the creation of a temporary CSV file to ensure autonomy and executability.
Data Analysis

Type : CREATION_INTERNE


Examples use CSV data generated directly in SAS code via a DATA _NULL_ step, simulating the availability of a local CSV file.

1 Code Block
PROC CAS / UPLOAD Data
Explanation :
This example illustrates the upload of a simple CSV file. It first creates a CSV file named 'myusage_basic.csv' with basic data. Then, it uses the `upload` action in `PROC CAS` to load this file into a CAS table named 'myusage_basic'. The `importOptions` specify that the file is a CSV, that the first line contains column names (`getNames=True`), and apply the `mmddyy10.` informat and `nldate20.` format to the date columns (`recorddt`, `startdt`, `enddt`) to interpret them correctly. Finally, `print myusage_basic(obs=3)` displays the first three observations of the loaded table.
Copied!
1/* Création du fichier CSV local */
2DATA _NULL_;
3 FILE "/tmp/myusage_basic.csv" DSD LRECL=200;
4 PUT 'account,high,low,recorddt,startdt,enddt,kwh';
5 PUT '1000001,82,71,07/03/2013,07/01/2013,07/02/2013,54';
6 PUT '1000001,85,70,07/02/2013,06/30/2013,07/01/2013,70';
7 PUT '1000001,85,68,07/01/2013,06/29/2013,06/30/2013,78';
8RUN;
9 
10/* Connexion au serveur CAS (remplacer par vos infos) */
11*options cashost='cloud.example.com' casport=5570;
12cas casauto;
13 
14PROC CAS;
15 SESSION casauto;
16 upload path="/tmp/myusage_basic.csv"
17 casout={name="myusage_basic", replace=True},
18 importOptions={
19 fileType="csv",
20 getNames=True,
21 vars={
22 recorddt={informat="mmddyy10.", FORMAT="nldate20."},
23 startdt={informat="mmddyy10.", FORMAT="nldate20."},
24 enddt={informat="mmddyy10.", FORMAT="nldate20."}
25 }};
26 
27 PRINT myusage_basic(obs=3);
28RUN;
29QUIT;
30 
31/* Suppression du fichier CSV temporaire */
32x "rm /tmp/myusage_basic.csv";
2 Code Block
PROC CAS / UPLOAD Data
Explanation :
This example extends the basic case by adding a 'cost' column and using `promote=True` for the CAS table, thus making it available to all CAS sessions. The CSV file 'myusage_inter.csv' is created with a new 'cost' column. During upload, the `comma8.2` informat and `dollar10.2` format are applied to the 'cost' column to handle monetary values. After the upload, the `table.columnInfo` action is used to check column properties, including the applied informats and formats, and then the complete table is displayed.
Copied!
1/* Création du fichier CSV local avec plus de données */
2DATA _NULL_;
3 FILE "/tmp/myusage_inter.csv" DSD LRECL=200;
4 PUT 'account,high,low,recorddt,startdt,enddt,kwh,cost';
5 PUT '1000001,82,71,07/03/2013,07/01/2013,07/02/2013,54,12.50';
6 PUT '1000001,85,70,07/02/2013,06/30/2013,07/01/2013,70,15.75';
7 PUT '1000002,85,68,07/01/2013,06/29/2013,06/30/2013,78,20.00';
8 PUT '1000002,90,70,06/30/2013,06/28/2013,06/29/2013,72,18.25';
9 PUT '1000003,88,68,06/29/2013,06/27/2013,06/28/2013,54,10.00';
10RUN;
11 
12/* Connexion au serveur CAS (remplacer par vos infos) */
13*options cashost='cloud.example.com' casport=5570;
14cas casauto;
15 
16PROC CAS;
17 SESSION casauto;
18 upload path="/tmp/myusage_inter.csv"
19 casout={name="myusage_inter", replace=True, promote=True},
20 importOptions={
21 fileType="csv",
22 getNames=True,
23 vars={
24 recorddt={informat="mmddyy10.", FORMAT="nldate20."},
25 startdt={informat="mmddyy10.", FORMAT="nldate20."},
26 enddt={informat="mmddyy10.", FORMAT="nldate20."},
27 cost={informat="comma8.2", FORMAT="dollar10.2"}
28 }};
29 
30 TABLE.columnInfo RESULT=r / TABLE={name='myusage_inter'};
31 PRINT r;
32 
33 PRINT myusage_inter;
34RUN;
35QUIT;
36 
37/* Suppression du fichier CSV temporaire */
38x "rm /tmp/myusage_inter.csv";
3 Code Block
PROC CAS / UPLOAD Data
Explanation :
This advanced example uses macro variables (`%LET`) to dynamically define the CSV file name (including system date and time for uniqueness), the CAS table name, and the caslib. Before uploading, it includes conditional logic (`if table.tableInfo(...) then do; table.dropTable...; end;`) to delete an existing CAS table if it has the same name, ensuring clean execution. The CSV file is created and uploaded. New informats/formats (`date9.`, `comma10.2`, `dollar10.2`) are used for various data types. Finally, an `x "rm &CSV_FILE";` command is added to clean up the temporary file on the operating system.
Copied!
1/* Création du fichier CSV local */
2%LET CSV_FILE = /tmp/mydata_advanced_&sysdate._%sysfunc(compress(&systime.)).csv;
3%LET CAS_TABLE = mydata_advanced;
4%LET CAS_LIB = CASUSER;
5 
6DATA _NULL_;
7 FILE "&CSV_FILE" DSD LRECL=200;
8 PUT 'ID,Category,Value,EntryDate';
9 PUT '1,A,123.45,12-JAN-2023';
10 PUT '2,B,678.90,01-FEB-2023';
11 PUT '3,A,111.22,03-MAR-2023';
12 PUT '4,C,333.33,15-APR-2023';
13RUN;
14 
15/* Connexion au serveur CAS (remplacer par vos infos) */
16*options cashost='cloud.example.com' casport=5570;
17cas casauto;
18 
19PROC CAS;
20 SESSION casauto;
21 /* Vérifier si la table existe déjà et la supprimer si nécessaire */
22 IF TABLE.tableInfo(caslib="&CAS_LIB", name="&CAS_TABLE") THEN DO;
23 TABLE.dropTable caslib="&CAS_LIB" name="&CAS_TABLE";
24 END;
25 
26 upload path="&CSV_FILE"
27 casout={name="&CAS_TABLE", caslib="&CAS_LIB", replace=True, promote=True},
28 importOptions={
29 fileType="csv",
30 getNames=True,
31 vars={
32 EntryDate={informat="date9.", FORMAT="nldate20."},
33 Value={informat="comma10.2", FORMAT="dollar10.2"}
34 }};
35 
36 PRINT &CAS_TABLE;
37RUN;
38QUIT;
39 
40/* Supprimer le fichier CSV temporaire après utilisation */
41x "rm &CSV_FILE";
4 Code Block
PROC CAS / UPLOAD, FEDSQL, SAVETABLE Data
Explanation :
This example illustrates a complete workflow in a Viya/CAS environment. It begins by creating a CSV file 'transactions.csv'. Then, the `upload` action loads this data into a CAS table 'raw_transactions', applying informats/formats (`yymmdd10.`, `comma10.2`, `dollar10.2`) for dates and amounts. The next step uses `fedsql.execDirect` to execute an SQL query directly on the CAS in-memory data, aggregating transactions by `CustomerID` to calculate the total number of transactions and the total amount spent. The resulting aggregated table is then persistently saved in the 'CASUSER' caslib as a SASHDA file (`table.save`). Finally, the aggregated table is displayed. This process highlights CAS's ability to manage data loading, processing, and persistence in memory.
Copied!
1/* Création du fichier CSV local */
2DATA _NULL_;
3 FILE "/tmp/transactions.csv" DSD LRECL=200;
4 PUT 'TransactionID,CustomerID,Amount,TransactionDate,ProductCode';
5 PUT '1,C001,150.75,2023-01-15,PROD001';
6 PUT '2,C002,230.00,2023-01-16,PROD002';
7 PUT '3,C001,50.25,2023-01-15,PROD003';
8 PUT '4,C003,100.00,2023-01-17,PROD001';
9 PUT '5,C002,75.50,2023-01-16,PROD004';
10RUN;
11 
12/* Connexion au serveur CAS (remplacer par vos infos) */
13*options cashost='cloud.example.com' casport=5570;
14cas casauto;
15 
16PROC CAS;
17 SESSION casauto;
18 caslib _all_ assign;
19 
20 /* 1. Charger le fichier CSV */
21 upload path="/tmp/transactions.csv"
22 casout={name="raw_transactions", replace=True, promote=True},
23 importOptions={
24 fileType="csv",
25 getNames=True,
26 vars={
27 TransactionDate={informat="yymmdd10.", FORMAT="nldate20."},
28 Amount={informat="comma10.2", FORMAT="dollar10.2"}
29 }};
30 
31 /* 2. Effectuer une agrégation avec FedSQL sur la table CAS */
32 fedsql.execDirect RESULT=res / query="
33 CREATE TABLE aggregated_transactions AS
34 SELECT
35 CustomerID,
36 COUNT(TransactionID) AS TotalTransactions,
37 SUM(Amount) AS TotalAmountSpent
38 FROM raw_transactions
39 GROUP BY CustomerID
40 ORDER BY TotalAmountSpent DESC;
41 ";
42 PRINT res;
43 
44 /* 3. Sauvegarder la table agrégée dans une caslib (ex: CASUSER) */
45 TABLE.save RESULT=saveres /
46 TABLE={name="aggregated_transactions", caslib="CASUSER"},
47 name="aggregated_transactions.sashdat",
48 replace=True;
49 PRINT saveres;
50 
51 /* 4. Afficher les résultats agrégés */
52 PRINT aggregated_transactions;
53RUN;
54QUIT;
55 
56x "rm /tmp/transactions.csv";
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved


Banner
Expert Advice
Expert
Simon
Expert SAS et fondateur.
« For very large local files, ensure your client-to-server network bandwidth is sufficient. If the file is already on a server accessible by CAS, table.loadTable is generally faster than table.upload. »