Published on :
ETL SIMULATED_EXTERNAL_FILE_OR_INTERNAL_CREATION

Loading a CSV file from a server

This code is also available in: Deutsch Español Français
Awaiting validation
The `upload` action allows data to be transferred from a client-side file or a temporary client-side file to a CAS in-memory table. In this example, `PROC HTTP` is first used to retrieve a CSV file from a remote URL and temporarily save it. Then, the `upload` action is invoked via `PROC CAS` to read this temporary file and send it as a CAS table. Options like `casOut` allow specifying the name of the target table in CAS and managing replacement (`replace=True`). The `importOptions={fileType="csv"}` option ensures that the file is correctly interpreted as a CSV. Once the table is loaded, a `simple.summary` action is executed to provide descriptive statistics grouped by a variable, thus demonstrating data processing in the CAS environment.
Data Analysis

Type : SIMULATED_EXTERNAL_FILE_OR_INTERNAL_CREATION


The examples create internal CSV data or simulate external files to ensure their autonomy.

1 Code Block
PROC CAS / DATA step Data
Explanation :
This example shows the direct loading of a small, on-the-fly created CSV file via a `DATA step` in SAS, then transferred to CAS. The `filename` statement with `temp` creates a temporary file, whose path is retrieved by `%sysfunc(pathname(mydata))`. The `table.upload` action is then used to load this file into the CAS session under the name `my_temp_table`. Finally, `table.columnInfo` is used to confirm the loading and structure of the table.
Copied!
1/* Configurez votre hôte et port CAS */
2*options cashost="cloud.example.com" casport=5570;
3*cas casauto;
4 
5/* Crée un fichier CSV temporaire à partir de données inline */
6filename mydata temp;
7DATA _null_;
8 file mydata dsd;
9 put "Name,Age,City";
10 put "Alice,30,Paris";
11 put "Bob,24,Lyon";
12 put "Charlie,35,Marseille";
13RUN;
14 
15/* Récupère le chemin d'accès au fichier temporaire */
16%let temppath = %sysfunc(quote(%sysfunc(pathname(mydata))));
17 
18/* Charge le fichier CSV temporaire dans CAS */
19PROC CAS;
20 upload path=&temppath.
21 casOut={
22 name='my_temp_table',
23 replace=True
24 },
25 importOptions={fileType="csv"};
26RUN;
27 
28/* Vérifie que la table est chargée en affichant ses colonnes */
29PROC CAS;
30 TABLE.columnInfo / TABLE='my_temp_table';
31RUN;
32QUIT;
2 Code Block
PROC CAS / PROC HTTP Data
Explanation :
This example uses `PROC HTTP` to download a CSV file from a remote URL to a temporary file. This file is then loaded into CAS via `table.upload`. The `promote=True` option in `casOut` makes the table globally accessible in the CAS session. To verify the loading, `table.tableInfo` displays the table's metadata, and `simple.summary` calculates basic statistics, illustrating common CAS table management and analysis actions.
Copied!
1/* Configurez votre hôte et port CAS */
2*options cashost="cloud.example.com" casport=5570;
3*cas casauto;
4 
5/* URL d'un fichier CSV public */
6%let data_url='http://support.sas.com/documentation/onlinedoc/viya/exampledatasets/classfit.csv';
7filename temp_csv temp;
8 
9/* Simule le téléchargement d'un fichier CSV depuis une URL */
10PROC HTTP method='get' url=&data_url. out=temp_csv;
11RUN;
12 
13/* Récupère le chemin d'accès au fichier temporaire */
14%let temppath = %sysfunc(quote(%sysfunc(pathname(temp_csv))));
15 
16/* Charge le fichier CSV temporaire dans CAS et le promeut en table globale */
17PROC CAS;
18 upload path=&temppath.
19 casOut={
20 name='class_promote',
21 replace=True,
22 promote=True /* Propage la table pour la rendre globale */
23 },
24 importOptions={fileType="csv"};
25RUN;
26 
27/* Vérifie les informations de la table (action courante) */
28PROC CAS;
29 TABLE.tableInfo / name='class_promote';
30RUN;
31 
32/* Obtient des statistiques sommaires (autre action courante) */
33PROC CAS;
34 t1.name = 'class_promote';
35 SIMPLE.summary /
36 TABLE = t1,
37 subSet = {"N", "MEAN", "STD"};
38QUIT;
3 Code Block
PROC CAS / DATA step Data
Explanation :
This example demonstrates advanced capabilities by loading a CSV file and then using a `DATA step` integrated into `PROC CAS` to manipulate the data. It calculates a new variable, the Body Mass Index (BMI), and filters the dataset to include only male records. Finally, the `simple.summary` action is applied to the resulting table to obtain statistics on the manipulated columns, showing a transformation and analysis in CAS memory.
Copied!
1/* Configurez votre hôte et port CAS */
2*options cashost="cloud.example.com" casport=5570;
3*cas casauto;
4 
5/* Crée un fichier CSV temporaire avec des données de santé */
6filename health_data temp;
7DATA _null_;
8 file health_data dsd;
9 put "ID,Gender,Height_cm,Weight_kg";
10 put "1,M,175,70";
11 put "2,F,160,55";
12 put "3,M,180,85";
13 put "4,F,165,60";
14 put "5,M,170,75";
15RUN;
16 
17/* Récupère le chemin d'accès au fichier temporaire */
18%let temppath = %sysfunc(quote(%sysfunc(pathname(health_data))));
19 
20/* Charge le fichier CSV temporaire dans CAS */
21PROC CAS;
22 upload path=&temppath.
23 casOut={
24 name='health_metrics',
25 replace=True
26 },
27 importOptions={fileType="csv"};
28RUN;
29 
30/* Manipule les données dans CAS : Calcule l'IMC et filtre par genre */
31PROC CAS;
32 DATA casuser.health_bmi / caslib='casuser' replace=True;
33 SET casuser.health_metrics;
34 BMI = Weight_kg / ((Height_cm / 100) ** 2);
35 where Gender = 'M'; /* Filtre pour ne conserver que les hommes */
36 RUN;
37 
38 /* Obtient des statistiques sommaires pour la nouvelle table avec l'IMC */
39 SIMPLE.summary /
40 TABLE='health_bmi',
41 inputs={'Height_cm', 'Weight_kg', 'BMI'},
42 subSet = {"MEAN", "MAX", "MIN"};
43RUN;
44QUIT;
4 Code Block
PROC CAS / DATA step Data
Explanation :
This two-part example first illustrates the system's robustness in the face of errors: an attempt to load a non-existent file is made, which will generate an error in the SAS log, highlighting the need for error handling. The second part demonstrates CAS's ability to handle large volumes of data. A large dummy dataset is generated directly in CAS memory via a `DATA step` (`10,000 observations`), and then grouped summary statistics are calculated. This highlights CAS's performance for distributed processing of large data.
Copied!
1/* --- Partie 1: Démonstration de la gestion d'erreur avec un fichier inexistant --- */
2/* Tente de charger un fichier qui n'existe pas pour montrer une erreur */
3/* Cette opération devrait générer une erreur visible dans le log SAS */
4filename nonexist temp;
5PROC CAS;
6 upload path=%sysfunc(quote(%sysfunc(pathname(nonexist))))
7 casOut={
8 name='error_test',
9 replace=True
10 },
11 importOptions={fileType="csv"};
12RUN; /* Vérifier le log SAS pour les messages d'erreur de fichier introuvable */
13 
14/* --- Partie 2: Simulation de chargement et traitement de gros volumes de données --- */
15/* Création d'une table CAS de grande taille (simulée) */
16PROC CAS;
17 DATA casuser.large_data (drop=_i_) / caslib='casuser' replace=True;
18 DO _i_ = 1 to 10000; /* 10 000 observations simulées */
19 ID = _i_;
20 Category = ceil(rand('UNIFORM') * 5); /* 5 catégories aléatoires */
21 Value = rand('NORMAL') * 100 + 50;
22 OUTPUT;
23 END;
24 RUN;
25 
26 /* Calcul de statistiques sommaires sur la grande table, groupées par catégorie */
27 SIMPLE.summary /
28 TABLE='large_data',
29 inputs={'Value'},
30 groupBy={'Category'},
31 subSet = {"N", "MEAN", "STD"};
32RUN;
33QUIT;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Banner
Expert Advice
Expert
Stéphanie
Spécialiste Machine Learning et IA.
« If your CSV file uses a delimiter other than a comma (like a semicolon or pipe), specify it explicitly within importOptions={fileType="csv", delimiter=";"} to avoid data alignment errors. »