SAS9

How to select and read the Nth file in a Unix directory with SAS

Simon

When processing data on a server (especially under Unix), we often have to work with files whose exact names we do not know in advance, only their position in a sequential list.

Imagine the following scenario: a directory contains about thirty CSV files named sequentially (for example a.txt, b.txt, etc.). Your goal is not to import everything, but to specifically target the 27th file in this list, regardless of its name, to process its data.

Here is a robust method to identify, extract, and read a specific file based on its position, using SAS© system functions.

The 3-Step Strategy

To be sure we select the 27th file in alphabetical (sequential) order, we cannot rely on the order in which the operating system returns directory entries, since that order is not guaranteed to be alphabetical.

The procedure is as follows:

  1. List the directory contents: Use SAS© file management functions to read the names of all files present.

  2. Sort the list: Order the names to ensure sequentiality (A to Z).

  3. Extract the target file: Use direct access (a pointer) to retrieve the name of the file located at position N (here, 27) and store it in a macro variable.

Technical Implementation

Step 1 and 2: Retrieval and Sorting

We will first create a SAS© table (fnames) containing the list of all files in the folder. For this, we use the DOPEN (to open the directory), DNUM (to count the files), and DREAD (to read the names) functions.

Important note: It is crucial to read all files before sorting. If we stop reading at the 27th file found by the operating system, we may not get the 27th file in alphabetical order.

DATA fnames;
    LENGTH dref $8 fname $200;

    /* 1. Assign the directory to a fileref (dref is blank, so SAS generates a name) */
    rc = filename(dref, "/chemins/vers/mon_dossier");

    /* 2. Open the directory (dopen returns 0 on failure) */
    did = dopen(dref);

    /* 3. Loop over every entry present in the directory */
    IF did > 0 THEN DO;
        DO i = 1 TO dnum(did);
            fname = dread(did, i);
            OUTPUT; /* Add the name to the table */
        END;

        /* Close the directory cleanly */
        rc = dclose(did);
    END;
    ELSE putlog "Error: Unable to open the directory.";

    /* Release the fileref */
    rc = filename(dref);
    KEEP fname;
RUN;

/* 4. Sort the files by name to guarantee sequential order */
PROC SORT DATA=fnames;
    BY fname;
RUN;
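One caveat about the sort: by default PROC SORT compares names character by character, so under Unix (ASCII) uppercase letters come before lowercase ones and "file10.txt" comes before "file2.txt". If the names in your directory mix case or contain numbers, a linguistic sort may match the intended sequence better. The following is a minimal sketch under that assumption; the table fnames_demo and its three file names are hypothetical and only serve to make the example runnable on its own.

```sas
/* Hypothetical sample of file names, in arbitrary order */
data fnames_demo;
    length fname $200;
    input fname :$200.;
    datalines;
file10.txt
File2.txt
file1.txt
;
run;

/* STRENGTH=PRIMARY ignores case differences;                 */
/* NUMERIC_COLLATION=ON compares embedded digits as numbers.  */
proc sort data=fnames_demo
          sortseq=linguistic(strength=primary numeric_collation=on);
    by fname;
run;
```

With these options the names should come out as file1.txt, File2.txt, file10.txt, whereas a plain BY fname sort would put File2.txt first. For names such as a.txt, b.txt, the default sort is sufficient.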

Step 3: Direct Selection with the POINT= Option

Once the list is sorted, we do not need to read the entire table. The SET statement with the POINT= option allows us to go directly to the desired line.

DATA _null_;
    /* Set the target position */
    pt = 27;

    /* Check that the file exists (error handling).              */
    /* nobs is set at compile time, so it is usable before SET.  */
    IF pt > nobs THEN DO;
        putlog "Error: Not enough files in the folder.";
        stop;
    END;

    /* Direct access to the 27th observation */
    SET fnames point=pt nobs=nobs;

    /* Store the name in a macro variable for later use */
    call symputx('mon_fichier', fname);

    /* Stop the data step immediately after the read */
    stop;
RUN;

/* Check in the log */
%put The selected file is: &mon_fichier;

This approach works for any position: whether you are looking for the 27th or the 100th file, only the value of pt changes. Once the file name is stored in the macro variable &mon_fichier, you can use it dynamically in an import procedure (like PROC IMPORT) or a classic Data step to read the file's content. Keep in mind that DREAD returns only the file name, not its full path, so the directory has to be prepended when the file is read.
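As an illustration of that last step, here is a minimal sketch that reads the selected file with PROC IMPORT. It assumes that &mon_fichier was created by the Data step above, that the file is comma-delimited, and that its first row contains column names. The macro variable dossier and the output table work.donnees are hypothetical names introduced for this example.

```sas
/* Hypothetical: the same directory path that was passed to FILENAME() in step 1 */
%let dossier = /chemins/vers/mon_dossier;

/* The macro variable holds the bare file name, so the directory is prepended here */
proc import datafile="&dossier/&mon_fichier"
            out=work.donnees   /* hypothetical output table name */
            dbms=csv           /* stated explicitly because the extension is .txt */
            replace;
    getnames=yes;              /* assumption: the first row holds the column names */
run;
```

If the file has no header row or a fixed layout, a Data step with an INFILE statement pointing to the same "&dossier/&mon_fichier" path gives finer control over column names and types.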