It often happens, during data processing on a server (especially under Unix), that we have to manipulate files without knowing their exact name in advance, but rather their position in a sequential list.

Imagine the following scenario: a directory contains about thirty CSV files named sequentially (for example a.txt, b.txt, etc.). Your goal is not to import everything, but to specifically target the 27th file in this list, regardless of its name, to process its data.

Here is a robust method to identify, extract, and read a specific file based on its position, using SAS system functions.

How to select and read the Nth file in a Unix directory with SAS

The 3-Step Strategy

To be sure of selecting the 27th file in alphabetical (sequential) order, we cannot rely on the order in which the operating system returns directory entries, because that order is not guaranteed to be sorted.

The procedure is as follows:

1. List the directory contents: Use SAS file management functions to read the names of all files present.
2. Sort the list: Order the names (A to Z) to guarantee a sequential order.
3. Extract the target file: Use direct access to retrieve the name of the file at position N (here, 27) and store it in a macro variable.

Technical Implementation

Step 1 and 2: Retrieval and Sorting

We will first create a SAS table (fnames) containing the list of all files in the folder. For this, we use the FILENAME function (to assign a fileref to the directory), then DOPEN (to open the directory), DNUM (to count its entries), and DREAD (to read their names).

Important note: It is crucial to read all files before sorting. If we stop reading at the 27th file returned by the operating system, it may not be the 27th file in alphabetical order.

DATA fnames;
    LENGTH dref $8 fname $200;
    /* 1. Assign the directory to a fileref (dref is blank, so SAS generates the name) */
    rc = filename(dref, "/chemins/vers/mon_dossier");

    /* 2. Open the directory */
    did = dopen(dref);

    /* 3. Loop over every entry in the directory */
    IF did THEN DO i = 1 to dnum(did);
        fname = dread(did, i);
        OUTPUT; /* Add the name to the table */
    END;

    /* Close the directory cleanly and release the fileref */
    rc = dclose(did);
    rc = filename(dref);
    keep fname;
RUN;

/* 4. Sort the files by name to guarantee sequential order */
PROC SORT DATA=fnames;
    BY fname;
RUN;
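
One caveat about "alphabetical" order: by default PROC SORT compares character values in encoding order, so on an ASCII system uppercase names come before lowercase ones and file10.txt comes before file2.txt. If the names in your directory mix case or embed numbers, a linguistic sort may match the order you expect more closely. The following is only a sketch, assuming your SAS release supports linguistic collation; the output table name fnames_nat is hypothetical.

```sas
/* Alternative sort (sketch): case-insensitive, with embedded integers
   compared by numeric value. fnames_nat is a hypothetical table name. */
proc sort data=fnames out=fnames_nat
          sortseq=linguistic(strength=primary numeric_collation=on);
    by fname;
run;
```

If you use this variant, read from fnames_nat instead of fnames in the selection step.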

Step 3: Direct Selection with the `POINT=` Option

Once the list is sorted, we do not need to read the entire table. The SET statement with the POINT= option lets us jump directly to the desired observation.

DATA _null_;
    /* Define the target position */
    pt = 27;

    /* Check that the file exists (error handling).
       NOBS= is set at compile time, so it can be tested before SET executes. */
    IF pt > nobs THEN DO;
        putlog "Error: not enough files in the directory.";
        stop;
    END;

    /* Direct access to the 27th observation */
    SET fnames point=pt nobs=nobs;

    /* Store the name in a macro variable for later use */
    call symputx('mon_fichier', fname);

    /* Stop the DATA step immediately after the read (required with POINT=) */
    stop;
RUN;

/* Check in the log */
%put The selected file is: &mon_fichier;
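
The step above hard-codes position 27. As a minimal sketch, the same logic can be wrapped in a macro so that the position becomes a parameter. The macro name nth_file and its parameters n= and out= are hypothetical, and the sketch assumes the sorted table fnames already exists.

```sas
/* Hypothetical helper: store the name of the Nth row of WORK.FNAMES
   (already sorted) in the global macro variable named by OUT=. */
%macro nth_file(n=, out=mon_fichier);
    %global &out;
    %let &out = ;   /* reset so a failed lookup leaves an empty value */

    data _null_;
        /* NOBS= is available at compile time, before SET executes */
        if &n < 1 or &n > nobs then do;
            putlog "Requested position &n is out of range; files listed: " nobs;
            stop;
        end;
        pt = &n;                          /* POINT= requires a variable */
        set fnames point=pt nobs=nobs;
        call symputx("&out", fname, 'G'); /* 'G' writes to the global table */
        stop;                             /* mandatory with POINT= */
    run;
%mend nth_file;

%nth_file(n=27);
%put Selected file: &mon_fichier;
```

Resetting the macro variable first means a caller can test whether it is empty instead of reading a stale value from a previous run.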

This approach is universal. Whether you are looking for the 27th or the 100th file, the logic remains the same. Once the file name is stored in the macro variable &mon_fichier, you can use it dynamically in an import procedure (such as PROC IMPORT) or a classic DATA step to read the file's content. Note that DREAD returns only the member name, not the full path, so prepend the directory when you reference the file.
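
As an illustration, here is one way the macro variable might be used with PROC IMPORT. This is a sketch under assumptions: the macro variable dossier and the output table work.fichier_n are hypothetical names, the path must match the one given to FILENAME in the first step, and the file is assumed to be comma-delimited with a header row.

```sas
/* Assumption: same directory as the one scanned in step 1 */
%let dossier = /chemins/vers/mon_dossier;

/* Import the selected file. DBMS=CSV is stated explicitly so the
   file extension does not drive the format. */
proc import datafile="&dossier/&mon_fichier"
            out=work.fichier_n      /* hypothetical output table */
            dbms=csv
            replace;
    getnames=yes;                   /* assumes the first row holds column names */
run;
```

For files with a known layout, a DATA step with INFILE "&dossier/&mon_fichier" and an explicit INPUT statement gives tighter control over column types than PROC IMPORT's guessing.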

Important notice

The code and examples provided on WeAreCAS.eu are for educational purposes. Do not copy and paste them blindly into your production environments. The best approach is to understand the logic before applying it. We strongly recommend testing these scripts in a test environment (Sandbox/Dev). WeAreCAS accepts no responsibility for any impact or data loss on your systems.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.

Published on: 26/08/2022
