When processing data on a server (especially under Unix), we often need to work with a file whose exact name we do not know in advance, only its position in a sequential list.
Imagine the following scenario: a directory contains about thirty CSV-formatted files named sequentially (for example a.txt, b.txt, and so on). The goal is not to import all of them, but to target the 27th file in the list, whatever its name, and process its data.
Here is a robust method to identify, extract, and read a specific file based on its position, using SAS© system functions.
The 3-Step Strategy
To be sure we select the 27th file in alphabetical (sequential) order, we cannot rely on the order in which the operating system returns directory entries, which is not guaranteed to be sorted.
The procedure has three steps:
List the directory contents: Use SAS© file management functions to read the names of all the files present.
Sort the list: Order the names alphabetically (A to Z) so that each file's position is reproducible.
Extract the target file: Use direct access (a pointer) to retrieve the name of the file located at position N (here, 27) and store it in a macro variable.
Technical Implementation
Steps 1 and 2: Retrieval and Sorting
We first create a SAS© table (fnames) containing the list of all files in the folder. For this, we assign a fileref to the folder with the FILENAME function, then use the DOPEN (to open the directory), DNUM (to count its members), and DREAD (to read each name) functions.
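A minimal sketch of Steps 1 and 2 follows. It assumes a Unix-style folder path held in a hypothetical macro variable rep and stores each name in a column we chose to call fname; neither name comes from the original method. A blank character variable passed to FILENAME lets SAS generate a temporary fileref, which DOPEN needs in order to open the folder.

```sas
/* Hypothetical folder to scan: replace with your own path */
%let rep = /data/incoming;

/* Step 1: list every member of the folder into WORK.FNAMES */
data fnames(keep=fname);
  length fref $8 fname $256;
  /* A blank FREF lets FILENAME() generate a fileref for the folder */
  rc  = filename(fref, "&rep");
  did = dopen(fref);
  if did = 0 then do;
    put "ERROR: cannot open directory &rep";
    rc = filename(fref);
    stop;
  end;
  /* Read ALL members: stopping early would sample the OS order */
  do i = 1 to dnum(did);
    fname = dread(did, i);
    output;
  end;
  rc = dclose(did);
  /* Release the generated fileref */
  rc = filename(fref);
run;

/* Step 2: sort alphabetically so that positions are reproducible */
proc sort data=fnames;
  by fname;
run;
```

DREAD also returns subdirectory names, so a filter on the extension may be needed if the folder holds other entries. PROC SORT compares names character by character, so a10.txt sorts before a2.txt; zero-padded names (a02.txt) or the SORTSEQ=LINGUISTIC(NUMERIC_COLLATION=ON) option avoid that surprise.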
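Important note: It is crucial to read all files before sorting. If we stopped reading at the 27th file returned by the operating system, we would get the 27th entry in directory order, not the 27th in alphabetical order.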
Step 3: Direct Selection with the POINT= Option
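Once the list is sorted, we do not need to read the entire table. The SET statement with the POINT= option jumps directly to the desired observation without reading the ones before it. Because a DATA step using POINT= never reaches the end of the table, it must end with an explicit STOP statement.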
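A minimal sketch of Step 3, assuming the sorted fnames table described above keeps each name in a character column called fname (our assumption) and using a hypothetical macro variable pos for the rank. The NOBS= option is filled at compile time, so the step can check the row count before SET executes and fail gracefully when the folder holds fewer files than requested.

```sas
/* Position to retrieve: 27 here, but any rank works */
%let pos = 27;

data _null_;
  /* NOBS= is set at compile time, so it can be tested before SET runs */
  if &pos > nobs then do;
    put "WARNING: only " nobs "files listed; there is no position &pos";
    call symputx('mon_fichier', '');
    stop;
  end;
  /* POINT= requires a variable, not a literal */
  p = &pos;
  set fnames point=p nobs=nobs;
  call symputx('mon_fichier', fname);
  /* Mandatory with POINT=: end of file is never reached */
  stop;
run;

%put NOTE: file at position &pos is &mon_fichier;
```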
This approach works for any position: whether you are looking for the 27th or the 100th file, the logic is the same. Once the file name is stored in the macro variable &mon_fichier, you can use it dynamically in an import procedure (such as PROC IMPORT) or in a classic DATA step to read the file's content.
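For example, assuming the selected file is comma-separated with a header row and sits in the same hypothetical folder rep, PROC IMPORT can read it directly. The output table name target is arbitrary. Macro references resolve inside double quotes (not single quotes), so the full path is built from &mon_fichier when the step is compiled.

```sas
/* Same hypothetical folder as in the listing step */
%let rep = /data/incoming;

/* Import the selected file; OUT= names a table of our choosing */
proc import datafile="&rep/&mon_fichier"
            out=work.target
            dbms=csv
            replace;
  getnames=yes;
run;
```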