Guide SAS VIYA

SAS Viya Tutorial: Import a Headerless Text File and Rename Columns Dynamically

Simon 23/05/2022 9 vistas

When importing flat files (CSV, TXT) into SAS© Viya via PROC IMPORT, a common scenario poses a problem: the data file does not contain a header row, and the column names are provided separately (for example, in a Word or Excel document).

By default, if you use the GETNAMES=NO option, SAS© generates generic names (VAR1, VAR2, VAR3...).

How can you efficiently apply your own column names without manually renaming each variable, especially if the file contains dozens of them? Here are two approaches: the quick method for a few variables, and complete automation for large files.

1. The Standard Method (PROC IMPORT + RENAME)

If you have few columns, the simplest solution is to import the file as is, then rename the variables afterward.

Step 1: Raw Import

Use the GETNAMES=NO option to tell SAS© that the first line already contains data.

1FILENAME LOC DISK '/workspace/workbench/myorg/data/home/olddatatoimport.txt';
2 
3PROC IMPORT DATAFILE=LOC
4 DBMS=DLM
5 OUT=WORK.NEWDATA
6 REPLACE;
7 DELIMITER=','; /* ou '09'x pour tabulation */
8 GETNAMES=NO; /* Crucial : évite de perdre la 1ère ligne de données */
9RUN;

Step 2: Manual Renaming

Use PROC DATASETS to modify the names without re-reading all the data (very fast even on large volumes).

1PROC DATASETS library=WORK nolist;
2 modify NEWDATA;
3 rename
4 VAR1 = State
5 VAR2 = City
6 VAR3 = Population;
7QUIT;

2. The Automated Method (Dynamic Renaming)

If you have 50 or 100 columns listed in an external document, the manual method is tedious and prone to errors. Here’s how to automate the process by dynamically generating the RENAME statement.

The Concept

  1. Import the data file with generic names (VAR1, VAR2...).

  2. Import the list of column names (copied from your Word/Excel document) into a SAS© table.

  3. Use a Data Step to build a macro variable containing the pair VAR1=NewName VAR2=AnotherName.

  4. Apply this macro in PROC DATASETS.

The Complete Code

Phase A: Create the names table Copy your list of columns from Word and paste it into the code below:

1/* Table contenant vos noms de colonnes dans l'ordre du fichier */
2DATA column_names;
3 LENGTH new_name $32;
4 INPUT new_name;
5 DATALINES;
6State
7City
8ZipCode
9Population
10Revenue
11/* ... collez votre liste ici ... */
12;
13RUN;

Phase B: Generate the renaming script This code loops through the list of names and builds a character string of the type: VAR1=State VAR2=City ...

1DATA _null_;
2 SET column_names END=last;
3
4 LENGTH rename_list $32000;
5 retain rename_list;
6
7 /* On concatène : "VAR" + numéro de ligne = "Nouveau Nom" */
8 /* cats('VAR', _n_) génère VAR1, VAR2, etc. basée sur le numéro de ligne */
9 rename_list = catx(' ', rename_list, catx('=', cats('VAR',_n_), new_name));
10
11 /* À la fin, on stocke tout dans une variable macro &PAIRES_RENAME */
12 IF last THEN call symputx('PAIRES_RENAME', rename_list);
13RUN;
14 
15/* Vérification dans le journal */
16%put &PAIRES_RENAME;

Phase C: Apply the change

1/* Importation des données brutes */
2PROC IMPORT datafile='/chemin/mon_fichier.txt' dbms=dlm out=work.raw_data replace;
3 getnames=no;
4RUN;
5 
6/* Renommage éclair */
7PROC DATASETS library=work nolist;
8 modify raw_data;
9 rename &PAIRES_RENAME;
10QUIT;

Alternative: The DATA Step (Total Control)

If your file is complex (specific dates, particular numeric formats), PROC IMPORT can sometimes guess the data types incorrectly. In this case, the most robust method is to write a classic DATA step.

Instead of letting SAS© guess VAR1, VAR2, you declare your variables directly:

1DATA work.newdata;
2 INFILE '/workspace/workbench/myorg/data/home/olddatatoimport.txt'
3 dsd /* Gère les délimiteurs et les valeurs manquantes */
4 dlm=',' /* Délimiteur */
5 truncover; /* Évite les erreurs si une ligne est trop courte */
6
7 /* Définissez vos noms et types ici */
8 INPUT
9 State :$2.
10 City :$50.
11 Population :best.
12 Revenue :comma10.;
13RUN;
  • For a quick need: PROC IMPORT (GetNames=No) + PROC DATASETS.

  • For a large number of columns: Automation via Macro Variable.

  • For precise control over types: DATA Step with INPUT.