Data Step

How to retrieve the value of a variable whose name is stored in another column?

Simon 8 Aufrufe

It often happens when handling complex data in SAS© that you run into a dynamic referencing problem. Imagine a dataset where one column does not contain raw data, but the name of another variable in the table from which you want to extract the value.

The Scenario

Let's take the example of a dataset containing statistics (FIELD_AVG, FIELD_SUM) and a control column named REQ_STAT.

This REQ_STAT column indicates which statistic should be retained for the current row.

  • If REQ_STAT is "FIELD_AVG", we want the numeric value from the FIELD_AVG column.

  • If REQ_STAT is "FIELD_SUM", we want the value from FIELD_SUM.

The goal is to create a new variable, FIELD_STAT, which contains this final value.

The False Good Idea: SYMPUT and SYMGET

The first reflex of many programmers is to try to use macro-variable features at run time via CALL SYMPUT and SYMGET.

The code often looks like this:

1DATA target_ds;
2 SET source_ds;
3 /* Tentative d'assigner le nom de la variable à une macro */
4 CALL SYMPUT('field_val', req_stat);
5 /* Tentative de récupérer la valeur */
6 field_stat = SYMGET('field_val');
7RUN;
Why doesn't this work? The result of this code will not be the numeric value (e.g., 120), but the character string itself (e.g., "FIELD_AVG"). The SYMGET function retrieves the content of the macro-variable as is (text) and does not interpret it as a reference to a dataset variable. Moreover, managing macro-variables during the execution of a DATA step is complex because they do not update line by line as one might hope for this type of operation.

The Recommended Solution

To solve this problem of "resolving a character string into a variable name", you must abandon the idea of dynamic resolution by macro during the DATA step.

Two main approaches exist:

1. For a small number of variables: The explicit condition If you only have a few possible columns (as in our example), the simplest and most efficient method remains the use of conditional blocks (IF/THEN/ELSE or SELECT).

1DATA target_ds;
2 SET source_ds;
3
4 SELECT (req_stat);
5 WHEN ('FIELD_AVG') field_stat = field_avg;
6 WHEN ('FIELD_SUM') field_stat = field_sum;
7 OTHERWISE field_stat = .;
8 END;
9RUN;
2. For a large number of variables: Code generation If you are dealing with hundreds of columns, writing conditions manually becomes tedious. The solution is then to generate the SAS© code before execution.

You can use the table's metadata to dynamically build a SELECT statement or a series of IFs via a preprocessing macro. This makes the code generic and adaptable regardless of the table structure.

In SAS©, a character string containing a variable name remains text. To access the content of this variable, it is better to use explicit logical structures or go through arrays or dynamically generated code, rather than relying on SYMGET which is not designed for strict variable indirection.