In clinical or marketing data management, it is common to encounter tables where information is scattered across multiple columns. A common task is to identify which columns contain data for a given individual, and then create a summary list of these column names in a single variable.
This article explains how to use the ARRAY, VNAME, and CATX functions in SAS© to transform a "wide" structure into a descriptive concatenated list.
Let's imagine a dataset containing patient identifiers and several columns corresponding to different studies (Study1ID, Study2ID, etc.). If a patient is enrolled in a study, the column contains an ID. Otherwise, it is empty or missing.
Input Data:
| CurrentStudy | Study1ID | Study2ID | Study3ID | Study4ID |
| 001 | 180 | 780 | . | 560 |
| 002 | . | 790 | . | 520 |
| 003 | . | . | . | 231 |
Desired Result:
We want to create a new variable (e.g., Other_enrolled_studies) that lists the names of the studies the patient is participating in, separated by a comma.
| CurrentStudy | Other_enrolled_studies |
| 001 | Study1ID, Study2ID, Study4ID |
| 002 | Study2ID, Study4ID |
| 003 | Study4ID |
The key to solving this problem lies in the VNAME() function. Unlike most SAS© functions that process the value of a variable, VNAME returns the name of the variable itself.
Combined with a loop over an array (ARRAY), this function allows for dynamic iteration through columns.
Method 1: The Iterative Approach (Recommended)
This method is the most direct and often the most efficient. It consists of:
Declaring an ARRAY that groups all the columns of interest.
Looping over this array.
Checking if the value is not missing.
If a value exists, concatenate the variable name (via VNAME) to our result variable.
Key points of the code:
study:: The colon is a wildcard character that selects all variables starting with the prefix "Study".
catx(", ", ...): This function is ideal because it automatically handles separators. It does not add a comma at the beginning of the string if it is empty, and it removes unnecessary spaces.
Method 2: The Parallel Arrays Approach
Another way to do this is to use two arrays: one for the numeric values and another to store the names as characters, then concatenate everything at the end.
Note on Performance
Although the second method works, it can be less performant on very large datasets if the catx function is placed inside the loop. Indeed, concatenating an entire list in each iteration is an unnecessarily costly operation.
If you use the iterative approach (Method 1), using catx line by line as the conditions are met remains very efficient because only one character string is built progressively.
The combined use of ARRAY and VNAME offers tremendous flexibility for Data Munging. This allows transforming metadata structure (the column names) into usable data (cell values), which is particularly useful for generating automated summary reports.