SAS9

The IF vs. WHERE Duel and the SUBSTR Function Trap

Simon 12 vistas

Sometimes a SAS© program crashes with an IF statement, but works perfectly if you replace that IF with a WHERE. Is it magic? No, it's an architectural difference.

A forum user encountered a cryptic error: "INVALID THIRD ARGUMENT TO FUNCTION SUBSTR". Let's analyze this classic case to understand SAS©'s internal mechanics.

1. The Guilty Code

The user wants to split their data into two tables based on the province. They are using complex logic involving SUBSTR, TRIM, and LEFT.

1DATA bundrpt regulert;
2 SET combine;
3 /* La condition qui plante */
4 IF (substr(trim(left(brprov)),1,7) = 'ALBERTA') OR
5 (substr(trim(left(brprov)),1,12) = 'SASKATCHEWAN' and put(cidkey,sask.) = 'Y')
6 THEN OUTPUT bundrpt;
7 ELSE OUTPUT regulert;
8RUN;

The Error:

NOTE: INVALID THIRD ARGUMENT TO FUNCTION SUBSTR AT LINE 302 BRPROV=ALBERTA

Why is SAS© complaining about the 3rd argument of SUBSTR (the requested length) when the BRPROV variable is 18 characters long?

2. The Diagnosis: The TRIM Trap

The error doesn't come from the original variable, but from what the code does to it before calling SUBSTR.

Let's look at the function chain for the value "ALBERTA":

  1. LEFT(BRPROV): Aligns the text to the left.

  2. TRIM(...): Removes trailing spaces.

    • Temporary result: The string becomes "ALBERTA" (length = 7).

  3. SUBSTR(..., 1, 12): The code then tries to extract 12 characters to check if it's "SASKATCHEWAN".

The Crash: You are asking SAS© to extract 12 characters from a string that is now only 7 characters long (because of TRIM). This is mathematically impossible, so the Data Step stops.

3. The Key Question: Why Does WHERE Work?

The user notes: "If I use WHERE instead of IF, I don't get an error message."

This is where the fundamental difference lies:

IF (Data Step)

  • When: Executes during the processing of each row, in the PDV (Program Data Vector).

  • Behavior: It manipulates data after transformation. Here, it sees the result of the TRIM (the shortened string) and fails. Furthermore, depending on the logical evaluation rules, SAS© might try to evaluate the second part of the OR even if the first seems sufficient, or simply fail during argument compilation.

WHERE (Pre-Filter)

  • When: Executes before the data enters the Data Step.

  • Behavior: It acts as an SQL filter on the data source.

  • Why does it work? The WHERE statement is often "smarter" or more "restrictive" about the functions it allows. It will often ignore cosmetic functions like TRIM and rely on the source column's metadata (which is indeed 18 characters long), or optimize the query differently.

4. The Solutions (Best Practices)

Rather than relying on the tolerance of WHERE, the code logic must be corrected.

Solution A: Remove TRIM (The Simple Approach)

As a user suggested, the TRIM is unnecessary here. SUBSTR works just fine on a string with spaces, as long as the original variable is long enough.

1/* Pas de TRIM, donc on travaille sur les 18 caractères d'origine */
2IF substr(brprov, 1, 7) = 'ALBERTA' ...

Solution B: The "Starts With" Operator =: (The Pro Approach)

SAS© has a specific operator to say "Starts with," which avoids having to count characters and use SUBSTR.

It's the =: operator (equals sign followed by a colon).

Optimized Code:

1DATA bundrpt regulert;
2 SET combine;
3 /* Plus de SUBSTR, plus de TRIM, plus d'erreurs de longueur ! */
4 IF brprov =: 'ALBERTA' OR
5 (brprov =: 'SASKATCHEWAN' and put(cidkey,sask.) = 'Y')
6 THEN OUTPUT bundrpt;
7 ELSE OUTPUT regulert;
8RUN;
StatementRoleError Handling
IFProgramming logic. Executes line by line on values in memory.Strict. If you create a short string (TRIM) and request a long read (SUBSTR), it crashes.
WHEREInput filter (I/O). Executes on the source file.More robust because it often works on raw metadata or via the SQL engine.

Expert's advice: If you need to check the beginning of a character string, forget SUBSTR. Always use the =: operator. It's more readable, faster, and less prone to length errors.