Proc SQL

How to extract a "substring" from a numeric variable?

Simon 29/04/2021 4 Aufrufe

When starting with SAS© and the SQL procedure (PROC SQL), a common mistake is to try to manipulate numeric variables as if they were character strings.

A classic case is an identifier or date column stored in a numeric format (for example 20190901 for September 1, 2019). If you try to use the SUBSTR function to extract the year "2019" in a WHERE clause, SAS© will return an error.

Why? Because SUBSTR expects a character string, not a number. Furthermore, trying to compare the result (a string) with a number (2019) creates a type mismatch.

Here are three methods to solve this problem, from the most "mathematical" to the "best practice".

Illustration

1. The Mathematical Method (Arithmetic)

If your column is numeric, the most performant solution is often to stay in the numeric domain. Instead of trying to cut the text, use mathematics.

Imagine a variable date_id containing 20190901. To get the first 4 digits (the year), you just need to divide by 10,000 and keep only the integer part.

SAS© Code:

1PROC SQL;
2 select count(receipt_number)
3 from ma_table
4 where floor(date_id / 10000) = 2019;
5QUIT;
Here, 20190901 / 10000 results in 2019.0901. The FLOOR function rounds down to the nearest integer, which is 2019.

If you needed to extract a part from the middle of the number, the logic would become more complex (a mix of subtractions and divisions), making this method less readable for complex extractions.

2. The Conversion Method (Explicit Typing)

If you absolutely want to treat the variable as text, you must explicitly convert it. It's not enough to use PUT; you often have to manage the spaces that can be inserted during the conversion.

However, this method is often more verbose and prone to comparison errors (comparing a string '2019' with the number 2019).

3. The "Best Practice": Convert to a Real SAS Date

If your numeric variable actually represents a date (like 20190901), the most robust and cleanest method is to transform it into a real SAS© date value. Once converted, you can use SAS©'s powerful time functions (YEAR, MONTH, DAY).

To transform a YYYYMMDD number into a SAS© date, we use the combination of INPUT and PUT:

  1. PUT(date_id, 8.): Transforms the number into a character string.

  2. INPUT(..., yymmdd8.): Reads this string to create a valid SAS© date.

SAS© Code:

1PROC SQL;
2 /* Création d'une date temporaire pour le filtre */
3 select count(receipt_number)
4 from ma_table
5 where year(INPUT(put(date_id, 8.), yymmdd8.)) = 2019;
6QUIT;

Advantages:

  • Readability: It's immediately clear that we are filtering on the year.

  • Flexibility: You can easily change to filter by month (MONTH() = 9) or day, without redoing complex mathematical calculations.