How to extract a "substring" from a numeric variable?
Simon 19 views
Difficulty Level
Débutant
Published on :
Expert Advice
Stéphanie
While mathematical hacks like FLOOR(date/10000) are computationally efficient for simple extraction, the most robust approach is Semantic Conversion. If a number represents a date, convert it to a true SAS date value using INPUT(PUT(var,8.),yymmdd8.). This unlocks the full power of SAS time functions (YEAR, INTNX) and makes your code self-documenting.
Why? Because SUBSTR expects a character string, not a number. Furthermore, trying to compare the result (a string) with a number (2019) creates a type mismatch.
Here are three methods to solve this problem, from the most "mathematical" to the "best practice".
If your column is numeric, the most performant solution is often to stay in the numeric domain. Instead of trying to cut the text, use mathematics.
Imagine a variable date_id containing 20190901. To get the first 4 digits (the year), you just need to divide by 10,000 and keep only the integer part.
proc sql;
select count(receipt_number)
from ma_table
where floor(date_id / 10000) = 2019;
quit;
1
PROC SQL;
2
select count(receipt_number)
3
from ma_table
4
where floor(date_id / 10000) = 2019;
5
QUIT;
Here, 20190901 / 10000 results in 2019.0901. The FLOOR function rounds down to the nearest integer, which is 2019.
If you needed to extract a part from the middle of the number, the logic would become more complex (a mix of subtractions and divisions), making this method less readable for complex extractions.
2. The Conversion Method (Explicit Typing)
If you absolutely want to treat the variable as text, you must explicitly convert it. It's not enough to use PUT; you often have to manage the spaces that can be inserted during the conversion.
However, this method is often more verbose and prone to comparison errors (comparing a string '2019' with the number 2019).
3. The "Best Practice": Convert to a Real SAS Date
proc sql;
/* Création d'une date temporaire pour le filtre */
select count(receipt_number)
from ma_table
where year(input(put(date_id, 8.), yymmdd8.)) = 2019;
quit;
1
PROC SQL;
2
/* Création d'une date temporaire pour le filtre */
3
select count(receipt_number)
4
from ma_table
5
where year(INPUT(put(date_id, 8.), yymmdd8.)) = 2019;
6
QUIT;
Advantages:
Readability: It's immediately clear that we are filtering on the year.
Flexibility: You can easily change to filter by month (MONTH() = 9) or day, without redoing complex mathematical calculations.
Important Disclaimer
The codes and examples provided on WeAreCAS.eu are for educational purposes. It is imperative not to blindly copy-paste them into your production environments. The best approach is to understand the logic before applying it. We strongly recommend testing these scripts in a test environment (Sandbox/Dev). WeAreCAS accepts no responsibility for any impact or data loss on your systems.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.
This site uses technical and analytical cookies to improve your experience.
Read more.