Examples use generated data (datalines) or direct creation of CAS tables via DATA steps.
1 Code Block
DATA STEP Data
Explanation : This example initializes two variables, one VARCHAR and one CHAR, with single-byte character strings. The INDEX function searches for the position of the character 'c'. Since the characters are single-byte, the returned positions are identical (3), regardless of the variable type, illustrating standard behavior.
/* Assumes an active CAS session; the mycas libref points to the active caslib. */
libname mycas cas;

data mycas.chaine_basique;
    length x varchar(10);
    length y $10;
    x = 'abcde';
    y = 'abcde';
    xi = index(x, 'c');
    yi = index(y, 'c');
    /* Write both positions to the log */
    put 'VARCHAR_pos_c = ' xi;
    put 'CHAR_pos_c = ' yi;
run;

proc print data=mycas.chaine_basique;
    title 'Basic indexing results';
run;

/* DROPTABLE removes the in-memory table from the active caslib */
proc casutil;
    droptable casdata='chaine_basique' quiet;
quit;
2 Code Block
DATA STEP Data
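Explanation : This example uses multi-byte Chinese characters and assumes a UTF-8 session, where each of these characters occupies 3 bytes. The VARCHAR(10) variable stores '你好世界' as 4 characters, so searching for '世' (the third character) returns 3. The CHAR(10) variable is sized in bytes: the two characters before '世' occupy bytes 1 to 6, so '世' starts at byte 7 and INDEX returns 7. Note that the full string needs 12 bytes, so the CHAR(10) variable cannot hold the fourth character completely; this does not affect the search for '世', which fits within the first 9 bytes. The two results show the difference between character-length (VARCHAR) and byte-length (CHAR) semantics.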
/* Assumes an active CAS session and a UTF-8 session encoding. */
libname mycas cas;

data mycas.chaine_multioctet;
    length x varchar(10);   /* 10 characters */
    length y $10;           /* 10 bytes: the 4th character does not fit completely */
    x = '你好世界';          /* "Hello world" in Chinese: 4 characters, 12 bytes */
    y = '你好世界';
    xi = index(x, '世');     /* Search for the 3rd character */
    yi = index(y, '世');
    put 'VARCHAR_pos_shi = ' xi;
    put 'CHAR_pos_shi = ' yi;
run;

proc print data=mycas.chaine_multioctet;
    title 'Multi-byte indexing results';
run;

/* DROPTABLE removes the in-memory table from the active caslib */
proc casutil;
    droptable casdata='chaine_multioctet' quiet;
quit;
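The relationship between the two positions can be checked with a small sketch. It assumes a UTF-8 session in which each of these characters occupies 3 bytes, and it uses KINDEX, the character-based counterpart of INDEX from SAS National Language Support. The variable names are illustrative, and the step writes only to the log (DATA _NULL_), so no CAS table is created.

```sas
/* Sketch: byte position vs. character position on a CHAR variable. */
/* Assumes a UTF-8 session encoding (3 bytes per character here).   */
data _null_;
    length y $12;                        /* 12 bytes: room for all 4 characters */
    y = '你好世界';
    pos_bytes = index(y, '世');          /* byte-based search on CHAR */
    pos_chars = kindex(y, '世');         /* character-based search */
    /* Characters before the match, times 3 bytes each, plus 1 */
    pos_expected = (pos_chars - 1) * 3 + 1;
    put pos_bytes= pos_chars= pos_expected=;
run;
```

Under those assumptions, pos_bytes and pos_expected should agree (7), with pos_chars equal to 3. The fixed factor of 3 holds only for this string; text that mixes 1-byte and multi-byte characters would need a per-character count.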
3 Code Block
DATA STEP Data
Explanation : This example searches for a longer substring ('monde') in CHAR and VARCHAR variables. Because the text contains only single-byte characters, both calls return the same position (4). It also shows what happens when the substring is not found: INDEX returns 0, a common case that any code using the result needs to handle.
/* Assumes an active CAS session; the mycas libref points to the active caslib. */
libname mycas cas;

data mycas.chaine_avancee;
    length phrase_varchar varchar(50);
    length phrase_char $50;
    phrase_varchar = 'Le monde est beau, la vie est courte.';
    phrase_char = 'Le monde est beau, la vie est courte.';
    /* Substring present in the text */
    pos_monde_varchar = index(phrase_varchar, 'monde');
    pos_monde_char = index(phrase_char, 'monde');
    /* Substring absent from the text: INDEX returns 0 */
    pos_non_trouve_varchar = index(phrase_varchar, 'inexistant');
    pos_non_trouve_char = index(phrase_char, 'inexistant');
    put 'VARCHAR "monde" found at position: ' pos_monde_varchar;
    put 'CHAR "monde" found at position: ' pos_monde_char;
    put 'VARCHAR "inexistant" found at position: ' pos_non_trouve_varchar;
    put 'CHAR "inexistant" found at position: ' pos_non_trouve_char;
run;

proc print data=mycas.chaine_avancee;
    title 'Advanced substring search results';
run;

/* DROPTABLE removes the in-memory table from the active caslib */
proc casutil;
    droptable casdata='chaine_avancee' quiet;
quit;
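Since a result of 0 means "not found", code that passes the position on to another function usually checks it first. The following is a minimal sketch of that guard; the variable names pos and rest are illustrative, and the step writes only to the log.

```sas
/* Sketch: test the INDEX result before using it as a position. */
data _null_;
    length phrase $50 rest $50;
    phrase = 'Le monde est beau, la vie est courte.';
    pos = index(phrase, 'inexistant');
    /* SUBSTR requires a start position of at least 1 */
    if pos > 0 then rest = substr(phrase, pos);
    else rest = '(not found)';
    put pos= rest=;
run;
```

Without the guard, a position of 0 passed to SUBSTR would trigger an invalid-argument note in the log instead of a usable value.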
4 Code Block
DATA STEP Data
Explanation : This example highlights length semantics with the SUBSTR function in CAS, a key point in Viya. With multi-byte characters, SUBSTR on a VARCHAR variable counts positions and lengths in characters, so substr(var_varchar, 2, 1) returns the whole second character '好'. On a CHAR variable it counts bytes, so substr(var_char, 2, 1) returns only the second byte of '你' and substr(var_char, 7, 1) returns only the first byte of '世'. Neither result is a complete character, and both may display as unreadable output. Assuming UTF-8 with 3 bytes per character, extracting the whole third character from the CHAR variable takes substr(var_char, 7, 3). A start position that falls in the middle of a multi-byte character, or beyond the declared byte length, is a further source of errors when byte semantics are overlooked.
/* Assumes an active CAS session and a UTF-8 session encoding. */
libname mycas cas;

data mycas.chaine_substr_cas;
    length var_char $10;
    length var_varchar varchar(10);

    /* String of 3 multi-byte characters (e.g. Chinese) */
    var_char = '你好世';      /* 3 characters, 9 bytes */
    var_varchar = '你好世';

    /* Character 2 (VARCHAR) vs. byte 2 (CHAR) */
    sub_varchar_char = substr(var_varchar, 2, 1);
    sub_char_byte = substr(var_char, 2, 1);

    /* Byte 7 is only the first byte of the 3rd character: the result is an incomplete fragment */
    sub_char_byte_erreur = substr(var_char, 7, 1);

    put 'VARCHAR (character 2): ' sub_varchar_char;
    put 'CHAR (byte 2): ' sub_char_byte;
    put 'CHAR (byte 7, start of 3rd character): ' sub_char_byte_erreur;
run;

proc print data=mycas.chaine_substr_cas;
    title 'Comparing SUBSTR with CHAR and VARCHAR in CAS';
run;

/* DROPTABLE removes the in-memory table from the active caslib */
proc casutil;
    droptable casdata='chaine_substr_cas' quiet;
quit;
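For comparison, here is a hedged sketch of two ways to get a complete character out of a CHAR variable. It assumes a UTF-8 session with 3 bytes per character, and it uses KSUBSTR, the character-based counterpart of SUBSTR from SAS National Language Support. The result variable names are illustrative, and the step writes only to the log.

```sas
/* Sketch: extracting a whole multi-byte character from a CHAR variable. */
/* Assumes a UTF-8 session encoding (3 bytes per character here).        */
data _null_;
    length var_char $10;
    var_char = '你好世';                        /* 3 characters, 9 bytes */
    /* Byte arithmetic: the 3rd character occupies bytes 7 to 9 */
    whole_by_bytes = substr(var_char, 7, 3);
    /* Character arithmetic: position 3, length 1, both in characters */
    whole_by_char = ksubstr(var_char, 3, 1);
    put whole_by_bytes= whole_by_char=;
run;
```

Both variables should hold '世' under those assumptions. The byte-arithmetic version depends on every preceding character being 3 bytes wide, which is why the character-based call, or a VARCHAR variable, tends to be the more robust choice for mixed text.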
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
« When migrating legacy SAS code to Viya, review all INDEX, SCAN, and SUBSTR calls. If you convert your table columns to VARCHAR during the load to CAS, your standard string functions will become "encoding-aware" automatically, often solving truncation and indexing bugs without changing a single line of logic. »
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.