What is the 'tokenizer' parameter and when should I use the 'BASIC' option?

Question

WeAreCAS Team · Accepted Answer

The 'tokenizer' parameter specifies which tokenizer to use. The default, 'STANDARD', applies a language-specific tokenizer. The 'BASIC' option instead splits text on white space, punctuation, and CJKT characters. 'BASIC' is available only for Chinese, Japanese, and Korean, so use it only with those languages, where it can improve rule matching for some texts.
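
As a rough illustration of the constraint described above, here is a minimal Python sketch that checks a tokenizer choice before it is sent to the action. The helper name build_compile_params and the upper-case language names ("CHINESE", "JAPANESE", "KOREAN") are assumptions made for this example, not part of the documented API; only the 'tokenizer' values 'STANDARD' and 'BASIC' come from the answer.

```python
# Languages for which the answer says the 'BASIC' tokenizer is available.
# The exact spelling of these language values is an assumption.
BASIC_LANGUAGES = {"CHINESE", "JAPANESE", "KOREAN"}
TOKENIZERS = {"STANDARD", "BASIC"}


def build_compile_params(language, tokenizer="STANDARD"):
    """Hypothetical helper: validate and return tokenizer-related parameters."""
    language = language.upper()
    tokenizer = tokenizer.upper()
    if tokenizer not in TOKENIZERS:
        raise ValueError(f"unknown tokenizer: {tokenizer}")
    if tokenizer == "BASIC" and language not in BASIC_LANGUAGES:
        # 'BASIC' is only available for Chinese, Japanese, and Korean.
        raise ValueError(f"'BASIC' is not available for {language}")
    return {"language": language, "tokenizer": tokenizer}


if __name__ == "__main__":
    print(build_compile_params("japanese", tokenizer="basic"))
    print(build_compile_params("english"))  # falls back to 'STANDARD'
```

The returned dictionary could then be passed as keyword arguments to the compileConcept action alongside its other parameters. Checking the combination on the client side is a design choice for this sketch, so that an unsupported 'BASIC' request fails before it reaches the server.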

compileConcept