The 'tokenizer' parameter specifies which tokenizer to use. The default, 'STANDARD', applies a language-specific tokenizer. The 'BASIC' option separates words at whitespace, punctuation, and CJKT characters. The 'BASIC' tokenizer is only available for Chinese, Japanese,...