searchAnalytics buildAutoComplete

Edge Case: Social Media Hashtags and Special Characters

Test Scenario & Use Case

Business Context

A marketing agency tracks trending topics on social media. User-generated content is 'dirty': it contains hashtags, special characters, mixed case, and excessive whitespace. This test checks that buildAutoComplete processes these irregularities without failing.
About the Set: searchAnalytics

Data indexing and search functionality.

Data Preparation

Create a table of posts containing mixed case, special characters (#, @, !), leading whitespace, and an empty line (read as a missing value), then build a term index from it.

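cas mysess;                          /* start a CAS session (the session name is arbitrary) */
libname mycas cas caslib=casuser;    /* assumed: CAS libref named mycas bound to casuser */
DATA mycas.social_posts;
   LENGTH post_content $100;
   INFILE DATALINES truncover;
   INPUT post_content $ &;           /* & allows single embedded blanks in a value */
   DATALINES;
#SummerVibes #2025
!!! BREAKING NEWS !!!
@user_handle check this out
   leading spaces test

NOTE: The line above was empty
;
RUN;

PROC CAS;
   /* Build the term index that buildAutoComplete reads in the next step */
   searchAnalytics.buildTermIndex / TABLE={name='social_posts'} docId='post_content' casOut={name='social_terms', replace=true};
RUN;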
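Before indexing, it can help to confirm that the dirty rows actually landed in CAS as intended, including the missing value produced by the empty line. The sketch below assumes social_posts is in the session's active caslib; it uses the standard table.fetch action to list the rows and simple.distinct, which reports distinct and missing-value counts per column.

```sas
PROC CAS;
   /* List up to 10 rows, enough for this six-row table */
   table.fetch / table={name='social_posts'} to=10;
   /* Distinct and missing counts for post_content; the empty input line should show up as one missing value */
   simple.distinct / table={name='social_posts'};
RUN;
```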

Implementation Steps

1. Build the auto-complete index from the irregular terms.
PROC CAS;
   searchAnalytics.buildAutoComplete / index={name='social_terms'} casOut={name='social_autocomplete', replace=true};
RUN;
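The scenario does not document the layout of social_autocomplete, so a quick way to see what buildAutoComplete produced, and whether the #, @ and ! prefixes survived, is to inspect the output table's columns and first rows. This sketch uses only generic table actions and assumes social_autocomplete is in the active caslib.

```sas
PROC CAS;
   /* Column names and types written by buildAutoComplete */
   table.columnInfo / table={name='social_autocomplete'};
   /* First rows, to see how special characters were handled */
   table.fetch / table={name='social_autocomplete'} to=20;
RUN;
```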
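2. Promote the resulting table to global scope in the casuser caslib, testing whether the output can be placed in a specific caslib. (Promotion keeps the table in memory beyond the current session; it does not save it to disk.)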
PROC CAS;
   table.promote / name='social_autocomplete' targetLib='casuser';
RUN;

Expected Result


The action completes without error. Special characters are either indexed (if supported by the underlying engine) or stripped, but the process does not crash. The output table 'social_autocomplete' is created, demonstrating the action's robustness against dirty input data.
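To turn the expected result into a pass/fail check, the output table can be looked up after step 2. This is a minimal CASL sketch, assuming table.tableExists returns an exists value of 0 (not found), 1 (session scope) or 2 (global scope); because step 2 promotes the table, only a value of 2 is treated as a pass.

```sas
PROC CAS;
   /* Look up the promoted table in the casuser caslib */
   table.tableExists result=r / caslib='casuser' name='social_autocomplete';
   /* Assumed encoding of exists: 0 = not found, 1 = session scope, 2 = global scope */
   if r.exists = 2 then print "PASS: social_autocomplete exists as a global table in casuser.";
   else print "FAIL: social_autocomplete is missing or was not promoted.";
RUN;
```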