High-Volume Enterprise Knowledge Base Indexing

Business Context

A large engineering firm has a knowledge base with thousands of technical documents. The firm needs to stress-test autocomplete generation to confirm that it handles a large volume of near-identical technical terms (e.g., 'Hydraulic Pump Specification v1', 'Hydraulic Pump Specification v2') without performance degradation.

About the Set: searchAnalytics

Data indexing and search functionalities.

Discover all actions of searchAnalytics

Data Preparation

Generating a synthetic dataset of 10,000 document titles with repetitive technical headers to simulate volume, then building a term index from it.

/* Generate 10,000 near-identical titles (assumes the libref mycas points to a CAS caslib) */
DATA mycas.tech_docs;
    LENGTH doc_title $100;
    DO i = 1 TO 10000;
        doc_title = catx(' ', 'Technical Specification Document', 'Version', put(i, 5.), 'for Component', put(mod(i, 10), 2.));
        OUTPUT;
    END;
RUN;

/* Build the term index that the autocomplete step reads */
PROC CAS;
    search.buildTermIndex / table={name='tech_docs'} docId='doc_title' casOut={name='tech_terms', replace=true};
RUN;
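Before building the autocomplete index, it can help to confirm that the synthetic table has the expected shape. The sketch below assumes that 'tech_docs' lives in the active caslib, as the code above does. Because every title embeds the loop counter, both the row count and the number of distinct titles should be 10,000.

```sas
PROC CAS;
    /* Total number of rows in the synthetic table */
    table.recordCount / table={name='tech_docs'};

    /* Number of distinct values in doc_title */
    simple.distinct / table={name='tech_docs'} inputs={'doc_title'};
RUN;
```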

Implementation Steps

Building the auto-complete index on the larger technical term dataset.

PROC CAS;
    searchAnalytics.buildAutoComplete / index={name='tech_terms'} casOut={name='kb_autocomplete', replace=true};
RUN;
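Since the goal of this scenario is to detect performance degradation, one way to make the result measurable is to record wall-clock time around the build. The sketch below wraps the same action call in a simple macro timer. The macro variable name t0 is an illustrative choice, and the measurement includes client-to-server overhead, so treat it as a rough figure rather than a precise benchmark.

```sas
/* Record the start time on the client (t0 is a hypothetical variable name) */
%let t0 = %sysfunc(datetime());

PROC CAS;
    searchAnalytics.buildAutoComplete / index={name='tech_terms'} casOut={name='kb_autocomplete', replace=true};
RUN;
QUIT;

/* Report elapsed wall-clock seconds in the log */
%put NOTE: buildAutoComplete elapsed %sysevalf(%sysfunc(datetime()) - &t0) seconds;
```

Repeating the run with different upper bounds in the DO loop (for example 1,000 and 10,000) gives a rough picture of how build time scales with volume.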

Validating table info to check row count and size.

/* tableInfo takes the table name as a string in its name= parameter */
PROC CAS;
    table.tableInfo / name='kb_autocomplete';
RUN;
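Beyond the row count, a quick look at a few rows shows whether the output looks plausible. This is a minimal sketch that previews the first ten rows with table.fetch. It makes no assumption about the column names that buildAutoComplete produces.

```sas
PROC CAS;
    /* Preview the first 10 rows of the autocomplete table */
    table.fetch / table={name='kb_autocomplete'} to=10;
RUN;
```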

Expected Result

The system creates the 'kb_autocomplete' table without noticeable slowdown despite the higher cardinality of terms. The tableInfo action confirms that the table exists and that its row count is consistent with the unique terms generated from the 10,000 documents.

See the technical documentation for buildAutoComplete