optNetwork connectedComponents

High-Volume Social Network Clustering (Performance & Distributed)

Test Scenario & Use Case

Business Context

A social media platform wants to analyze a very large dataset of user interactions and identify isolated communities for targeted advertising. At this scale, the analysis relies on distributed processing and an optimized algorithm to keep run time and memory use under control.

About the Action Set: optNetwork

Network analysis and graph algorithms.

Data Preparation

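Simulate a large graph of up to 50,000 random interactions among 5,000 users (IDs 0 to 4,999) to test performance and algorithm stability. Self-interactions are discarded, so the table typically holds slightly fewer than 50,000 rows.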

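/* Assumes the libref mycas is already assigned to a caslib */
DATA mycas.social_graph;
   call streaminit(12345);                        /* fixed seed for reproducibility */
   DO i = 1 to 50000;
      user_id_1 = int(rand('uniform') * 5000);    /* IDs 0 to 4999 */
      user_id_2 = int(rand('uniform') * 5000);
      IF user_id_1 ne user_id_2 THEN OUTPUT;      /* drop self-interactions */
   END;
RUN;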

Implementation Steps

1. Verify the dataset dimensions.

PROC CAS;
   /* Parameters follow the slash after the action name */
   simple.numRows / table={name='social_graph'};
RUN;
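
Beyond the row count, the range of the generated IDs can be checked as well. The following is a minimal sketch, assuming the simple.summary action is available in the session and that social_graph sits in the active caslib. Given the generator used in Data Preparation, both columns should report a minimum of 0 or more and a maximum of 4,999 or less.

```sas
PROC CAS;
   /* Descriptive statistics (N, minimum, maximum, ...) for the two ID columns */
   simple.summary /
      table={name='social_graph'}
      inputs={'user_id_1', 'user_id_2'};
RUN;
```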
2. Run connectedComponents with the AFFOREST algorithm in distributed mode, using eight threads.

PROC CAS;
   optNetwork.connectedComponents /
      links={name='social_graph'}
      linksVar={from='user_id_1', to='user_id_2'}   /* columns holding the two endpoints of each link */
      distributed=true
      algorithm='AFFOREST'
      nThreads=8
      outNodes={name='user_communities', replace=true};
RUN;

Expected Result


The execution must complete without memory errors. The 'user_communities' table should contain one row, with a component ID, for each of the 5,000 users (that is, for every user that appears in at least one interaction). The log should confirm that the 'AFFOREST' algorithm was used in distributed mode.
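
One way to check this result is to count the rows in the output table and tabulate the component IDs. This is a sketch rather than a prescribed procedure: it assumes the output node table stores the component ID in a column named concomp, which should be confirmed against the actual table before relying on it.

```sas
PROC CAS;
   /* One row per node is expected in the output table */
   simple.numRows / table={name='user_communities'};

   /* Frequency of each component ID, i.e. the size of each community.
      The column name concomp is an assumption. */
   simple.freq /
      table={name='user_communities'}
      inputs={'concomp'};
RUN;
```

Because each simulated user takes part in about 20 interactions on average, most or all users are likely to fall into a single large component, so a very short frequency table is not in itself a sign of failure.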