optNetwork

connectedComponents

Description

The `connectedComponents` action finds the connected components of a graph. In graph theory, a connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph. For a directed graph, it finds the weakly connected components. This action is useful for understanding the structure of a network, identifying isolated clusters, or as a preliminary step for other analyses.

optNetwork.connectedComponents { algorithm="AFFOREST" | "AUTOMATIC" | "DFS" | "UNIONFIND", deterministic=TRUE | FALSE, direction="DIRECTED" | "UNDIRECTED", display={...}, distributed=TRUE | FALSE, graph=integer, indexOffset=integer, links={...}, linksVar={...}, logFreqTime=integer, logLevel="AGGRESSIVE" | "BASIC" | "MODERATE" | "NONE", multiLinks=TRUE | FALSE, nodes={...}, nodesVar={...}, nThreads=integer, out={...}, outGraphList={...}, outLinks={...}, outNodes={...}, outputTables={...}, selfLinks=TRUE | FALSE, standardizedLabels=TRUE | FALSE, standardizedLabelsOut=TRUE | FALSE };
Settings
ParameterDescription
algorithm Specifies the algorithm to use for calculating connected components. Options are 'AFFOREST', 'AUTOMATIC', 'DFS', 'UNIONFIND'.
deterministic When set to True, ensures that each invocation (with the same machine configuration and parameter settings) produces the same final result.
direction Specifies whether to consider the input graph as directed or undirected.
display Specifies a list of results tables to send to the client for display.
distributed When set to True, uses a distributed graph.
graph Specifies the in-memory graph to use.
indexOffset Specifies the index offset for identifiers in the log and results output data tables.
links Specifies the input data table that contains the graph link information.
linksVar Specifies the data variable names for the links table.
logFreqTime Controls the frequency in seconds for displaying iteration logs.
logLevel Controls the amount of information that is displayed in the SAS log.
multiLinks When set to True, includes multilinks when an input graph is read.
nodes Specifies the input data table that contains the graph node information.
nodesVar Specifies the data variable names for the nodes table.
nThreads Specifies the maximum number of threads to use for multithreaded processing.
out Specifies the output data table to contain the summary information about the connected components.
outGraphList Specifies the output data table to contain summary information about in-memory graphs.
outLinks Specifies the output data table to contain the graph link information along with any results.
outNodes Specifies the output data table to contain the graph node information along with any results.
outputTables Lists the names of results tables to save as CAS tables on the server.
selfLinks When set to True, includes self-links when an input graph is read.
standardizedLabels When set to True, specifies that the input graph data are in a standardized format.
standardizedLabelsOut When set to True, requests that the output graph data include standardized format.
Data Preparation View data prep sheet
Creating the Input Data

This example creates a simple undirected graph with two separate components. The first component includes nodes A, B, C, D, E, and F. The second component includes nodes G, H, and I.

Copied!
1DATA mycas.LinkSetIn;
2 INPUT from $ to $ @@;
3 DATALINES;
4A B A C B C C D D E D F E F G H H I G I
5;
6RUN;

Examples

This basic example finds the connected components in the `LinkSetIn` graph and stores the component ID for each node in the `mycas.NodeSetOut` table.

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 optNetwork.connectedComponents
3 links={name='LinkSetIn'},
4 outNodes={name='mycas.NodeSetOut', replace=true};
5RUN;
6PROC PRINT DATA=mycas.NodeSetOut;
7RUN;
Result :
The `mycas.NodeSetOut` table is created, containing each node and its corresponding component ID. Nodes A, B, C, D, E, F will have one component ID, and nodes G, H, I will have another.

This example demonstrates a more advanced use case. It finds the connected components using the Depth-First Search (DFS) algorithm, which is suitable for both directed and undirected graphs. It generates two output tables: `mycas.NodeSetOut` which maps each node to a component, and `mycas.CompOut`, which provides a summary of each component (e.g., number of nodes and links).

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 optNetwork.connectedComponents
3 links={name='LinkSetIn'},
4 algorithm='DFS',
5 outNodes={name='mycas.NodeSetOut', replace=true},
6 out={name='mycas.CompOut', replace=true};
7RUN;
8PROC PRINT DATA=mycas.NodeSetOut;
9RUN;
10PROC PRINT DATA=mycas.CompOut;
11RUN;
Result :
Two tables are generated. `mycas.NodeSetOut` will show the component ID for each node. `mycas.CompOut` will show summary statistics for each component, for example, one component with 6 nodes and another with 3 nodes.

For very large graphs, processing can be distributed across multiple nodes in the CAS environment. This example shows how to find connected components using the distributed version of the algorithm by setting `distributed=true`. The AFFOREST algorithm is explicitly chosen as it is optimized for distributed, undirected graphs.

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 optNetwork.connectedComponents
3 links={name='LinkSetIn'},
4 distributed=true,
5 algorithm='AFFOREST',
6 outNodes={name='mycas.NodeSetOut_dist', replace=true};
7RUN;
8PROC PRINT DATA=mycas.NodeSetOut_dist;
9RUN;
Result :
The output table `mycas.NodeSetOut_dist` is created, containing the mapping of nodes to their component IDs. The results will be the same as the non-distributed example, but the computation is performed in parallel across the CAS cluster, which is more efficient for large-scale graphs.

FAQ

What is the purpose of the connectedComponents action?
What is a connected component in a graph?
How do I specify the input graph for the connectedComponents action?
What algorithms are available to calculate connected components?
How can I find the connected components for a directed graph?
How do I get the component ID for each node in the output?
What does the 'out' output table contain?
Can this action process graphs in a distributed manner?

Associated Scenarios

Use Case
Detection of Money Laundering Rings (Standard Analysis)

A retail bank needs to identify potential money laundering rings. The goal is to detect groups of accounts that frequently transact with each other but are isolated from the res...

Use Case
High-Volume Social Network Clustering (Performance & Distributed)

A social media platform wants to analyze a massive dataset of user interactions to find isolated communities for targeted advertising. The dataset is large, requiring the use of...

Use Case
Supply Chain Resilience (Directed Graph & Edge Cases)

A logistics company is mapping its supply chain to find broken links or isolated subnetworks. They need to treat the graph as DIRECTED (Factory -> Warehouse) and must handle 'di...