optNetwork

connectedComponents

Description

The `connectedComponents` action finds the connected components of a graph. In graph theory, a connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph. For a directed graph, it finds the weakly connected components. This action is useful for understanding the structure of a network, identifying isolated clusters, or as a preliminary step for other analyses.

optNetwork.connectedComponents { algorithm="AFFOREST" | "AUTOMATIC" | "DFS" | "UNIONFIND", deterministic=TRUE | FALSE, direction="DIRECTED" | "UNDIRECTED", display={...}, distributed=TRUE | FALSE, graph=integer, indexOffset=integer, links={...}, linksVar={...}, logFreqTime=integer, logLevel="AGGRESSIVE" | "BASIC" | "MODERATE" | "NONE", multiLinks=TRUE | FALSE, nodes={...}, nodesVar={...}, nThreads=integer, out={...}, outGraphList={...}, outLinks={...}, outNodes={...}, outputTables={...}, selfLinks=TRUE | FALSE, standardizedLabels=TRUE | FALSE, standardizedLabelsOut=TRUE | FALSE };
Settings
ParameterDescription
algorithmSpecifies the algorithm to use for calculating connected components. Options are 'AFFOREST', 'AUTOMATIC', 'DFS', 'UNIONFIND'.
deterministicWhen set to True, ensures that each invocation (with the same machine configuration and parameter settings) produces the same final result.
directionSpecifies whether to consider the input graph as directed or undirected.
displaySpecifies a list of results tables to send to the client for display.
distributedWhen set to True, uses a distributed graph.
graphSpecifies the in-memory graph to use.
indexOffsetSpecifies the index offset for identifiers in the log and results output data tables.
linksSpecifies the input data table that contains the graph link information.
linksVarSpecifies the data variable names for the links table.
logFreqTimeControls the frequency in seconds for displaying iteration logs.
logLevelControls the amount of information that is displayed in the SAS log.
multiLinksWhen set to True, includes multilinks when an input graph is read.
nodesSpecifies the input data table that contains the graph node information.
nodesVarSpecifies the data variable names for the nodes table.
nThreadsSpecifies the maximum number of threads to use for multithreaded processing.
outSpecifies the output data table to contain the summary information about the connected components.
outGraphListSpecifies the output data table to contain summary information about in-memory graphs.
outLinksSpecifies the output data table to contain the graph link information along with any results.
outNodesSpecifies the output data table to contain the graph node information along with any results.
outputTablesLists the names of results tables to save as CAS tables on the server.
selfLinksWhen set to True, includes self-links when an input graph is read.
standardizedLabelsWhen set to True, specifies that the input graph data are in a standardized format.
standardizedLabelsOutWhen set to True, requests that the output graph data include standardized format.
Data Preparation View data prep sheet
Creating the Input Data

This example creates a simple undirected graph with two separate components. The first component includes nodes A, B, C, D, E, and F. The second component includes nodes G, H, and I.

Copied!
1DATA mycas.LinkSetIn;
2 INPUT from $ to $ @@;
3 DATALINES;
4A B A C B C C D D E D F E F G H H I G I
5;
6RUN;

Examples

This basic example finds the connected components in the `LinkSetIn` graph and stores the component ID for each node in the `mycas.NodeSetOut` table.

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 optNetwork.connectedComponents
3 links={name='LinkSetIn'},
4 outNodes={name='mycas.NodeSetOut', replace=true};
5RUN;
6PROC PRINT DATA=mycas.NodeSetOut;
7RUN;
Result :
The `mycas.NodeSetOut` table is created, containing each node and its corresponding component ID. Nodes A, B, C, D, E, F will have one component ID, and nodes G, H, I will have another.

This example demonstrates a more advanced use case. It finds the connected components using the Depth-First Search (DFS) algorithm, which is suitable for both directed and undirected graphs. It generates two output tables: `mycas.NodeSetOut` which maps each node to a component, and `mycas.CompOut`, which provides a summary of each component (e.g., number of nodes and links).

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 optNetwork.connectedComponents
3 links={name='LinkSetIn'},
4 algorithm='DFS',
5 outNodes={name='mycas.NodeSetOut', replace=true},
6 out={name='mycas.CompOut', replace=true};
7RUN;
8PROC PRINT DATA=mycas.NodeSetOut;
9RUN;
10PROC PRINT DATA=mycas.CompOut;
11RUN;
Result :
Two tables are generated. `mycas.NodeSetOut` will show the component ID for each node. `mycas.CompOut` will show summary statistics for each component, for example, one component with 6 nodes and another with 3 nodes.

For very large graphs, processing can be distributed across multiple nodes in the CAS environment. This example shows how to find connected components using the distributed version of the algorithm by setting `distributed=true`. The AFFOREST algorithm is explicitly chosen as it is optimized for distributed, undirected graphs.

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 optNetwork.connectedComponents
3 links={name='LinkSetIn'},
4 distributed=true,
5 algorithm='AFFOREST',
6 outNodes={name='mycas.NodeSetOut_dist', replace=true};
7RUN;
8PROC PRINT DATA=mycas.NodeSetOut_dist;
9RUN;
Result :
The output table `mycas.NodeSetOut_dist` is created, containing the mapping of nodes to their component IDs. The results will be the same as the non-distributed example, but the computation is performed in parallel across the CAS cluster, which is more efficient for large-scale graphs.

FAQ

What is the purpose of the connectedComponents action?
What is a connected component in a graph?
How do I specify the input graph for the connectedComponents action?
What algorithms are available to calculate connected components?
How can I find the connected components for a directed graph?
How do I get the component ID for each node in the output?
What does the 'out' output table contain?
Can this action process graphs in a distributed manner?