connectedComponents

Q: What is the purpose of the connectedComponents action?

The connectedComponents action calculates the connected components of a graph. It is used to determine groups of nodes where every node is reachable from every other node within the same group.

Q: What is a connected component in a graph?

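A connected component of a graph is a maximal subgraph in which any two vertices are connected by a path; it has no connection to any other vertex of the larger graph. For a directed graph there are two variants: in a weakly connected component the paths may ignore link direction, whereas in a strongly connected component every vertex must be reachable from every other vertex along directed paths.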
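To make the definition concrete, here is a minimal sketch in plain Python (not CASL, so it runs without a CAS server) that groups the nodes of a small undirected graph by depth-first search. The function name `connected_components` and the sample links are assumptions made for illustration; this is not how the action itself is implemented.

```python
from collections import defaultdict

def connected_components(links):
    """Return the components of an undirected graph as sorted lists of nodes."""
    # Build an adjacency map; each link is usable in both directions.
    adjacency = defaultdict(set)
    for u, v in links:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Iterative depth-first search from a node that has not been visited yet.
        stack = [start]
        seen.add(start)
        members = []
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(members))
    return components

# Hypothetical sample graph: A-B-C form one group, D-E another.
example_links = [("A", "B"), ("B", "C"), ("D", "E")]
print(connected_components(example_links))  # [['A', 'B', 'C'], ['D', 'E']]
```

Every node reached from the same starting node ends up in the same list, which matches the "reachable from every other node" wording above.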

Q: How do I specify the input graph for the connectedComponents action?

You can specify the input graph using the 'links' parameter for the link data table and optionally the 'nodes' parameter for the node data table. Alternatively, if the graph is already in memory, you can use the 'graph' parameter.

Q: What algorithms are available to calculate connected components?

The action supports several algorithms specified by the 'algorithm' parameter: 'DFS' (depth-first search), 'UNIONFIND', and 'AFFOREST'. You can also use 'AUTOMATIC', which lets SAS decide the best algorithm based on the graph characteristics.
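For readers unfamiliar with the union-find approach, the following is a rough sketch of the general technique in plain Python. It illustrates the idea only; the SAS implementation behind `algorithm='UNIONFIND'` is not shown in this document, and the function name and sample data are assumptions.

```python
def union_find_components(nodes, links):
    """Label each node with a representative node of its component."""
    # Every node starts as the root of its own single-node tree.
    parent = {node: node for node in nodes}

    def find(node):
        # Walk up to the root, shortening the path along the way (path halving).
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for u, v in links:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # Merge the two trees: both endpoints now share one root.
            parent[root_v] = root_u

    return {node: find(node) for node in nodes}

# Hypothetical sample graph with two groups.
labels = union_find_components(
    ["A", "B", "C", "D", "E"],
    [("A", "B"), ("B", "C"), ("D", "E")],
)
print(labels["A"] == labels["C"])  # True: A and C are in the same component
print(labels["A"] == labels["D"])  # False: D belongs to a different component
```

Union-find processes links one at a time and never needs an adjacency structure, which is one reason it is a common choice for undirected graphs.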

Q: How can I find the connected components for a directed graph?

To find connected components in a directed graph, set the 'direction' parameter to 'DIRECTED'. The action will then find the strongly connected components.
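The difference between strong and weak connectivity is easiest to see on a small example. The sketch below uses Kosaraju's two-pass algorithm in plain Python; this is a textbook technique chosen for illustration, not necessarily the one the action uses, and the function name and sample links are assumptions.

```python
from collections import defaultdict

def strongly_connected_components(nodes, links):
    """Return strongly connected components as a sorted list of sorted node lists."""
    forward = defaultdict(list)
    backward = defaultdict(list)
    for u, v in links:
        forward[u].append(v)
        backward[v].append(u)

    # Pass 1: record nodes in the order their depth-first search finishes.
    visited = set()
    finish_order = []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, neighbors = stack[-1]
            advanced = False
            for nxt in neighbors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    advanced = True
                    break
            if not advanced:
                finish_order.append(node)
                stack.pop()

    # Pass 2: search the reversed graph, starting from the latest finisher.
    assigned = set()
    components = []
    for start in reversed(finish_order):
        if start in assigned:
            continue
        assigned.add(start)
        stack = [start]
        members = []
        while stack:
            node = stack.pop()
            members.append(node)
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(sorted(members))
    return sorted(components)

# Hypothetical directed graph: A -> B -> C -> A is a cycle, and C -> D leads out of it.
nodes = ["A", "B", "C", "D"]
links = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
print(strongly_connected_components(nodes, links))  # [['A', 'B', 'C'], ['D']]
```

Node D can be reached from the cycle but cannot reach it back, so it forms its own strongly connected component. If direction were ignored, all four nodes would fall into a single (weakly connected) component.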

Q: How do I get the component ID for each node in the output?

Use the 'outNodes' parameter to specify an output table. This table will include the original node information along with a new column, typically named 'concomp', which contains the identifier for the connected component that each node belongs to.

Q: What does the 'out' output table contain?

The 'out' parameter specifies an output data table that contains summary information for each connected component, such as the component ID, the number of nodes, and the number of links in that component.

Q: Can this action process graphs in a distributed manner?

Yes, by setting the 'distributed' parameter to TRUE, the action can process the graph in a distributed fashion across multiple machines, which is useful for very large graphs.

Description

The `connectedComponents` action finds the connected components of a graph. In an undirected graph, a connected component is a maximal subgraph in which any two vertices are connected by a path and which is connected to no additional vertices in the larger graph. When the graph is treated as directed (`direction="DIRECTED"`), the action finds the strongly connected components, in which every node can reach every other node along directed paths. This action is useful for understanding the structure of a network, identifying isolated clusters, or as a preliminary step for other analyses.

optNetwork.connectedComponents { algorithm="AFFOREST" | "AUTOMATIC" | "DFS" | "UNIONFIND", deterministic=TRUE | FALSE, direction="DIRECTED" | "UNDIRECTED", display={...}, distributed=TRUE | FALSE, graph=integer, indexOffset=integer, links={...}, linksVar={...}, logFreqTime=integer, logLevel="AGGRESSIVE" | "BASIC" | "MODERATE" | "NONE", multiLinks=TRUE | FALSE, nodes={...}, nodesVar={...}, nThreads=integer, out={...}, outGraphList={...}, outLinks={...}, outNodes={...}, outputTables={...}, selfLinks=TRUE | FALSE, standardizedLabels=TRUE | FALSE, standardizedLabelsOut=TRUE | FALSE };

Settings

Parameter	Description
algorithm	Specifies the algorithm to use for calculating connected components. Options are 'AFFOREST', 'AUTOMATIC', 'DFS', 'UNIONFIND'.
deterministic	When set to True, ensures that each invocation (with the same machine configuration and parameter settings) produces the same final result.
direction	Specifies whether to consider the input graph as directed or undirected.
display	Specifies a list of results tables to send to the client for display.
distributed	When set to True, uses a distributed graph.
graph	Specifies the in-memory graph to use.
indexOffset	Specifies the index offset for identifiers in the log and results output data tables.
links	Specifies the input data table that contains the graph link information.
linksVar	Specifies the data variable names for the links table.
logFreqTime	Controls the frequency in seconds for displaying iteration logs.
logLevel	Controls the amount of information that is displayed in the SAS log.
multiLinks	When set to True, includes multilinks when an input graph is read.
nodes	Specifies the input data table that contains the graph node information.
nodesVar	Specifies the data variable names for the nodes table.
nThreads	Specifies the maximum number of threads to use for multithreaded processing.
out	Specifies the output data table to contain the summary information about the connected components.
outGraphList	Specifies the output data table to contain summary information about in-memory graphs.
outLinks	Specifies the output data table to contain the graph link information along with any results.
outNodes	Specifies the output data table to contain the graph node information along with any results.
outputTables	Lists the names of results tables to save as CAS tables on the server.
selfLinks	When set to True, includes self-links when an input graph is read.
standardizedLabels	When set to True, specifies that the input graph data are in a standardized format.
standardizedLabelsOut	When set to True, requests that the output graph data include standardized format.


Creating the Input Data

This example creates a simple undirected graph with two separate components. The first component includes nodes A, B, C, D, E, and F. The second component includes nodes G, H, and I.


/* Each pair of values is one link; the trailing @@ reads several pairs from one data line */
DATA mycas.LinkSetIn;
   INPUT from $ to $ @@;
   DATALINES;
A B A C B C C D D E D F E F G H H I G I
;
RUN;
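As a cross-check on how the single data line turns into rows, the short Python sketch below pairs up the same tokens the way the `from` and `to` variables are filled. It is only an illustration of the pairing, and the variable names are assumptions.

```python
# The same values that appear on the DATALINES row above.
datalines = "A B A C B C C D D E D F E F G H H I G I"

tokens = datalines.split()
# Even positions are "from" values, odd positions are "to" values.
links = list(zip(tokens[0::2], tokens[1::2]))

print(len(links))  # 10
print(links[:3])   # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

The ten links split into seven among nodes A to F and three among nodes G to I, with no link joining the two groups.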

Examples

This basic example finds the connected components in the `LinkSetIn` graph and stores the component ID for each node in the `mycas.NodeSetOut` table.


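PROC CAS;
   /* Find the connected components of the LinkSetIn graph */
   optNetwork.connectedComponents /
      links={name='LinkSetIn'},
      outNodes={name='NodeSetOut', replace=true};
RUN;
QUIT;
/* Assumes the mycas libref points to the active caslib */
PROC PRINT DATA=mycas.NodeSetOut;
RUN;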

Result:
The `mycas.NodeSetOut` table is created, containing each node and its corresponding component ID. Nodes A, B, C, D, E, F will have one component ID, and nodes G, H, I will have another.

This example demonstrates a more advanced use case. It finds the connected components using the Depth-First Search (DFS) algorithm, which is suitable for both directed and undirected graphs. It generates two output tables: `mycas.NodeSetOut` which maps each node to a component, and `mycas.CompOut`, which provides a summary of each component (e.g., number of nodes and links).


PROC CAS;
   /* Use depth-first search and write node-level and component-level output */
   optNetwork.connectedComponents /
      links={name='LinkSetIn'},
      algorithm='DFS',
      outNodes={name='NodeSetOut', replace=true},
      out={name='CompOut', replace=true};
RUN;
QUIT;
/* Assumes the mycas libref points to the active caslib */
PROC PRINT DATA=mycas.NodeSetOut;
RUN;
PROC PRINT DATA=mycas.CompOut;
RUN;

Result:
Two tables are generated. `mycas.NodeSetOut` will show the component ID for each node. `mycas.CompOut` will show summary statistics for each component, for example, one component with 6 nodes and another with 3 nodes.
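The kind of per-component summary described here can be reproduced by hand for the example graph. The plain-Python sketch below counts nodes and links per component using breadth-first search. The component numbering (1, 2) and the dictionary layout are assumptions for illustration and do not reflect the actual column names or IDs in the output table.

```python
from collections import defaultdict, deque

def component_summary(links):
    """Return {component_id: {'nodes': n, 'links': m}} for an undirected graph."""
    adjacency = defaultdict(set)
    for u, v in links:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Assign a component number to every node with breadth-first search.
    component_of = {}
    next_id = 1  # hypothetical numbering, chosen for this sketch only
    for start in sorted(adjacency):
        if start in component_of:
            continue
        component_of[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component_of:
                    component_of[neighbor] = next_id
                    queue.append(neighbor)
        next_id += 1

    # Count nodes and links per component; both ends of a link share a component.
    summary = defaultdict(lambda: {"nodes": 0, "links": 0})
    for comp in component_of.values():
        summary[comp]["nodes"] += 1
    for u, _ in links:
        summary[component_of[u]]["links"] += 1
    return dict(summary)

# The links from the LinkSetIn example.
tokens = "A B A C B C C D D E D F E F G H H I G I".split()
links = list(zip(tokens[0::2], tokens[1::2]))
print(component_summary(links))
# {1: {'nodes': 6, 'links': 7}, 2: {'nodes': 3, 'links': 3}}
```

The node counts of 6 and 3 agree with the result described above; the link counts of 7 and 3 follow directly from the input data.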

For very large graphs, processing can be distributed across multiple nodes in the CAS environment. This example shows how to find connected components using the distributed version of the algorithm by setting `distributed=true`. The AFFOREST algorithm is explicitly chosen as it is optimized for distributed, undirected graphs.


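PROC CAS;
   /* Process the graph in distributed mode with the AFFOREST algorithm */
   optNetwork.connectedComponents /
      links={name='LinkSetIn'},
      distributed=true,
      algorithm='AFFOREST',
      outNodes={name='NodeSetOut_dist', replace=true};
RUN;
QUIT;
/* Assumes the mycas libref points to the active caslib */
PROC PRINT DATA=mycas.NodeSetOut_dist;
RUN;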

Result:
The output table `mycas.NodeSetOut_dist` is created, containing the mapping of nodes to their component IDs. The grouping of nodes into components is the same as in the non-distributed examples, but the computation is performed in parallel across the CAS cluster, which is intended for large-scale graphs rather than a nine-node example like this one.


Associated Scenarios

Use Case

Detection of Money Laundering Rings (Standard Analysis)

A retail bank needs to identify potential money laundering rings. The goal is to detect groups of accounts that frequently transact with each other but are isolated from the res...


Use Case

High-Volume Social Network Clustering (Performance & Distributed)

A social media platform wants to analyze a massive dataset of user interactions to find isolated communities for targeted advertising. The dataset is large, requiring the use of...


Use Case

Supply Chain Resilience (Directed Graph & Edge Cases)

A logistics company is mapping its supply chain to find broken links or isolated subnetworks. They need to treat the graph as DIRECTED (Factory -> Warehouse) and must handle 'di...


Related Actions

biconnectedComponents (optNetwork): The biconnectedComponents action calculates the biconnected components and ar...

clique (optNetwork): The 'clique' action finds maximal cliques in a graph. A clique is a subgraph ...

cycle (optNetwork): The cycle action calculates the elementary cycles (simple circuits) of a grap...

maxFlow (optNetwork): The maxFlow action calculates the maximum flow of a graph. The maximum flow p...

minCostFlow (optNetwork): The minCostFlow action calculates the minimum-cost network flow of a graph. T...

minCut (optNetwork): The minCut action calculates the minimum cut of a graph. A cut is a partition...
