The `connectedComponents` action finds the connected components of a graph. In graph theory, a connected component of an undirected graph is a subgraph in which every two vertices are joined by a path and which is connected to no other vertices of the supergraph. For a directed graph, the action finds the weakly connected components. This action is useful for understanding the structure of a network, for identifying isolated clusters, and as a preliminary step for other analyses.
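Stated formally, for an undirected graph $G = (V, E)$ the connected components are the equivalence classes of the reachability relation:

$$
u \sim v \iff u = v \ \text{or}\ G \ \text{contains a path between}\ u \ \text{and}\ v
$$

Each equivalence class $C \subseteq V$, together with the links among its nodes, is one connected component. For a directed graph, the weakly connected components follow from the same definition after link directions are ignored.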
| Parameter | Description |
|---|---|
| algorithm | Specifies the algorithm to use for calculating connected components. Options are 'AFFOREST', 'AUTOMATIC', 'DFS', 'UNIONFIND'. |
| deterministic | When set to True, ensures that each invocation (with the same machine configuration and parameter settings) produces the same final result. |
| direction | Specifies whether to consider the input graph as directed or undirected. |
| display | Specifies a list of results tables to send to the client for display. |
| distributed | When set to True, uses a distributed graph. |
| graph | Specifies the in-memory graph to use. |
| indexOffset | Specifies the index offset for identifiers in the log and results output data tables. |
| links | Specifies the input data table that contains the graph link information. |
| linksVar | Specifies the data variable names for the links table. |
| logFreqTime | Controls the frequency in seconds for displaying iteration logs. |
| logLevel | Controls the amount of information that is displayed in the SAS log. |
| multiLinks | When set to True, includes multilinks when an input graph is read. |
| nodes | Specifies the input data table that contains the graph node information. |
| nodesVar | Specifies the data variable names for the nodes table. |
| nThreads | Specifies the maximum number of threads to use for multithreaded processing. |
| out | Specifies the output data table to contain the summary information about the connected components. |
| outGraphList | Specifies the output data table to contain summary information about in-memory graphs. |
| outLinks | Specifies the output data table to contain the graph link information along with any results. |
| outNodes | Specifies the output data table to contain the graph node information along with any results. |
| outputTables | Lists the names of results tables to save as CAS tables on the server. |
| selfLinks | When set to True, includes self-links when an input graph is read. |
| standardizedLabels | When set to True, specifies that the input graph data are in a standardized format. |
| standardizedLabelsOut | When set to True, requests that the output graph data use the standardized format. |
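The table parameters can be combined when the input columns do not use the default names. The following is a minimal sketch, not taken from the product documentation. `MyNodes`, `MyLinks`, `id`, `source`, `target`, `MyNodesOut`, and `MyCompOut` are hypothetical names. The `node`, `from`, and `to` subparameters of `nodesVar` and `linksVar` are assumptions to verify against the parameter reference. Supplying a node table in this way allows nodes that have no links to be included in the graph.

```sas
/* Hypothetical inputs: MyNodes has a node column named id;          */
/* MyLinks has link endpoint columns named source and target.        */
/* The node, from, and to subparameters are assumed, not confirmed.  */
proc cas;
   optNetwork.connectedComponents
      nodes={name='MyNodes'},
      nodesVar={node='id'},
      links={name='MyLinks'},
      linksVar={from='source', to='target'},
      direction='UNDIRECTED',
      outNodes={name='MyNodesOut', replace=true},
      out={name='MyCompOut', replace=true};
run;
```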
This example creates a simple undirected graph with two separate components. The first component includes nodes A, B, C, D, E, and F. The second component includes nodes G, H, and I.
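```sas
/* Assumes a CAS libref named mycas, for example from LIBNAME mycas CAS; */
DATA mycas.LinkSetIn;
   /* Each pair of values on the data line is one link: from, to */
   INPUT from $ to $ @@;
   DATALINES;
A B A C B C C D D E D F E F G H H I G I
;
RUN;
```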
This basic example finds the connected components in the `LinkSetIn` graph and stores the component ID for each node in the `mycas.NodeSetOut` table.
```sas
PROC CAS;
   /* name= takes the CAS table name only; mycas is the SAS libref used by PROC PRINT */
   optNetwork.connectedComponents
      links={name='LinkSetIn'},
      outNodes={name='NodeSetOut', replace=true};
RUN;

PROC PRINT DATA=mycas.NodeSetOut;
RUN;
```
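To list the nodes grouped by component, the output table can be sorted and printed with BY-group processing. This is a minimal sketch that assumes `NodeSetOut` stores the node label in a column named `node` and the component ID in a column named `concomp`; check the column names in the PROC PRINT output before relying on them. `work.NodeSetSorted` is a hypothetical name.

```sas
/* Assumed column names: node (node label) and concomp (component ID) */
proc sort data=mycas.NodeSetOut out=work.NodeSetSorted;
   by concomp node;
run;

/* One BY group per connected component */
proc print data=work.NodeSetSorted noobs;
   by concomp;
run;
```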
This example demonstrates a more advanced use case. It finds the connected components by using the depth-first search (DFS) algorithm, which is suitable for both directed and undirected graphs. It generates two output tables: `mycas.NodeSetOut`, which maps each node to its component, and `mycas.CompOut`, which summarizes each component (for example, its number of nodes and links).
```sas
PROC CAS;
   /* Use depth-first search and also write a per-component summary table */
   optNetwork.connectedComponents
      links={name='LinkSetIn'},
      algorithm='DFS',
      outNodes={name='NodeSetOut', replace=true},
      out={name='CompOut', replace=true};
RUN;

PROC PRINT DATA=mycas.NodeSetOut;
RUN;
PROC PRINT DATA=mycas.CompOut;
RUN;
```
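The same call can treat each link as directed from its `from` node to its `to` node. In this minimal sketch, only `direction='DIRECTED'` and the hypothetical output names `NodeSetOutDir` and `CompOutDir` differ from the previous call. Comparing `CompOutDir` with `CompOut` shows whether the direction setting changes the component summary for this data.

```sas
proc cas;
   /* Interpret each link as directed; DFS handles directed graphs */
   optNetwork.connectedComponents
      links={name='LinkSetIn'},
      direction='DIRECTED',
      algorithm='DFS',
      outNodes={name='NodeSetOutDir', replace=true},
      out={name='CompOutDir', replace=true};
run;

proc print data=mycas.CompOutDir;
run;
```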
For very large graphs, the computation can be distributed across the worker nodes of the CAS server. This example finds the connected components with the distributed version of the algorithm by setting `distributed=true`. The AFFOREST algorithm is chosen explicitly because it is optimized for distributed processing of undirected graphs.
```sas
PROC CAS;
   /* Run on a distributed graph with the AFFOREST algorithm */
   optNetwork.connectedComponents
      links={name='LinkSetIn'},
      distributed=true,
      algorithm='AFFOREST',
      outNodes={name='NodeSetOut_dist', replace=true};
RUN;

PROC PRINT DATA=mycas.NodeSetOut_dist;
RUN;
```
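When repeated runs must return identical results, `deterministic=true` can be added to the distributed call, and `nThreads` can cap the number of threads. This is a minimal sketch: the thread count of 4 and the output name `NodeSetOut_det` are illustrative assumptions, not recommended settings.

```sas
proc cas;
   optNetwork.connectedComponents
      links={name='LinkSetIn'},
      distributed=true,
      algorithm='AFFOREST',
      deterministic=true,   /* same result for the same configuration and parameters */
      nThreads=4,           /* illustrative upper limit on threads */
      outNodes={name='NodeSetOut_det', replace=true};
run;

proc print data=mycas.NodeSetOut_det;
run;
```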