I am learning PySpark in Python. If I use the line of code below to get the components of my graph, one column is added to my GraphDataFrame holding the component (a random number). But I am curious: is it possible to get a list of the nodes that are connected?
g.connectedComponents()
The result is just a normal DataFrame that you can group by the component column, and then collect the results as a list using the collect_list function (doc). For example, using the example graph from graphframes:
will give:
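The grouping step described above can be sketched roughly as follows. This is only a minimal sketch, not the original answer's code: the helper names `nodes_per_component` and `nodes_per_component_local` and the output column name `nodes` are made up for illustration, and it assumes the vertex column is called `id`, as GraphFrames requires. The second helper does the same grouping on plain Python tuples, so the logic can be checked without a Spark session.

```python
from collections import defaultdict


def nodes_per_component(components_df):
    """Group the DataFrame returned by g.connectedComponents() by component.

    Hypothetical helper: returns one row per component with a list of node ids.
    """
    # Imported lazily so the local helper below also works without Spark installed.
    from pyspark.sql import functions as F

    return components_df.groupBy("component").agg(
        F.collect_list("id").alias("nodes")  # "nodes" is an assumed column name
    )


def nodes_per_component_local(rows):
    """Same grouping on plain (id, component) tuples, for illustration only."""
    groups = defaultdict(list)
    for node_id, component in rows:
        groups[component].append(node_id)
    return dict(groups)
```

With a GraphFrame `g`, the first helper would be called as `nodes_per_component(g.connectedComponents())`. Note that GraphFrames' default connected-components algorithm expects a Spark checkpoint directory to be set beforehand (via `sc.setCheckpointDir(...)`), and that `collect_list` makes no guarantee about the order of the ids inside each list.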