I was hoping somebody might have some suggestions for the following. I had some really great help on here recently with a similar(ish) problem and wanted to expand on it.
I currently have a network built using GraphX, which looks like the following (only with a much larger number of vertices and edges):
Vertices (ID, Attribute1, Attribute2)
1001, 2, 0
1002, 1, 0
1003, 2, 1
1004, 3, 2
1006, 4, 0
1007, 5, 1
Edges (Source, Destination, Attribute)
1001, 1002, 7
1002, 1003, 7
1003, 1004, 7
1004, 1005, 3
1002, 1006, 5
1006, 1007, 5
For each vertex, I want to send a message along each chain it belongs to, where a chain is the set of vertices connected by edges that share the same edge attribute, and count how many other vertices along that chain have a first attribute matching the vertex's second attribute.
For example, vertex 1004 is connected by edge attribute 7, so I want to identify every vertex connected to 1004 through edges with attribute 7; in this case that is the chain 1001->1002->1003->1004. I then want to match the second attribute of 1004 (in this case 2) against the first attribute of every vertex along the chain. Here it matches 1003 and 1001, giving a total count of 2.
My idea for a solution is, for each vertex, to:
- Build a subgraph for each edge attribute that connects to it
- Count all matching vertex attributes along each of these subgraphs
- Produce a total count for each vertex at the end
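The steps above can be sketched in plain Python on the sample data. This is not GraphX code, only a minimal single-machine sketch of the counting logic under a few assumptions the question leaves open: edges are treated as undirected (so 1004 reaches the upstream vertices 1001 to 1003), a vertex is never counted against itself, counts from different edge attributes are summed, and vertices missing from the vertex list (such as 1005) contribute nothing. The names build_adjacency, component and match_counts are hypothetical.

```python
from collections import defaultdict, deque

# Sample data from the question: vertex id -> (attribute1, attribute2)
VERTICES = {
    1001: (2, 0),
    1002: (1, 0),
    1003: (2, 1),
    1004: (3, 2),
    1006: (4, 0),
    1007: (5, 1),
}

# (source, destination, edge attribute)
EDGES = [
    (1001, 1002, 7),
    (1002, 1003, 7),
    (1003, 1004, 7),
    (1004, 1005, 3),
    (1002, 1006, 5),
    (1006, 1007, 5),
]


def build_adjacency(edges):
    """Map each edge attribute to an undirected adjacency list of its subgraph.

    Assumption: edge direction is ignored, so chains can be walked both ways.
    """
    adj = defaultdict(lambda: defaultdict(set))
    for src, dst, attr in edges:
        adj[attr][src].add(dst)
        adj[attr][dst].add(src)
    return adj


def component(adj_for_attr, start):
    """Breadth-first search restricted to one edge-attribute subgraph."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for n in adj_for_attr[v]:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen


def match_counts(vertices, edges):
    adj = build_adjacency(edges)
    counts = {}
    for v, (_, attr2) in vertices.items():
        total = 0
        # Step 1: one subgraph per edge attribute that touches v
        for adj_for_attr in adj.values():
            if v not in adj_for_attr:
                continue
            # Step 2: count other known vertices whose attribute1 equals v's attribute2
            for u in component(adj_for_attr, v):
                if u != v and u in vertices and vertices[u][0] == attr2:
                    total += 1
        # Step 3: one total per vertex (assumption: summed across edge attributes)
        counts[v] = total
    return counts


print(match_counts(VERTICES, EDGES))
# {1001: 0, 1002: 0, 1003: 1, 1004: 2, 1006: 0, 1007: 1}
```

Running one search per vertex repeats the same traversal for every vertex in a component, so on a large graph this is mainly useful for checking results on small samples.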
Any suggestions on how best to achieve this would be most welcome. For example, would this be possible using something like Pregel?
Correct me if I'm wrong, but do you want to find all connected components in the graph? If so, GraphX already has an algorithm for that problem (connectedComponents); see the docs.
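GraphX's connectedComponents is implemented with Pregel: it labels every vertex with the lowest vertex ID in its component by repeatedly passing the smallest known ID to neighbours. Assuming the chains in the question are exactly the connected components of each edge attribute's subgraph, one possible approach is to filter the graph per edge attribute (for example with subgraph and an edge predicate), run connectedComponents on each filtered graph, and then count matches per component label. The Python below only simulates that min-label propagation on the sample data instead of calling GraphX. The function names are hypothetical, and it assumes undirected chains, no self-matches, summing across edge attributes, and ignoring vertices (such as 1005) that have no attributes.

```python
from collections import Counter, defaultdict

# Sample data from the question: vertex id -> (attribute1, attribute2)
VERTICES = {
    1001: (2, 0),
    1002: (1, 0),
    1003: (2, 1),
    1004: (3, 2),
    1006: (4, 0),
    1007: (5, 1),
}

# (source, destination, edge attribute)
EDGES = [
    (1001, 1002, 7),
    (1002, 1003, 7),
    (1003, 1004, 7),
    (1004, 1005, 3),
    (1002, 1006, 5),
    (1006, 1007, 5),
]


def pregel_min_labels(edges, attr):
    """Simulate Pregel-style min-label propagation on one edge attribute's subgraph."""
    neighbours = defaultdict(set)
    for src, dst, a in edges:
        if a == attr:
            # Assumption: messages flow in both directions along an edge
            neighbours[src].add(dst)
            neighbours[dst].add(src)
    # Initial state: every vertex is labelled with its own ID
    labels = {v: v for v in neighbours}
    changed = True
    while changed:  # each loop iteration is one superstep
        changed = False
        # Every vertex sends its current label to its neighbours
        inbox = defaultdict(list)
        for v, ns in neighbours.items():
            for n in ns:
                inbox[n].append(labels[v])
        # Every vertex keeps the smallest label it has received
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < labels[v]:
                labels[v] = best
                changed = True
    return labels


def match_counts(vertices, edges):
    counts = {v: 0 for v in vertices}
    for attr in {a for _, _, a in edges}:
        labels = pregel_min_labels(edges, attr)
        # Number of known vertices per (component label, attribute1)
        attr1_per_component = Counter(
            (label, vertices[v][0]) for v, label in labels.items() if v in vertices
        )
        for v, label in labels.items():
            if v not in vertices:
                continue  # assumption: vertices without attributes are ignored
            attr1, attr2 = vertices[v]
            matches = attr1_per_component[(label, attr2)]
            # Assumption: a vertex is not counted against itself
            if attr1 == attr2:
                matches -= 1
            counts[v] += matches
    return counts


print(match_counts(VERTICES, EDGES))
# {1001: 0, 1002: 0, 1003: 1, 1004: 2, 1006: 0, 1007: 1}
```

Each superstep only needs a vertex's direct neighbours, which is what makes the idea fit Pregel; and because counts are grouped by component label, each component is processed once rather than once per vertex.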