I struggled to come up with a title that describes the problem I'm trying to solve, so please comment if you have a better one!
The solution can be in R, Python, or SQL (Aster Teradata SQL to be exact, though a solution in any SQL dialect would be very helpful for learning purposes).
The problem: given a list of pairs of items in no particular order, generate an output that groups together all items that are connected, directly or indirectly, by at least one link.
Here is a simple example using R:
colone = c("a","b","u","e","f","f","j","z")
coltwo = c("b","c","c","a","g","h","h","y")
d <- data.frame(colone, coltwo)
d
  colone coltwo
1      a      b
2      b      c
3      u      c
4      e      a
5      f      g
6      f      h
7      j      h
8      z      y
Desired output (in any easily-readable data structure):
(a,b,c,e,u)
(f,g,h,j)
(y,z)
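Essentially, the input represents a graph: the items are nodes and each pair is an edge. The desired output is a list of the graph's connected components, that is, each group of nodes that are linked to one another directly or through other nodes.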
Any help or thoughts would be appreciated!
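Since Python is one of the accepted languages, here is a minimal sketch of the grouping described above, using a union-find (disjoint-set) structure and only the standard library. The function name connected_groups and the choice of union-find are illustrative assumptions, not something taken from the question; a graph library would work equally well.

```python
def connected_groups(pairs):
    """Group items that are linked, directly or indirectly, by at least one pair.

    Hypothetical helper for illustration: a plain union-find over the items.
    """
    parent = {}

    def find(x):
        # Register unseen items as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two groups joined by each pair (edge).
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect every item under the root of its group.
    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), []).append(item)

    # Sort members and groups so the output is deterministic.
    return sorted(sorted(members) for members in groups.values())


# Same data as the R example in the question.
colone = ["a", "b", "u", "e", "f", "f", "j", "z"]
coltwo = ["b", "c", "c", "a", "g", "h", "h", "y"]
pairs = list(zip(colone, coltwo))

for group in connected_groups(pairs):
    print("(" + ",".join(group) + ")")
```

With the example data this should print the three groups from the desired output, one per line. Sorting is only there to make the result easy to read and compare; drop it if order does not matter.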
In R, you could use the igraph package:
And you can look at the graph using:
This is based on an excellent answer by MrFlick here