I'm trying to construct a simple edge list for an undirected graph containing all possible edges. I used to do it using the cartesian product of the node list by itself & then filtering out duplicated & self edges. This time the size of the input is too large to afford to store unnecessary edges momentarily. Thus, I'm trying to use nested loops to get the needed edges directly from the first time.
Here is the code I wrote:
node_list = ['A', 'B', 'C', 'D']
for i in node_list:
for j in node_list:
if i < j:
source.append(i)
target.append(j)
loop_data = pd.DataFrame({'source': source, 'target':target})
print(loop_data)
The output I get is pretty unexpected. Instead of saving the source & target nodes in their respective lists, the program is keeping both the source & the target nodes in both source & target columns. Here is the current state of the output.
source target
0 A A
1 B B
2 A A
3 C C
4 A A
5 D D
6 B B
7 C C
8 B B
9 D D
10 C C
11 D D
This is the expected form of output (ignore row indexing) :
source target
1 A B
2 A C
3 A D
6 B C
7 B D
11 C D
I can't find where the problem exists. The issue seems to be with appending to both of the source & target lists.
If you initialise each variable as the empty list (
[]) the code works as expected. So I suggest you review how you are initialising thesourceandtargetvariables:Output
Since you want to construct an undirected graph, I suggest you use
itertools.combinationsto generate the edges:Output
The reason for using
combinationsis that it avoids the Python levelfor-loop therefore for larger lists it should be faster. It is also in my opinion more pythonic.Timings
For a
node_listof 676 elements, I get the following timings:where the functions
combinations_approachandoriginal_approachare: