pandas dataframe to frozenset based on conditions


I have a dataset like:

 node    community
  1         2
  2         4
  3         5
  4         2
  5         3
  7         1
  8         3
  10        4
  12        5

I want to group the values of the node column into frozensets so that all nodes in a frozenset share the same community. The expected result is something like:

 [frozenset([1,4]), frozenset([2,10]), frozenset([3,12]), frozenset([5,8]), frozenset([7])]

Is there any way to do this without converting the DataFrame to a list of lists? Thanks.


There are 3 answers

2
jpp On BEST ANSWER

Using GroupBy + apply with frozenset:

res = df.groupby('community')['node'].apply(frozenset).values.tolist()

print(res)

[frozenset({7}), frozenset({1, 4}), frozenset({8, 5}),
 frozenset({2, 10}), frozenset({3, 12})]
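For anyone who wants to run this end to end, here is a minimal sketch. It assumes the DataFrame is built directly from the values shown in the question; the construction step is not part of the original answer.

```python
import pandas as pd

# Rebuild the DataFrame from the question (values copied from the post)
df = pd.DataFrame({
    "node": [1, 2, 3, 4, 5, 7, 8, 10, 12],
    "community": [2, 4, 5, 2, 3, 1, 3, 4, 5],
})

# One frozenset of nodes per community; groupby sorts the groups by community
res = df.groupby("community")["node"].apply(frozenset).tolist()
print(res)
```

The list is ordered by community value (1 through 5), not by first appearance in the DataFrame, because groupby sorts its keys by default.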
0
this be Shiva On

I would suggest iterating over your GroupBy object and building a dict keyed by community instead.

communities = {k: frozenset(g['node']) for k, g in df.groupby('community')}
print(communities)
{1: frozenset({7}),
 2: frozenset({1, 4}),
 3: frozenset({5, 8}),
 4: frozenset({2, 10}),
 5: frozenset({3, 12})}

Or, if you want a list (which loses the community keys):

communities = [frozenset(g['node']) for _, g in df.groupby('community')]
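One reason to prefer the dict is that the community key stays available for lookups. A rough sketch of that, assuming the DataFrame from the question; the `peers` helper is hypothetical and only there for illustration:

```python
import pandas as pd

# DataFrame rebuilt from the values in the question
df = pd.DataFrame({
    "node": [1, 2, 3, 4, 5, 7, 8, 10, 12],
    "community": [2, 4, 5, 2, 3, 1, 3, 4, 5],
})

# community -> frozenset of its nodes
communities = {k: frozenset(g["node"]) for k, g in df.groupby("community")}

def peers(node):
    """Hypothetical helper: all nodes in the same community as `node`."""
    community = df.loc[df["node"] == node, "community"].iloc[0]
    return communities[community]

print(peers(10))
```

With the plain list version you would have to search every frozenset to answer the same question.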
0
Raisin On

Both of the other answers worked for me, but they were slow on my data. What was quicker in my case was to wrap each value in a one-element list, concatenate the lists per group using sum, and then convert each result to a frozenset.

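import pandas as pd

df = pd.DataFrame({'mycol': [10,20,30,40,50], 'myindex': [1,1,2,2,3]})
# Wrap each value in a one-element list so that sum() concatenates per group
df['mycol_list'] = [[i] for i in df.mycol]
df2 = df.groupby('myindex').mycol_list.sum().to_frame()
# Convert each concatenated list to a frozenset
df2['mycol_frozenset'] = [frozenset(i) for i in df2.mycol_list]
df2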
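If you want to convince yourself that the sum-based route gives the same groups as GroupBy + apply, a quick check along these lines should do. This is only a sketch on the toy data above; the names `via_apply` and `via_sum` are made up here.

```python
import pandas as pd

df = pd.DataFrame({"mycol": [10, 20, 30, 40, 50], "myindex": [1, 1, 2, 2, 3]})

# Route 1: apply frozenset to each group directly
via_apply = df.groupby("myindex")["mycol"].apply(frozenset).tolist()

# Route 2: one-element lists, concatenated per group with sum, then converted
df["mycol_list"] = [[i] for i in df["mycol"]]
summed = df.groupby("myindex")["mycol_list"].sum()
via_sum = [frozenset(values) for values in summed]

print(via_apply == via_sum)
```

Which route is faster will likely depend on the number and size of the groups, so it is worth timing both on your own data (for example with timeit) rather than assuming either one wins.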