How to handle unordered sets of tags in Python Pandas Dataframe in terms of merging

Question

How to handle unordered sets of tags in Python Pandas Dataframe in terms of merging

70 views Asked by Melon Pie At 09 November 2023 at 11:09

I have two Dataframes which contain tags in one of the columns. Tags can be present or absent in the list and are stored in an unordered manner. I would like to get the merge of the two datasets with rows that have the same tags. In plain python I can store them as the set datatype.

tags1 = {"tag1","tag2"}
tags2 = {"tag2","tag1"}
tags1 == tags2
True

However I cant do the same in Pandas since sets are not hashable.

df1 = pd.DataFrame([{'name': 'foo', 'tags': {"tag1","tag2"}}, {'name': 'bar', 'tags': {"tag3","tag4"}}])
df2 = pd.DataFrame([{'name': 'baz', 'tags': {"tag4","tag3"}}, {'name': 'qux', 'tags': {"tag5","tag6"}}])
df1.merge(df2, how="inner", indicator=True, on="tags")

What I would like to see

     name_x  value           name_y _merge
0    bar     {"tag3","tag4"} baz    both

What I get

TypeError: unhashable type: 'set'

What would be the best approach here? Unfortunatelly I cant define a fixed amount of columns since the quantity of the tags is not fixed.

Original Q&A

There are 1 answers

**jezrael** · Accepted Answer · 2023-11-09T11:17:25+00:00

Use frozensets and assign to left_on and right_on parameters if need original sets columns:

out = (df1.merge(df2, how="inner", indicator=True, 
                left_on=df1['tags'].apply(frozenset),
                right_on=df2['tags'].apply(frozenset)))
print (out)
          key_0 name_x        tags_x name_y        tags_y _merge
0  (tag3, tag4)    bar  {tag3, tag4}    baz  {tag3, tag4}   both

Or assign frozensets to tags columns:

out = (df1.assign(tags=df1['tags'].apply(frozenset))
          .merge(df2.assign(tags=df2['tags'].apply(frozenset)), 
                how="inner", 
                indicator=True, 
                on="tags"))
print (out)
  name_x          tags name_y _merge
0    bar  (tag3, tag4)    baz   both

TechQA.

How to handle unordered sets of tags in Python Pandas Dataframe in terms of merging

There are 1 answers

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in SET

Related Questions in UNORDERED

Popular Questions

Popular Tags

Trending Questions