I have two Dataframes which contain tags in one of the columns. Tags can be present or absent in the list and are stored in an unordered manner. I would like to get the merge of the two datasets with rows that have the same tags. In plain python I can store them as the set datatype.
tags1 = {"tag1","tag2"}
tags2 = {"tag2","tag1"}
tags1 == tags2
True
However I cant do the same in Pandas since sets are not hashable.
df1 = pd.DataFrame([{'name': 'foo', 'tags': {"tag1","tag2"}}, {'name': 'bar', 'tags': {"tag3","tag4"}}])
df2 = pd.DataFrame([{'name': 'baz', 'tags': {"tag4","tag3"}}, {'name': 'qux', 'tags': {"tag5","tag6"}}])
df1.merge(df2, how="inner", indicator=True, on="tags")
What I would like to see
name_x value name_y _merge
0 bar {"tag3","tag4"} baz both
What I get
TypeError: unhashable type: 'set'
What would be the best approach here? Unfortunatelly I cant define a fixed amount of columns since the quantity of the tags is not fixed.
Use
frozenset
s and assign toleft_on
andright_on
parameters if need originalset
s columns:Or assign
frozenset
s totags
columns: