I'm training a model to determine whether two people are the same. The model should take in two people (each represented as a dataframe row).
I'm trying to label paired data of the form:
Id | age | gender | occupation | region | height | weight(kg)
100 | 16 | 0 | "plumber" | na | 169 | 20
300 | 50 | 1 | na | africa | 12 | 90

Id | age | gender | occupation | region | height | weight(kg)
100 | 16 | 0 | "plumber" | na | 169 | 20
700 | 100 | 0 | na | africa | 12 | 90
Each of these pairs is written to its own CSV file for labeling, since I want to train a classifier that takes in pairs of person rows and labels them as duplicates or not.
As you can see, even with only 10 people this quickly gets out of hand: 10 choose 2 = 45 pairs. Any ideas on how to make labeling the data easier?
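To make the growth concrete, here is a small sketch of the pair count for a few dataset sizes. The function name `pair_count` and the sizes 100 and 1000 are just illustrative choices; only the 10-person case comes from my actual data.

```python
from math import comb

def pair_count(n_people):
    """Number of unordered pairs among n_people records (n choose 2)."""
    return comb(n_people, 2)

# Illustrative sizes; only n = 10 is from the question itself.
counts = {n: pair_count(n) for n in (10, 100, 1000)}

for n, pairs in counts.items():
    print(f"{n} people -> {pairs} pairs")
```

The count grows roughly with the square of the number of people, so one file per pair stops being practical very quickly.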
I've thought about doing this in Excel, but I feel like opening this many Excel files is sure to create issues.
So I figured it out: I just need to put each pair on a single row in Excel, i.e. row 1 features, row 2 features, label. It is pretty annoying to read the features horizontally, but with an external monitor or two it isn't terrible.
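For anyone wanting to generate that sheet rather than build it by hand, here is a rough sketch using only the Python standard library. The sample records, the `_1`/`_2` column suffixes, the `weight_kg` column name, and the helper names `build_pair_rows` and `write_pairs_csv` are all my own placeholders, and missing values are written as empty cells. It writes every unordered pair to one CSV with an empty `label` column that can be opened and filled in Excel.

```python
import csv
import io
from itertools import combinations

# Hypothetical sample records mirroring the columns in the question.
people = [
    {"Id": 100, "age": 16, "gender": 0, "occupation": "plumber",
     "region": "", "height": 169, "weight_kg": 20},
    {"Id": 300, "age": 50, "gender": 1, "occupation": "",
     "region": "africa", "height": 12, "weight_kg": 90},
    {"Id": 700, "age": 100, "gender": 0, "occupation": "",
     "region": "africa", "height": 12, "weight_kg": 90},
]

def build_pair_rows(records):
    """Return one wide dict per unordered pair, plus an empty label column."""
    rows = []
    for left, right in combinations(records, 2):
        row = {f"{key}_1": value for key, value in left.items()}
        row.update({f"{key}_2": value for key, value in right.items()})
        row["label"] = ""  # filled in by hand later (coding is up to you)
        rows.append(row)
    return rows

def write_pairs_csv(rows, handle):
    """Write the wide pair rows to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

pair_rows = build_pair_rows(people)

# Written to an in-memory buffer here; pass a real file opened with
# newline="" to produce a file Excel can open.
buffer = io.StringIO()
write_pairs_csv(pair_rows, buffer)
pairs_csv_text = buffer.getvalue()
```

If reading across the row is the painful part, one option is to reorder the columns so each feature's two values sit next to each other (age_1, age_2, gender_1, gender_2, and so on) instead of all of row 1 followed by all of row 2.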