Labeling large set of paired training data

78 views Asked by coderboi At 16 October 2020 at 23:03

I'm training a model to determine whether two people are the same. The model should take in two people (each represented as a dataframe row).

I'm trying to label paired data of the form:

Id  | age    | gender| occupation  | region | height | weight(kg)
100 | 16     | 0     | "plumber"   | na     | 169    | 20
300 | 50     | 1     | na          | africa | 12     | 90

Id  | age    | gender| occupation  | region | height | weight(kg)
100 | 16     | 0     | "plumber"   | na     | 169    | 20
700 | 100    | 0     | na          | africa | 12     | 90

Each of these pairs is written to a separate CSV file for labeling, since I want to train a classifier that takes in pairs of person rows and labels them as duplicates or not.

As you can see, even with only 10 people this quickly gets out of hand: 10 C 2 = 45 pairs. Any ideas on how to make labeling the data easier?
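To make the growth concrete, here is a minimal sketch using only the standard library. It counts the unordered pairs for a given number of people and enumerates them for the three Ids in the sample above. The helper name `pair_count` is made up for illustration.

```python
from itertools import combinations
from math import comb


def pair_count(n: int) -> int:
    """Number of unordered pairs among n records, i.e. n * (n - 1) / 2."""
    return comb(n, 2)


# The three Ids that appear in the sample tables above.
ids = [100, 300, 700]

# Every unordered pair of Ids, each pair listed once.
pairs = list(combinations(ids, 2))

for n in (10, 100, 1000):
    print(n, "people ->", pair_count(n), "pairs")
```

The count grows quadratically, so one file per pair stops being practical well before the dataset gets large.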

I've thought about doing this in Excel, but I feel like opening this many Excel files is sure to create issues.

There are 2 answers

Prune On 16 October 2020 at 23:13

Sort the data frame: O(N*log(N))
Check whether adjacent rows are equal: O(N)

To compare adjacent rows, shift the columns down one position and compare each shifted row to the original.
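The sort-and-shift idea can be sketched in pandas roughly as follows. This is one reading of the answer, not a definitive implementation: the function name `flag_adjacent_duplicates` and the `same_as_previous` column are made up, and the row with Id 900 is an invented exact copy of row 300's features so that there is something to find. Note that it only catches rows whose features match exactly.

```python
import pandas as pd

# Sample rows based on the question; Id 900 is a hypothetical exact copy
# of Id 300's features, added only to demonstrate a match.
people = pd.DataFrame({
    "Id": [100, 300, 700, 900],
    "age": [16, 50, 100, 50],
    "gender": [0, 1, 0, 1],
    "region": ["na", "africa", "africa", "africa"],
})


def flag_adjacent_duplicates(df, feature_cols):
    """Sort by the feature columns, then mark rows equal to the row above."""
    # Sorting puts identical feature combinations next to each other.
    ordered = df.sort_values(feature_cols).reset_index(drop=True)
    # shift(1) moves every row down one position, so row i lines up
    # with the original row i - 1 (the first row lines up with NaN).
    previous = ordered[feature_cols].shift(1)
    # A row is flagged only if every feature equals the row above it.
    ordered["same_as_previous"] = (ordered[feature_cols] == previous).all(axis=1)
    return ordered


result = flag_adjacent_duplicates(people, ["age", "gender", "region"])
print(result)
```

For exact matches, `people.duplicated(subset=["age", "gender", "region"], keep=False)` gives a similar answer without the manual shift. Real missing values (NaN) never compare equal to each other, so rows with gaps would not be flagged by the shift comparison.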

**coderboi** · Accepted Answer · 2020-10-17T15:46:47+00:00

So I figured it out: I just need to pair the rows in Excel, i.e. row 1 features, row 2 features, label. It is pretty annoying to read the features horizontally, but with an external monitor or two it isn't terrible.
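A minimal sketch of that single-sheet layout in pandas, assuming the people live in one dataframe. The function name `build_pair_sheet`, the `_1`/`_2` column suffixes, and the `weight_kg` column name are made up for illustration. As a variation on the layout described above, each feature's two values are placed side by side (`age_1`, `age_2`, ...) rather than all of row 1 followed by all of row 2, which may make the horizontal reading a bit easier.

```python
from itertools import combinations

import pandas as pd

# The three people from the sample tables in the question.
people = pd.DataFrame({
    "Id": [100, 300, 700],
    "age": [16, 50, 100],
    "gender": [0, 1, 0],
    "occupation": ["plumber", "na", "na"],
    "region": ["na", "africa", "africa"],
    "height": [169, 12, 12],
    "weight_kg": [20, 90, 90],
})


def build_pair_sheet(df):
    """Return one row per unordered pair of people, plus an empty label column."""
    records = []
    for (_, a), (_, b) in combinations(df.iterrows(), 2):
        row = {}
        for col in df.columns:
            # Interleave so the two values of a feature are adjacent.
            row[f"{col}_1"] = a[col]
            row[f"{col}_2"] = b[col]
        row["label"] = ""  # filled in by hand while labeling
        records.append(row)
    return pd.DataFrame(records)


pair_sheet = build_pair_sheet(people)
print(pair_sheet)
```

Calling `pair_sheet.to_csv("pairs_to_label.csv", index=False)` (the file name is just an example) would give one file that opens in Excel, with one pair per line and a single `label` column to fill in.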
