When writing the TFRecords for the object detection API we have to include the field image/object/class/label
to give the list of labels corresponding to each bounding box.
Is there any way to pass a vector instead of the integer label?
The reason I ask is that the labels I'm trying to encode have a hierarchical nature, and I'm interested in trying to capture in a label vector and see what effect that has on the training.
For instance let's say I was detecting cars, vans, mopeds and motorbikes. If I was to label a motorbike bounding, rather than passing 3
as the label, I might want to use the vector [0, 0, 0.33, 1.0]
so as not to "punish" motorbike and moped confusion, as much as motorbike -> car confusion.
If it's not possible to encode the labels as vectors, I would be interested to hear if there are any other possible solutions to the scenario I outlined