I am building an application that couples different cameras to different neural networks to perform object detection with TensorFlow Lite. For now, I would like to use YOLOv4 tiny, YOLOv8 small, SSD MobileNet, and RT-DETR. Writing a separate class for each of these models just to locate the bounding boxes, detection classes, detection scores, and so on is quite cumbersome. Is there a way to always find these tensors from the interpreter's output details?
For instance:
    def inference(self):
        for i in self.output:
            if len(i['shape']) == 3:
                detection_boxes = self.interpreter.get_tensor(i['index'])
            elif i['dtype'] == np.float32:
                detection_scores = self.interpreter.get_tensor(i['index'])
            else:
                # others
                pass
Here, self.output = self.interpreter.get_output_details().
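A minimal sketch of how far this shape-based idea can go, assuming the four-output layout that SSD models with built-in post-processing usually expose. The function name `group_outputs_by_shape` and the `fake_details` list are made up for illustration; the list only mimics the `index` and `shape` keys of `get_output_details()`. The sketch also shows the limit of the approach: in this layout the classes and scores tensors appear to share the same shape and dtype, so metadata alone can only narrow them down to a pair.

```python
def group_outputs_by_shape(output_details):
    """Bucket output tensors by rank and last dimension.

    Heuristic only: it assumes an SSD-style layout with one box tensor,
    two per-detection tensors and one detection count.
    """
    groups = {"boxes": [], "per_detection": [], "num_detections": [], "unknown": []}
    for detail in output_details:
        shape = tuple(int(dim) for dim in detail["shape"])
        if len(shape) == 3 and shape[-1] == 4:
            # [batch, N, 4] looks like box coordinates
            groups["boxes"].append(detail["index"])
        elif len(shape) == 2:
            # [batch, N] could be classes or scores; shape cannot tell which
            groups["per_detection"].append(detail["index"])
        elif len(shape) == 1 and shape[0] == 1:
            # a single value looks like the detection count
            groups["num_detections"].append(detail["index"])
        else:
            groups["unknown"].append(detail["index"])
    return groups


# Hand-written stand-in for interpreter.get_output_details() (hypothetical values).
fake_details = [
    {"index": 10, "shape": (1, 10, 4)},
    {"index": 11, "shape": (1, 10)},
    {"index": 12, "shape": (1, 10)},
    {"index": 13, "shape": (1,)},
]
groups = group_outputs_by_shape(fake_details)
print(groups)
```

A model that exports a single raw prediction tensor would land in `unknown` here, so this heuristic would still need a per-model fallback.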
What I do now is define, for every model, a method called get_output_detection_groups(self) that extracts these values for that class (i.e. algorithm/model). For SSD MobileNet this becomes:
    def get_output_detection_groups(self):
        # Output order used here: 0 = boxes, 1 = classes, 2 = scores, 3 = count.
        # The trailing [0] drops the batch dimension.
        detection_classes = self.interpreter.get_tensor(
            self.output[1]['index'])[0]
        detection_scores = self.interpreter.get_tensor(
            self.output[2]['index'])[0]
        # Scale every box coordinate by 300 and truncate it to an int.
        detection_boxes = list(
            map(lambda arr:
                list(map(lambda val: int(val*300), arr)),
                self.interpreter.get_tensor(self.output[0]['index'])[0]))
        num_detections = self.interpreter.get_tensor(self.output[3]['index'])
        # Cap the number of reported detections at 10.
        num_detections = int(np.minimum(num_detections[0], 10))
        return detection_classes, detection_scores, detection_boxes, num_detections
This way, I can have an abstract class whose inference call is generic, but I still have to create a separate class for every model. Is there a way, or a better way, to do this? Any help would be much appreciated. Thank you!
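One direction that might reduce the number of classes is to describe each model's output layout as data and keep a single extraction function. The sketch below is only an illustration: `OutputLayout`, `LAYOUTS`, `box_scale` and the fake tensors are hypothetical names and values, and the positions 0 to 3, the factor 300 and the cap of 10 are copied from the SSD MobileNet method above. It uses the built-in `min` instead of `np.minimum` so that it runs without NumPy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputLayout:
    """Positions of each tensor within get_output_details() for one model."""
    boxes: int
    classes: int
    scores: int
    num_detections: int
    box_scale: int          # factor applied to every box coordinate
    max_detections: int = 10


# Hypothetical registry; only the SSD MobileNet entry comes from the code above.
LAYOUTS = {
    "ssd_mobilenet": OutputLayout(boxes=0, classes=1, scores=2,
                                  num_detections=3, box_scale=300),
}


def get_output_detection_groups(output_details, get_tensor, layout):
    """Same logic as the SSD MobileNet method, driven by a layout record."""
    classes = get_tensor(output_details[layout.classes]["index"])[0]
    scores = get_tensor(output_details[layout.scores]["index"])[0]
    raw_boxes = get_tensor(output_details[layout.boxes]["index"])[0]
    boxes = [[int(val * layout.box_scale) for val in box] for box in raw_boxes]
    count = get_tensor(output_details[layout.num_detections]["index"])
    count = int(min(count[0], layout.max_detections))
    return classes, scores, boxes, count


# Stand-ins for interpreter.get_output_details() and interpreter.get_tensor.
fake_details = [{"index": 10}, {"index": 11}, {"index": 12}, {"index": 13}]
fake_tensors = {
    10: [[[0.25, 0.5, 0.75, 1.0]]],  # boxes, batch of one
    11: [[17.0]],                    # classes
    12: [[0.9]],                     # scores
    13: [1.0],                       # number of detections
}
classes, scores, boxes, count = get_output_detection_groups(
    fake_details, fake_tensors.__getitem__, LAYOUTS["ssd_mobilenet"])
print(classes, scores, boxes, count)
```

With a real interpreter, the call would pass `self.output` and `self.interpreter.get_tensor` instead of the stand-ins. Models whose raw output needs decoding before it yields boxes and scores would still need their own decoding step, so this only covers models that already return post-processed detections.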