What's the output of YOLO?


I'm trying to use YOLO to detect license plates in an Android application.

So I trained a YOLOv3 and a YOLOv4 model in Google Colab. I converted both models to TensorFlow Lite using the wonderful project by Hunglc007, verified that they work, and got the following result:

3 license plates are detected

But when I inspect the model's outputs with Netron, so that I can adapt them in my app, I get this:

Output of the YOLOv3 model

Why are there 2 outputs when the model has been trained to detect only a single object class?

And why is the output formatted like that? What does [1,1,4] represent?

EDIT

The code that produces the bounding boxes can be found here:

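```python
import cv2
import numpy as np
import tensorflow as tf
from PIL import Image
import core.utils as utils  # helper module from the hunglc007 tensorflow-yolov4-tflite repo

# `boxes`, `pred_conf`, `FLAGS`, `original_image` and `count` are defined
# earlier in the detection script (not shown here).
boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
    # (batch, num_boxes, 1, 4): the same box is shared by every class.
    boxes=tf.reshape(boxes, (tf.shape(boxes)[0], -1, 1, 4)),
    # (batch, num_boxes, num_classes): one score per class for each box.
    scores=tf.reshape(
        pred_conf, (tf.shape(pred_conf)[0], -1, tf.shape(pred_conf)[-1])),
    max_output_size_per_class=50,
    max_total_size=50,
    iou_threshold=FLAGS.iou,
    score_threshold=FLAGS.score
)
pred_bbox = [boxes.numpy(), scores.numpy(), classes.numpy(), valid_detections.numpy()]
# Draw the boxes that survived NMS on the (RGB) input image.
image = utils.draw_bbox(original_image, pred_bbox)
# image = utils.draw_bbox(image_data*255, pred_bbox)
image = Image.fromarray(image.astype(np.uint8))
image.show()
# cv2.imwrite expects BGR channel order, so swap the channels before saving.
image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
cv2.imwrite(FLAGS.output + 'detection' + str(count) + '.png', image)
```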
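The final filtering in that snippet is done by `tf.image.combined_non_max_suppression`. As a rough illustration of what non-maximum suppression does for a single class such as license plates, here is a plain NumPy sketch of greedy NMS. It is not TensorFlow's implementation; it assumes boxes in the `(y1, x1, y2, x2)` order that TensorFlow's NMS functions use, and the helpers `iou` and `greedy_nms`, as well as the default thresholds, are hypothetical choices for illustration.

```python
import numpy as np

def iou(box, others):
    """IoU between one box and an array of boxes, all as (y1, x1, y2, x2)."""
    y1 = np.maximum(box[0], others[:, 0])
    x1 = np.maximum(box[1], others[:, 1])
    y2 = np.minimum(box[2], others[:, 2])
    x2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(y2 - y1, 0, None) * np.clip(x2 - x1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, iou_threshold=0.45, score_threshold=0.25, max_output_size=50):
    """Single-class greedy NMS; returns the indices of the kept boxes, best first."""
    candidates = np.where(scores >= score_threshold)[0]   # drop low scores first
    order = candidates[np.argsort(-scores[candidates])]  # highest score first
    keep = []
    while order.size > 0 and len(keep) < max_output_size:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Discard the remaining boxes that overlap the kept one too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(greedy_nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap anything and is kept.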

There are 2 answers

AbdelAziz AbdelLatef (best answer)

I am not an expert in Netron, but from looking at the problem and its expected outputs, the model should produce two outputs for each detection: the detection rectangle and the detection confidence. So the two outputs you are asking about are probably the rectangle, defined by 4 floats (the two coordinates of the upper-left corner, the width, and the height), and the confidence, which is a single float.
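If the 4 numbers really are the upper-left corner plus width and height, as this answer suggests, converting them to the two corners needed to draw a rectangle is straightforward. A minimal NumPy sketch under that assumption; the function name `xywh_topleft_to_corners` is hypothetical.

```python
import numpy as np

def xywh_topleft_to_corners(boxes):
    """Convert (x_min, y_min, width, height) to (x_min, y_min, x_max, y_max).

    Assumes the upper-left-corner layout described in this answer.
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    x_min, y_min = boxes[..., 0], boxes[..., 1]
    w, h = boxes[..., 2], boxes[..., 3]
    return np.stack([x_min, y_min, x_min + w, y_min + h], axis=-1)

print(xywh_topleft_to_corners([[10, 20, 30, 40]]).tolist())  # [[10.0, 20.0, 40.0, 60.0]]
```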

dtlam26

It is fairly straightforward. A detection model generally gives at least 2 outputs: the bounding boxes and the classes associated with those bounding boxes. So (1,1,4) holds the 4 values of a bounding box. The first 1 corresponds to the single image you feed into the model. Since you have only one object, the second number is 1. Also, YOLO encodes bounding boxes as (x_center, y_center, width, height).
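To make the two shapes concrete, here is a NumPy sketch of how decoded YOLO predictions could be split into a box tensor and a score tensor, followed by the conversion from center format to corners. The per-box layout `[x_center, y_center, width, height, objectness, class_probs...]` and the score being objectness multiplied by class probability are assumptions based on my reading of the hunglc007 TFLite decoding code, not something confirmed here; `split_yolo_predictions` and `center_to_corners` are hypothetical helpers.

```python
import numpy as np

def split_yolo_predictions(decoded):
    """Split decoded predictions into a box tensor and a score tensor.

    decoded: (batch, num_boxes, 5 + num_classes), assumed to be laid out as
    [x_center, y_center, width, height, objectness, class_prob_1, ...].
    """
    boxes = decoded[..., 0:4]                       # (batch, num_boxes, 4)
    scores = decoded[..., 4:5] * decoded[..., 5:]   # (batch, num_boxes, num_classes)
    return boxes, scores

def center_to_corners(boxes):
    """Convert (x_center, y_center, width, height) to (x_min, y_min, x_max, y_max)."""
    xy = boxes[..., :2]
    half_wh = boxes[..., 2:4] / 2.0
    return np.concatenate([xy - half_wh, xy + half_wh], axis=-1)

# One image, one class (license plate), one candidate box.
decoded = np.array([[[100.0, 80.0, 60.0, 20.0, 0.9, 1.0]]])
boxes, scores = split_yolo_predictions(decoded)
print(boxes.shape, scores.shape)          # (1, 1, 4) (1, 1, 1)
print(center_to_corners(boxes).tolist())  # [[[70.0, 70.0, 130.0, 90.0]]]
```

In this sketch the middle dimension counts candidate boxes; it is 1 here only because the example contains a single candidate.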

(1,1,1) has the same layout, but the last 1 now corresponds to the class you chose (you have only one).
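Reading that second output in an app then amounts to taking, for each box, the best class and its score, and dropping boxes whose score is too low. A minimal NumPy sketch, assuming the output holds per-class scores of shape (batch, num_boxes, num_classes); as far as I can tell this is similar in spirit to the repo's `filter_boxes` helper, but `pick_classes` and the 0.25 threshold are hypothetical choices for illustration.

```python
import numpy as np

def pick_classes(scores, score_threshold=0.25):
    """Reduce a (batch, num_boxes, num_classes) score tensor to per-box results.

    Returns (mask, class_ids, best_scores) for the first image in the batch.
    With a single class, class_ids is always 0.
    """
    per_box = scores[0]                       # (num_boxes, num_classes)
    class_ids = np.argmax(per_box, axis=-1)   # index of the most likely class
    best_scores = np.max(per_box, axis=-1)    # score of that class
    mask = best_scores >= score_threshold     # keep confident boxes only
    return mask, class_ids, best_scores

scores = np.array([[[0.9], [0.1], [0.4]]], dtype=np.float32)  # shape (1, 3, 1)
mask, class_ids, best_scores = pick_classes(scores)
print(mask.tolist(), class_ids.tolist())  # [True, False, True] [0, 0, 0]
```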