Why does YOLO9000 use only local image information?


One thing confused me when I was reading the YOLO9000 paper.
In the YOLOv2 architecture, the final feature map is 13x13, so it seems each cell covers roughly a 32x32 patch of the original image. To me, that looks like using only local information for object detection, and I am not sure whether that is sufficient or robust.

In v1, there is a fully connected layer that combines local information into global information, which seems more reasonable to me.

Or maybe I am misunderstanding something. This question really bothers me. Thanks.


There is 1 answer

Thomas Pinetz On

But the surrounding information is already used by the convolutions. Every 3x3 filter uses the neighborhood of each pixel in the previous layer's output. Those pixels are in turn the results of earlier convolutions, which used the neighborhoods of their own input pixels, and so on. Combined with the reduction in image size from max pooling, this means each cell of the final layer draws on a region far larger than 32x32, so the whole image ends up covered.
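To make the argument concrete, here is a minimal sketch that computes the theoretical receptive field of one output cell for a stack of convolutions and poolings. The layer list `toy_stack` is a hypothetical simplification (five blocks of 3x3 convolution plus 2x2 max pooling, then two more 3x3 convolutions), not the exact YOLOv2 layer list, and the names `receptive_field`, `CONV3` and `POOL2` are made up for illustration. Padding and image borders are ignored.

```python
def receptive_field(layers):
    """Return (receptive_field, stride) of one output cell, in input pixels.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf, jump = 1, 1  # one input pixel sees itself; neighbours are 1 px apart
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra tap reaches `jump` input pixels further
        jump *= stride             # downsampling moves output cells further apart
    return rf, jump


CONV3 = (3, 1)  # 3x3 convolution, stride 1
POOL2 = (2, 2)  # 2x2 max pooling, stride 2

# Hypothetical stack for illustration only, NOT the real YOLOv2 layer list.
# Five poolings give a total stride of 32, matching 416 / 32 = 13 cells per side.
toy_stack = [CONV3, POOL2] * 5 + [CONV3, CONV3]

if __name__ == "__main__":
    rf, stride = receptive_field(toy_stack)
    print(f"stride between output cells: {stride} px")
    print(f"receptive field of one cell: {rf} x {rf} px")
```

The stride between neighbouring cells is 32 pixels, which is where the "32x32 per cell" impression comes from, but the region each cell actually sees is several times wider than that even in this shallow toy stack. A deeper network with more 3x3 convolutions between the poolings would see more still.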