In Section 5.6 of Viola and Jones's paper, the authors mention that they integrate the multiple detections their detector makes while scanning images for faces. In this context, a detection is simply a rectangular region, positioned over an image, where the detector considered that a face exists. Integration is the combination of overlapping rectangular regions into a single one. This is useful because the detector may produce many detections around a single face while scanning an image.
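For concreteness, here is a minimal sketch of the two primitives involved, assuming axis-aligned rectangles stored as `(x, y, w, h)` tuples; the representation and the corner-averaging scheme are my own assumptions for illustration, not taken verbatim from the paper:

```python
def overlaps(a, b):
    """True if two axis-aligned rectangles (x, y, w, h) share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def integrate(rects):
    """Combine a subset of overlapping detections into one rectangle,
    here by averaging each component (other schemes are possible)."""
    n = len(rects)
    return tuple(sum(r[k] for r in rects) / n for k in range(4))
```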
Note that many detections may not overlap any others, so it is possible to end up with more than one integrated detection. For instance, consider an image with two faces, A and B. Suppose the image also contains two regions, C and D, that look very similar to faces but are not faces. The detector may then produce 3 overlapping detections for face A, 5 overlapping detections for face B, 1 detection for region C, and 10 overlapping detections for region D. Assuming the detections for one region do not overlap the detections for another region, the integration procedure should produce a single final detection for face A, another for face B, another for region C, and another for region D. Those 4 integrated detections should be displayed to the user.
So, to integrate the detections, it is necessary to partition them into subsets of overlapping detections, where a detection is added to a subset if it overlaps any detection already in that subset (so overlap groups detections transitively). Naive pairwise iteration over the list of detections performs poorly, taking Ω(n²) time; see the baseline sketch below.
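This is what that quadratic baseline looks like, sketched with a disjoint-set (union-find) structure so that chains of overlapping rectangles end up in the same subset; `overlaps` and `integrate` are the hypothetical helpers sketched earlier, repeated here so the snippet stands alone:

```python
def group_and_integrate(dets):
    """Partition detections (x, y, w, h) into subsets of transitively
    overlapping rectangles, then integrate each subset. Quadratic in n."""
    def overlaps(a, b):  # strict axis-aligned overlap test
        return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
                and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])

    def integrate(rects):  # average the components of each subset
        n = len(rects)
        return tuple(sum(r[k] for r in rects) / n for k in range(4))

    parent = list(range(len(dets)))

    def find(i):  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise tests: this is the Omega(n^2) step the question is about.
    for i in range(len(dets)):
        for j in range(i + 1, len(dets)):
            if overlaps(dets[i], dets[j]):
                parent[find(i)] = find(j)

    subsets = {}
    for i, d in enumerate(dets):
        subsets.setdefault(find(i), []).append(d)
    return [integrate(s) for s in subsets.values()]
```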
My question is then: what data structure and algorithm allow the quick integration of detections? Please also provide some evidence from the literature (a paper or book reference, for instance) for the approach you mention.
I think R-trees should solve the problem.
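If so, the expensive pairwise step could be replaced by box-intersection queries against the index. A hedged sketch of what I have in mind, assuming the Python `rtree` package (a libspatialindex wrapper); any spatial index offering rectangle-intersection queries should work the same way:

```python
from rtree import index

def group_with_rtree(dets):
    """dets: list of (x, y, w, h). Each detection is tested only against
    the candidates the R-tree returns, instead of all n - 1 others."""
    idx = index.Index()
    for i, (x, y, w, h) in enumerate(dets):
        idx.insert(i, (x, y, x + w, y + h))  # (minx, miny, maxx, maxy)

    parent = list(range(len(dets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (x, y, w, h) in enumerate(dets):
        # Note: the query also returns i itself, and rectangles that
        # merely touch at an edge also count as intersecting here.
        for j in idx.intersection((x, y, x + w, y + h)):
            if j != i:
                parent[find(i)] = find(j)

    subsets = {}
    for i, d in enumerate(dets):
        subsets.setdefault(find(i), []).append(d)
    return list(subsets.values())
```

With the index, each query inspects only a logarithmic-depth slice of the tree on average, so the whole grouping pass should be roughly O(n log n) rather than Ω(n²), but I would appreciate a reference confirming this is the standard approach.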