As far as I know, homography (projective transformation) can be used in computer vision to detect objects in images, but all the examples I've seen were on planar objects. Does homography only work on objects with a planar surface, or can it detect any kind of object? I'm asking because I tried to find the object below (a non-planar object) with no success:
You can see the code in this link. I used it as-is, changing only the image file names, so what we're doing is the following:
- Get the keypoints from the 2 images using SURF.
- Describe the keypoints using the SURF descriptor.
- Match the keypoints between the 2 images.
- Use the list of matched keypoints to compute the homography matrix.
- Get the coordinates of the object's corner points.
- Apply perspectiveTransform to get their corresponding points in the scene image.
- Draw lines between the resulting points.
Please note that the green lines drawn inside the big circles in the images are the lines representing the resulting points.
Based on what I showed above, it seems to me that something is unclear in my understanding of homography and where it can be applied, because this example is quite simple and it didn't work. I'm currently reading the OpenCV code to understand exactly how it estimates the homography, but progress is slow. So, does anyone have any clue how OpenCV computes this transformation, or any reference that can help in this situation?
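On the question of how OpenCV computes the transformation: to my understanding, findHomography with the RANSAC method repeatedly fits a homography to random 4-point samples with a normalized direct linear transform (DLT), keeps the model with the most inliers, and then refines it on those inliers with Levenberg-Marquardt. The NumPy sketch below reproduces only the DLT and RANSAC parts to show the idea; it is not OpenCV's code, and the function names, the 3-pixel threshold and the fixed iteration count are my own assumptions.

```python
import numpy as np

def normalize(pts):
    """Hartley normalization: centroid to the origin, mean distance scaled to sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return (pts - c) * s, T

def dlt_homography(src, dst):
    """Direct linear transform: H with dst ~ H @ src from >= 4 point pairs (N x 2 arrays)."""
    ns, Ts = normalize(src)
    nd, Td = normalize(dst)
    rows = []
    for (x, y), (u, v) in zip(ns, nd):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # Least-squares solution: right singular vector of the smallest singular value.
    Hn = np.linalg.svd(np.array(rows))[2][-1].reshape(3, 3)
    H = np.linalg.inv(Td) @ Hn @ Ts  # undo the normalization of both point sets
    return H / H[2, 2]

def project(H, pts):
    """Apply H to N x 2 points, like cv2.perspectiveTransform."""
    p = np.c_[pts, np.ones(len(pts))] @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, thresh=3.0, iters=500, seed=0):
    """Keep the 4-point model with the most inliers, then refit on all of its inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    with np.errstate(divide="ignore", invalid="ignore"):  # degenerate samples give inf/nan
        for _ in range(iters):
            idx = rng.choice(len(src), 4, replace=False)
            err = np.linalg.norm(project(dlt_homography(src[idx], dst[idx]), src) - dst, axis=1)
            inliers = err < thresh
            if inliers.sum() > best.sum():
                best = inliers
    # OpenCV additionally refines this estimate with Levenberg-Marquardt.
    return dlt_homography(src[best], dst[best]), best
```

Usage: H, inliers = ransac_homography(src, dst) with src and dst as N x 2 arrays of matched keypoint coordinates. The key point for this question is that RANSAC can only recover H if a reasonable fraction of the matches are correct; with mostly wrong matches, the "best" model is just the one that happens to agree with the most wrong matches.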
EDITED: Here is another example:
I applied the homography to the object and the yellow box, which contains just the instrument I need. The results are even worse: the projected box has now collapsed to roughly a single point, shown as the green point surrounded by the red circle. Also, I can't take images of the objects from the scene itself because I have a lot of videos, so instead I take a separate image of each instrument and try to find it in the scene videos.
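A box that collapses to a point indicates a degenerate homography, which is often the result of estimating it from mostly wrong matches. A cheap guard is to reject any homography whose projected object box is not a convex, non-mirrored quadrilateral of reasonable size. The sketch below is a heuristic of my own, not an OpenCV function; homography_is_plausible and both area-ratio thresholds are hypothetical and arbitrary.

```python
import numpy as np

def homography_is_plausible(H, w, h, min_area_ratio=0.01, max_area_ratio=100.0):
    """Heuristic sanity check of H for a w x h object box (thresholds are arbitrary)."""
    box = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=float)
    p = np.c_[box, np.ones(4)] @ H.T
    # The third homogeneous coordinate must not change sign across the corners,
    # otherwise part of the box is mapped "behind the camera".
    if not (np.all(p[:, 2] > 0) or np.all(p[:, 2] < 0)):
        return False
    quad = p[:, :2] / p[:, 2:3]
    # Consecutive edge cross products must all be positive, like in the original box:
    # this rejects self-intersecting (bow-tie), concave and mirrored results.
    edges = np.roll(quad, -1, axis=0) - quad
    nxt = np.roll(edges, -1, axis=0)
    cross = edges[:, 0] * nxt[:, 1] - edges[:, 1] * nxt[:, 0]
    if not np.all(cross > 0):
        return False
    # Shoelace area relative to the original box rejects collapsed or exploded boxes.
    x, y = quad[:, 0], quad[:, 1]
    area = 0.5 * abs(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))
    return min_area_ratio <= area / (w * h) <= max_area_ratio
```

In a video, rejecting an implausible H and reporting "not found" for that frame is usually more useful than drawing whatever RANSAC returned.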
Strictly speaking, you are right: homographies only map observations of planar objects. It is not very clear in your post, but my guess is that the matches you are showing are the inlier matches found by findHomography. As you said, such an approach works well for planar objects. For non-planar but rigid objects, the equivalent would be the inlier matches found by findFundamentalMat (see the OpenCV doc and the Wikipedia page). Still, in practice, using a homography should at least provide an approximate solution, particularly when the object's depth variation is small compared to its distance from the camera.
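The planar/non-planar distinction can be checked numerically. The NumPy sketch below projects a flat point set and a point set with real depth into two views, then fits a homography with a normalized DLT and a fundamental matrix with the normalized 8-point algorithm (one of the methods findFundamentalMat offers, cv2.FM_8POINT). The camera intrinsics, the 0.1 rad rotation, the 0.5-unit baseline and the point layouts are made-up assumptions for illustration.

```python
import numpy as np

def to_h(pts):
    """Euclidean -> homogeneous: append a column of ones."""
    return np.c_[pts, np.ones(len(pts))]

def project(P, X):
    """Project N x 3 world points with a 3 x 4 camera matrix P."""
    x = to_h(X) @ P.T
    return x[:, :2] / x[:, 2:3]

def normalize(pts):
    """Hartley normalization; returns normalized homogeneous points and T."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return to_h(pts) @ T.T, T

def fit_homography(a, b):
    """Normalized DLT: H such that b ~ H a (planar model)."""
    (na, Ta), (nb, Tb) = normalize(a), normalize(b)
    A = []
    for (x, y, _), (u, v, _) in zip(na, nb):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    Hn = np.linalg.svd(np.array(A))[2][-1].reshape(3, 3)
    return np.linalg.inv(Tb) @ Hn @ Ta

def fit_fundamental(a, b):
    """Normalized 8-point algorithm: F such that b_h^T F a_h = 0 (rigid 3D model)."""
    (na, Ta), (nb, Tb) = normalize(a), normalize(b)
    A = np.array([np.kron(q, p) for p, q in zip(na, nb)])
    Fn = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(Fn)
    Fn = U @ np.diag([S[0], S[1], 0.0]) @ Vt  # a fundamental matrix has rank 2
    return Tb.T @ Fn @ Ta

def transfer_error(H, a, b):
    """Pixel distance between H a and b."""
    p = to_h(a) @ H.T
    return np.linalg.norm(p[:, :2] / p[:, 2:3] - b, axis=1)

def epipolar_error(F, a, b):
    """Pixel distance from each b to its epipolar line F a."""
    lines = to_h(a) @ F.T
    return np.abs(np.sum(lines * to_h(b), axis=1)) / np.linalg.norm(lines[:, :2], axis=1)

# Hypothetical two-view setup: shared intrinsics K, second camera rotated about y and shifted in x.
rng = np.random.default_rng(0)
K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
th = 0.1
R = np.array([[np.cos(th), 0, np.sin(th)], [0, 1, 0], [-np.sin(th), 0, np.cos(th)]])
P1 = K @ np.c_[np.eye(3), np.zeros(3)]
P2 = K @ np.c_[R, np.array([-0.5, 0.0, 0.0])]

flat = np.c_[rng.uniform(-1, 1, (50, 2)), np.full(50, 5.0)]        # all points on the plane z = 5
solid = np.c_[rng.uniform(-1, 1, (50, 2)), rng.uniform(4, 6, 50)]  # points spread in depth

f1, f2 = project(P1, flat), project(P2, flat)
s1, s2 = project(P1, solid), project(P2, solid)
h_flat = transfer_error(fit_homography(f1, f2), f1, f2).max()
h_solid = transfer_error(fit_homography(s1, s2), s1, s2).max()
f_solid = epipolar_error(fit_fundamental(s1, s2), s1, s2).max()
print(f"homography, planar points:      max error {h_flat:.2e} px")
print(f"homography, non-planar points:  max error {h_solid:.2f} px")
print(f"fundamental, non-planar points: max epipolar distance {f_solid:.2e} px")
```

With noise-free data the homography fits the planar set essentially exactly but leaves errors of several pixels on the non-planar set, while the fundamental matrix explains the non-planar set. Note that F only constrains each match to a line, so it can validate matches on a rigid 3D object but does not by itself give you the object's outline in the scene.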
In my opinion, your problem has more to do with poor SURF matches than with the choice of a homography as the transform. This is quite clear when looking at the pair of images you are showing: only a couple of points are matched to the object you want to detect, while most of them are matched to various things in the scene.
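One way to act on this is to be stricter about which matches reach findHomography. A common filter is Lowe's ratio test: keep a match only if the nearest scene descriptor is clearly closer than the second nearest. In OpenCV this is usually done with BFMatcher.knnMatch(..., k=2); the NumPy sketch below spells out the same computation on plain descriptor arrays (ratio_test_matches and the 0.75 ratio are illustrative choices). Counting how many matches survive the filter is also a quick way to see whether the object is matched at all before blaming the homography.

```python
import numpy as np

def ratio_test_matches(des_obj, des_scene, ratio=0.75):
    """Return (i, j) pairs where object descriptor i's nearest scene descriptor j
    is clearly closer than its second-nearest one (Lowe's ratio test)."""
    # Brute-force Euclidean distances, shape (n_obj, n_scene).
    d = np.linalg.norm(des_obj[:, None, :] - des_scene[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    rows = np.arange(len(des_obj))
    best, second = d[rows, order[:, 0]], d[rows, order[:, 1]]
    # Ambiguous descriptors (best about as far as second) are dropped.
    keep = best < ratio * second
    return [(int(i), int(order[i, 0])) for i in np.flatnonzero(keep)]
```

A descriptor that sits halfway between two scene descriptors is rejected, because its best and second-best distances are equal; that is exactly the kind of match that sends RANSAC in the wrong direction.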
One of the main concerns with the approach you have chosen is that you are not dealing with rigid objects but with deformable ones: the handle of the syringe can move, the liquid inside the syringe deforms its appearance non-linearly, etc. Such deformations can make the SURF descriptors extracted in the target image quite different from those extracted in the reference image, and therefore impossible to match. Have a look at [1]; it provides good insights into why descriptors do or do not match.
For your problem, alternative approaches might be local matching (e.g. with small correlation patches), color matching, shape matching, deep learning, etc.
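For the first alternative, local matching with correlation patches, the usual score is zero-mean normalized cross-correlation, which OpenCV computes efficiently with cv2.matchTemplate and cv2.TM_CCOEFF_NORMED. The brute-force NumPy sketch below (ncc_match is a hypothetical name; it needs NumPy >= 1.20 for sliding_window_view) shows what that score is. It tolerates brightness and contrast changes but not rotation or scale, so in practice one would match several small patches of the instrument, possibly at several scales.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ncc_match(image, template):
    """Return ((row, col), score) of the best zero-mean normalized cross-correlation."""
    th, tw = template.shape
    t = template - template.mean()
    # Every th x tw window of the image, shape (H - th + 1, W - tw + 1, th, tw).
    win = sliding_window_view(image, (th, tw))
    w = win - win.mean(axis=(2, 3), keepdims=True)
    num = np.einsum("ijkl,kl->ij", w, t)
    den = np.sqrt((w ** 2).sum(axis=(2, 3)) * (t ** 2).sum())
    # Flat windows (zero variance) get a score of 0 instead of a division by zero.
    score = np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)
    r, c = np.unravel_index(np.argmax(score), score.shape)
    return (int(r), int(c)), float(score[r, c])
```

This version materializes every window, so its memory grows with image size times patch size; for real video frames cv2.matchTemplate is the practical choice.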
[1]: Vondrick, Carl, et al. "HOGgles: Visualizing Object Detection Features." Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013.