I am developing an Android app in which I want to track a 2D image (a piece of paper), analyze what the user writes or draws on it, and correctly display different 3D content on it.
I am working on the part that tracks the paper and displays simple 3D content, which can be achieved with SDKs like Vuforia and Wikitude. However, I am not using them, for several reasons.
- There are other analyses to be done on the image, e.g. analysis of the drawings.
- The image may not be rich in features, e.g. paper with only lines or a few figures.
- SDKs like Vuforia may not expose some underlying functionality, such as feature detection, to developers.
Anyway, right now I only want to achieve the following result.
- I have a piece of paper, probably with lines and figures on it. You can think of it as the kind of paper for children to practice writing or drawing on. Example: https://i.pinimg.com/236x/89/3a/80/893a80336adab4120ff197010cd7f6a1--dr-seuss-crafts-notebook-paper.jpg
- I point my phone (the camera) at the paper while capturing the video frames.
- I want to register the paper, track it and display a simple wire-frame cube on it.
I have been messing around with OpenCV, and have tried the following approaches.
Using homography:
- Detect features in the 2D image (ORB, FAST etc.).
- Describe the features (ORB).
- Do the same in each video frame.
- Match the features and find good matches.
- Find the homography and use it to draw a rectangle around the image in the video frame (this part works).
- I do not know how to use the homography decomposition (into rotations, translations and normals) to display a 3D object such as a cube.
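For reference, the step I am stuck on is usually described like this: if the camera matrix K is known and the target lies on the plane z = 0, the homography satisfies H ~ K [r1 r2 t] up to scale, so the pose can be read off its columns. Below is a minimal NumPy sketch of that idea, not a tested solution for my app. The function name `pose_from_homography` is hypothetical, lens distortion is ignored, and H is assumed to map plane coordinates (in whatever units the reference image uses) to frame pixels.

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover (R, t) from a plane-to-image homography, assuming the plane is z = 0."""
    M = np.linalg.inv(K) @ H
    # H is only defined up to scale and sign; pick the sign that puts
    # the plane in front of the camera (positive depth).
    if M[2, 2] < 0:
        M = -M
    # The first two columns should be unit-length rotation columns.
    scale = 2.0 / (np.linalg.norm(M[:, 0]) + np.linalg.norm(M[:, 1]))
    r1 = M[:, 0] * scale
    r2 = M[:, 1] * scale
    t = M[:, 2] * scale
    r3 = np.cross(r1, r2)
    # With noisy matches [r1 r2 r3] is only approximately a rotation,
    # so project it onto the nearest orthonormal matrix via SVD.
    U, _, Vt = np.linalg.svd(np.column_stack((r1, r2, r3)))
    R = U @ Vt
    return R, t
```

The resulting R and t play the same role as the output of solvePnP, so the cube can be projected with them in the same way. The translation is expressed in the units of the plane coordinates used to compute H.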
Using solvePnP:
Steps 1 to 4 are the same as in the homography approach above.
- Convert the 2D points of the good matches in the reference image to 3D by assuming the image lies on the world's x-y plane, so that every point has z = 0.
- Use solvePnP with those 3D points and the corresponding 2D points in the current frame to retrieve the rotation and translation vectors, then convert the rotation vector to a rotation matrix with Rodrigues() in OpenCV to build the projection matrix.
- Construct the 3D points of a cube.
- Project them into the 2D image using the projection matrix and the camera matrix.
- The issue is that the cube jumps around, which I believe is because the feature detection and matching are not stable or accurate enough, which in turn affects solvePnP.
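To make the cube construction and projection steps concrete, here is a minimal NumPy sketch of what I mean. It is an illustration only: `cube_vertices`, `CUBE_EDGES` and `project_points` are hypothetical names, lens distortion is ignored, and I assume the paper's x axis points right and its y axis points down, so that "up from the paper" is negative z in a right-handed frame.

```python
import numpy as np

def cube_vertices(side):
    """Eight corners of a cube sitting on the z = 0 plane (base first, then top)."""
    s = float(side)
    base = np.array([[0, 0, 0], [s, 0, 0], [s, s, 0], [0, s, 0]], dtype=float)
    top = base + np.array([0.0, 0.0, -s])  # assumed: -z points out of the paper
    return np.vstack((base, top))

# Index pairs into cube_vertices() describing the 12 wireframe edges.
CUBE_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0),
              (4, 5), (5, 6), (6, 7), (7, 4),
              (0, 4), (1, 5), (2, 6), (3, 7)]

def project_points(points_3d, R, t, K):
    """Pinhole projection of Nx3 world points to Nx2 pixels (no distortion)."""
    cam = points_3d @ R.T + t      # world frame -> camera frame
    pix = cam @ K.T                # apply the camera matrix
    return pix[:, :2] / pix[:, 2:3]  # perspective divide
```

Here R is the 3x3 rotation matrix obtained from the rotation vector and t is the translation vector as a length-3 array. Drawing a line between the projected endpoints of each pair in `CUBE_EDGES` gives the wireframe.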
Using contours or corners:
I simply convert the camera frame to grayscale, apply a Gaussian blur, dilate or erode it, and try to find the largest four-sided contour so that I can track it using solvePnP etc. Unsurprisingly, this doesn't give good results, or perhaps I'm just doing it wrong.
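One detail that may matter in this approach is that the four detected corners have to be paired with the paper's model corners in a consistent order in every frame before they go into solvePnP. A minimal sketch of one simple ordering heuristic is below. `order_corners` is a hypothetical helper, and the heuristic assumes the paper is roughly upright in the frame; it can mislabel corners when the quadrilateral is rotated by about 45 degrees or more.

```python
import numpy as np

def order_corners(quad):
    """Order 4 image points as top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(quad, dtype=float).reshape(4, 2)
    s = pts.sum(axis=1)           # x + y: smallest at top-left, largest at bottom-right
    d = pts[:, 1] - pts[:, 0]     # y - x: smallest at top-right, largest at bottom-left
    return np.array([pts[np.argmin(s)],
                     pts[np.argmin(d)],
                     pts[np.argmax(s)],
                     pts[np.argmax(d)]])
```

If the order flips between frames, the estimated pose flips with it, which would look like the object jumping even when the contour itself is found correctly.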
So my questions are:
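- How can I solve the two problems highlighted above: using the homography decomposition to display a 3D object, and the cube jumping around when using solvePnP?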
- More generally, given the type of image target I want to track, what would be the optimal algorithm/solution/technique to track it?
- What are the things that I can improve/change in my way of solving the problem?
Thank you very much.