Like many other developers, I have plunged into Apple's new ARKit technology. It's great. For a specific project, however, I would like to be able to recognise (real-life) images in the scene, either to project something onto them (just as Vuforia does with its target images) or to use them to trigger an event in my application.
In my research on how to accomplish this, I stumbled upon Apple's Vision and CoreML frameworks. They seem promising, although I have not yet been able to wrap my head around them.
As I understand it, I should be able to do exactly what I want by finding rectangles using the Vision framework and feeding them into a CoreML model that simply compares them to the target images I predefined within the model. The model should then be able to report which target image it found.
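To make the idea concrete, here is a minimal sketch of the rectangle-detection half of that pipeline, the part Vision handles out of the box. The `detectRectangles(in:)` helper name is my own, and the missing piece (cropping the detected quad, perspective-correcting it, and running it through a Core ML classifier) is exactly what I can't figure out:

```swift
import ARKit
import Vision

// Hypothetical helper: run rectangle detection on the current camera frame.
func detectRectangles(in frame: ARFrame) {
    let request = VNDetectRectanglesRequest { request, error in
        guard let rectangles = request.results as? [VNRectangleObservation] else { return }
        for rect in rectangles {
            // Corner points are normalized (0...1) image coordinates;
            // the detected quad would still need to be cropped and
            // perspective-corrected before classification.
            print(rect.topLeft, rect.topRight, rect.bottomRight, rect.bottomLeft)
        }
    }
    request.maximumObservations = 4

    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, options: [:])
    try? handler.perform([request])
}
```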
Although this sounds good in my head, I have not yet found a way to actually do it. How would I go about creating a model like that, and is it even possible at all?
As of ARKit 1.5 (shipping with iOS 11.3 in the spring of 2018), this capability is built directly into ARKit, which solves the problem.
ARKit 1.5 fully supports image recognition: you register a set of reference images, and when one is recognised in the scene, its 3D position and orientation are exposed as an anchor, so content can be placed on top of it.
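Here is a minimal sketch of how that looks in ARKit 1.5, assuming the reference images live in an asset-catalog resource group named "AR Resources" (that group name is my assumption and must match your own catalog):

```swift
import UIKit
import ARKit

class ViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        sceneView.delegate = self

        let configuration = ARWorldTrackingConfiguration()
        // Load reference images from the asset catalog; the group name
        // "AR Resources" is an assumption here.
        if let referenceImages = ARReferenceImage.referenceImages(
                inGroupNamed: "AR Resources", bundle: nil) {
            configuration.detectionImages = referenceImages
        }
        sceneView.session.run(configuration)
    }

    // ARKit adds an ARImageAnchor (with a node) when it recognises a reference image.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let imageAnchor = anchor as? ARImageAnchor else { return }
        // Overlay a semi-transparent plane matching the image's physical size.
        let size = imageAnchor.referenceImage.physicalSize
        let plane = SCNPlane(width: size.width, height: size.height)
        plane.firstMaterial?.diffuse.contents = UIColor.cyan.withAlphaComponent(0.5)
        let planeNode = SCNNode(geometry: plane)
        planeNode.eulerAngles.x = -.pi / 2  // lay the plane flat onto the detected image
        node.addChildNode(planeNode)
    }
}
```

The anchor also tells you which image was found via `imageAnchor.referenceImage.name`, so the same callback can drive the "trigger an event" use case from the question.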