How to tap on an object in an image and track it across a sequence of images using the Vision and Core ML frameworks


I am developing an app using the new Core ML framework. What I am trying to achieve is as follows:

  1. Select an image and tap on any object in it to draw a rectangle.
  2. Track that object across multiple images by simply running a for loop.

Currently I am using the following process:

  1. Detect the object when the user taps, and store it as a VNDetectedObjectObservation(boundingBox: convertedRect).

  2. Create a VNTrackObjectRequest and have a VNImageRequestHandler perform the request.

But I am not getting proper results. Any help will be appreciated.
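For reference, a minimal sketch of this loop (assumptions: the frames are available as CGImages, and convertedRect is already in Vision's normalized, lower-left-origin coordinate space; note that a single VNSequenceRequestHandler is reused across frames, since tracking carries state from frame to frame):

    import Vision
    import CoreGraphics

    func trackTappedObject(convertedRect: CGRect, frames: [CGImage]) {
        var observation = VNDetectedObjectObservation(boundingBox: convertedRect)

        // One sequence handler reused for every frame, so Vision can keep
        // tracker state between frames (a fresh VNImageRequestHandler per
        // frame would discard that state).
        let handler = VNSequenceRequestHandler()

        for frame in frames {
            let request = VNTrackObjectRequest(detectedObjectObservation: observation)
            request.trackingLevel = .accurate

            do {
                try handler.perform([request], on: frame)
            } catch {
                print("Tracking failed: \(error)")
                return
            }

            guard let result = request.results?.first as? VNDetectedObjectObservation else {
                return
            }
            // Feed the newest observation back in for the next frame.
            observation = result
            print("Tracked box: \(result.boundingBox)")
        }
    }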


1 Answer

Answered by ITiger:

I am not familiar with Core ML and Objective-C, so I can't offer you a polished code example, but since nobody has given you an answer yet, I would like to describe the way I would solve this manually:

  1. Get the tapped point and expand a region of interest around it, like an N x N square centered on that point.
  2. Perform a classification on the tapped region, so the algorithm can recognize that structure in the consecutive frames.
  3. Store the location in the current frame, then expand that region for the following frame and use this expanded region to detect the object in it (see the sketch below).

With this strategy you can use the expanded region from step 3 as the input to an object detection task, which you can solve with a YOLO implementation. This is much faster than running the whole frame through an object detector, because the detection only runs on a small region.
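To make the steps concrete, here is a rough Swift sketch of that loop. It is only an illustration: detectObject(in:within:) is a hypothetical stand-in for whatever detector you plug in (for example a YOLO model run through Core ML), and the N = 100 square and 1.5x expansion factor are arbitrary values to tune.

    import CoreGraphics

    // Grow a rectangle by `factor` around its center, clamped to the image.
    func expand(_ rect: CGRect, by factor: CGFloat, within bounds: CGRect) -> CGRect {
        let grown = rect.insetBy(dx: -rect.width * (factor - 1) / 2,
                                 dy: -rect.height * (factor - 1) / 2)
        return grown.intersection(bounds)
    }

    // Placeholder for your actual detector (e.g. a YOLO model via Core ML):
    // crop `frame` to `region`, run the model, and map the best box back
    // to full-frame coordinates.
    func detectObject(in frame: CGImage, within region: CGRect) -> CGRect? {
        return nil // hypothetical; supply your own implementation
    }

    func trackManually(tapPoint: CGPoint, frames: [CGImage]) {
        let n: CGFloat = 100 // side of the initial N x N square around the tap
        var searchRegion = CGRect(x: tapPoint.x - n / 2, y: tapPoint.y - n / 2,
                                  width: n, height: n)

        for frame in frames {
            let bounds = CGRect(x: 0, y: 0,
                                width: CGFloat(frame.width),
                                height: CGFloat(frame.height))
            // Detect only inside the (expanded) search region, which is much
            // cheaper than running detection on the whole frame.
            guard let found = detectObject(in: frame, within: searchRegion) else { break }
            // Expand the found box so the object stays inside the search
            // region even if it moves between frames.
            searchRegion = expand(found, by: 1.5, within: bounds)
        }
    }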

I hope this helps you at least a bit.