Understanding the output shape of MediaPipe's TFLite models for pose detection


I'm trying to use one of MediaPipe's pretrained TFLite models to perform pose landmark detection on Android (Java), which gives me information about 33 landmarks of a human body. I know there are other ways, for example ML Kit, but I expect one of MediaPipe's models to give better results.

I want to use the Pose landmark model (https://google.github.io/mediapipe/solutions/models.html).

To use these models on Android, it's necessary to know (and especially to understand) the model's output shapes. They can be read in Java (or Python):

  • Five outputs: [195], [1], [256, 256, 1], [64, 64, 1], [117] if I got it right.

But the model card defines the output array as [33, 5], which makes sense because the goal is to detect 33 landmarks with 5 values each.
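One plausible reading of these shapes, offered as an assumption rather than something the model card states: 195 = 39 × 5 and 117 = 39 × 3, so the model would output 39 landmarks (the 33 body landmarks from the model card plus a few auxiliary points that MediaPipe's graph uses internally), with 5 values each in the [195] output and 3 world coordinates each in the [117] output. The sketch below reshapes the flat [195] array row by row into that [39, 5] layout and keeps the first 33 rows. The column order (x, y, z, visibility, presence), the division of x, y and z by the input size, and the sigmoid on the last two columns are assumptions modeled on how MediaPipe's pose landmark graph post-processes this tensor; verify them against that graph before relying on the values. The class name PoseLandmarkDecoder is hypothetical.

```java
// Hypothetical decoder for the flat [195] output, ASSUMING a row-major
// [39, 5] layout whose first 33 rows are the model card's body landmarks.
class PoseLandmarkDecoder {
    static final int VALUES_PER_LANDMARK = 5; // assumed: x, y, z, visibility, presence
    static final int BODY_LANDMARKS = 33;     // landmarks listed in the model card

    static final class Landmark {
        final float x, y, z, visibility, presence;

        Landmark(float x, float y, float z, float visibility, float presence) {
            this.x = x;
            this.y = y;
            this.z = z;
            this.visibility = visibility;
            this.presence = presence;
        }
    }

    // Maps a raw logit to a probability in (0, 1).
    static float sigmoid(float v) {
        return (float) (1.0 / (1.0 + Math.exp(-v)));
    }

    // Converts the flat output into 33 landmarks; x, y, z are divided by the
    // input size so that x and y end up roughly in [0, 1] (assumption).
    static Landmark[] decode(float[] raw, int inputWidth, int inputHeight) {
        if (raw.length % VALUES_PER_LANDMARK != 0
                || raw.length / VALUES_PER_LANDMARK < BODY_LANDMARKS) {
            throw new IllegalArgumentException("Unexpected output length: " + raw.length);
        }
        Landmark[] result = new Landmark[BODY_LANDMARKS];
        for (int i = 0; i < BODY_LANDMARKS; i++) {
            int o = i * VALUES_PER_LANDMARK; // landmark i starts at index i * 5
            result[i] = new Landmark(
                    raw[o] / inputWidth,
                    raw[o + 1] / inputHeight,
                    raw[o + 2] / inputWidth,  // z scaled like x (assumption)
                    sigmoid(raw[o + 3]),
                    sigmoid(raw[o + 4]));
        }
        return result;
    }

    public static void main(String[] args) {
        float[] raw = new float[195]; // on Android: outputFeature0.getFloatArray()
        raw[0] = 128f;                // fake landmark 0 at the centre of a 256 x 256 input
        raw[1] = 128f;
        Landmark[] landmarks = decode(raw, 256, 256);
        System.out.println(landmarks.length + " " + landmarks[0].x + " " + landmarks[0].visibility);
    }
}
```

On Android the raw array would come from outputFeature0.getFloatArray() (a TensorBuffer method); pass the width and height of the tensor the model was actually fed.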

Can somebody explain the output shapes of the TFLite model and how to use them, or point me to documentation I missed?

The code used is auto-generated by Android Studio and taken from the "Sample Code" section of the model in the ml folder, following these instructions: https://www.tensorflow.org/lite/inference_with_metadata/codegen#mlbinding. I get the same shapes with the approach described at https://www.tensorflow.org/lite/guide/inference#load_and_run_a_model_in_java:

try {
    PoseDetection model = PoseDetection.newInstance(getApplicationContext());

    // Creates inputs for reference.
    TensorBuffer inputFeature0 = TensorBuffer.createFixedSize(new int[]{1, 224, 224, 3}, DataType.FLOAT32);

    // imageBuffer is the image as ByteBuffer
    inputFeature0.loadBuffer(imageBuffer);

    // Runs model inference and gets result.
    PoseDetection.Outputs outputs = model.process(inputFeature0);
    TensorBuffer outputFeature0 = outputs.getOutputFeature0AsTensorBuffer();
    TensorBuffer outputFeature1 = outputs.getOutputFeature1AsTensorBuffer();
    TensorBuffer outputFeature2 = outputs.getOutputFeature2AsTensorBuffer();
    TensorBuffer outputFeature3 = outputs.getOutputFeature3AsTensorBuffer();
    TensorBuffer outputFeature4 = outputs.getOutputFeature4AsTensorBuffer();

    // Releases model resources if no longer used.
    model.close();
} catch (IOException e) {
    // TODO Handle the exception
}

I got the shapes by inspecting the outputFeatures using the debugger.
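The generated code assumes imageBuffer already matches the input tensor: a [1, 224, 224, 3] FLOAT32 tensor corresponds to 224 × 224 × 3 × 4 = 602,112 bytes. A minimal sketch of filling such a buffer from packed ARGB pixels (the int format Android's Bitmap.getPixels produces) follows. It assumes the image has already been resized to 224 × 224 and that the model expects RGB values scaled to [0, 1]; the range is a parameter because MediaPipe's graphs use different ranges for different models, so check which one your model needs. ImageBufferSketch and toFloatBuffer are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class ImageBufferSketch {
    // Hypothetical helper: packs row-major ARGB pixels into a FLOAT32 buffer
    // laid out as [1, height, width, 3] (RGB), scaled linearly to [min, max].
    static ByteBuffer toFloatBuffer(int[] argb, int width, int height, float min, float max) {
        if (argb.length != width * height) {
            throw new IllegalArgumentException("Expected " + (width * height) + " pixels");
        }
        // TFLite reads tensors in native byte order from a direct buffer.
        ByteBuffer buffer = ByteBuffer.allocateDirect(width * height * 3 * Float.BYTES)
                .order(ByteOrder.nativeOrder());
        float scale = (max - min) / 255f;
        for (int pixel : argb) {
            buffer.putFloat(((pixel >> 16) & 0xFF) * scale + min); // red
            buffer.putFloat(((pixel >> 8) & 0xFF) * scale + min);  // green
            buffer.putFloat((pixel & 0xFF) * scale + min);         // blue (alpha is dropped)
        }
        buffer.rewind(); // reset position so the consumer reads from the start
        return buffer;
    }

    public static void main(String[] args) {
        int[] pixels = new int[224 * 224]; // on Android: bitmap.getPixels(...) after resizing
        Arrays.fill(pixels, 0xFFFF0000);   // opaque red
        ByteBuffer imageBuffer = toFloatBuffer(pixels, 224, 224, 0f, 1f);
        System.out.println(imageBuffer.capacity() + " bytes");
    }
}
```

Keeping the range as a parameter avoids hard-coding one normalization into the helper; the resulting buffer can be handed to inputFeature0.loadBuffer(imageBuffer) as in the code above.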

Thanks a lot.



EinePriseCode (best answer)

Prompted by the note in MediaPipe's documentation, "Disclaimer: Running MediaPipe on Windows is experimental." (https://developers.google.com/mediapipe/framework/getting_started/install#installing_on_windows), I followed the instructions at google.github.io/mediapipe/getting_started/android.html, as @Morrison Chang suggested. This approach takes considerable time to understand, but it offers high customizability and good results. It solved my problem; the old approach of calling the .tflite model directly through the generated code does not seem to be suitable.