Core ML model converted from PyTorch model giving the wrong prediction probabilities


I have a PyTorch binary classification model that I converted to Core ML. I converted the model both directly and indirectly through ONNX, using these tutorials/documentation respectively: https://coremltools.readme.io/docs/pytorch-conversion and https://github.com/onnx/onnx-docker/blob/master/onnx-ecosystem/inference_demos/resnet50_modelzoo_onnxruntime_inference.ipynb

The pre-softmax outputs and the probabilities are similar for the original PyTorch model and the ONNX model converted from PyTorch. But the output of the Core ML model converted from PyTorch via the tutorial documentation is completely incorrect, even though neither conversion method produced any compilation errors.

Checking the weights of the last layer, Core ML and PyTorch seem to match. The pre-softmax output of the Core ML model gives me

{'classLabel': '_xx', 'classLabelProbs': {'_xx': 29.15625, 'xx': -22.53125}}

while the output from the PyTorch model gives me [-3.2185674 3.4477997]
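For reference, running a numerically stable softmax over both sets of logits shows how different the resulting probabilities are. This is a pure-Python sketch; the logit values are copied from the outputs above, and note that the Core ML logits would saturate to essentially a one-hot probability for the opposite class:

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# PyTorch pre-softmax output from above.
pytorch_probs = softmax([-3.2185674, 3.4477997])
print(pytorch_probs)  # roughly [0.0013, 0.9987]

# Core ML's reported values saturate to nearly [1.0, 0.0].
coreml_probs = softmax([29.15625, -22.53125])
print(coreml_probs)
```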

The output of the conversion from ONNX to Core ML looks like this:

58/69: Converting Node Type Add
59/69: Converting Node Type Relu
60/69: Converting Node Type Conv
61/69: Converting Node Type BatchNormalization
62/69: Converting Node Type Relu
63/69: Converting Node Type Conv
64/69: Converting Node Type BatchNormalization
65/69: Converting Node Type Add
66/69: Converting Node Type Relu
67/69: Converting Node Type GlobalAveragePool
68/69: Converting Node Type Flatten
69/69: Converting Node Type Gemm
Translation to CoreML spec completed. Now compiling the CoreML model.
Model Compilation done.

while the final layers of the PyTorch model, when printed, look like this:

(layer4): Sequential(
(0): BasicBlock(
(conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=512, out_features=2, bias=True)

How do I go about resolving the numerical errors produced by my Core ML model that was converted from PyTorch?

There are 2 answers

Matthijs Hollemans

It's probably an issue with your image preprocessing options: https://machinethink.net/blog/help-core-ml-gives-wrong-output/

Jonathan Brown

Update:

Using the Core ML unified API, I added a scale layer. My outputs are still not giving any probabilities for my classifier.

![last couple layers of converted pytorch model][1] [1]: https://i.stack.imgur.com/9bzd2.png

The last layer prints out a tensor instead of probabilities, so I added a softmax function via the network builder:

builder.add_softmax(name="softmax", input_name="305", output_name="307:labelProbabilityLayerName")

The previous last node's output name was "307:labelProbabilityLayerName", and I changed it to "305" before adding the softmax, so that the previous last node's output is the input to my softmax. The output of my softmax can then connect to the original class-label strings to print the intended probabilities. However, I am still getting an error:

"RuntimeError: Error compiling model: "Error reading protobuf spec. validator error: Layer 'softmax' consumes an input named '307' which is not present in this network."."

This doesn't make sense, because I defined my softmax to consume '305' and also updated that last layer, an inner-product layer, to output '305'.
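The validator error usually means some layer in the spec still references the old blob name: renaming a producer's output without also renaming every consumer's input leaves a dangling reference. The sketch below illustrates the consistency requirement using plain dicts as stand-ins for Core ML layer specs (the real spec objects, e.g. the layers list in the network spec, would be walked the same way; the layer names here are hypothetical):

```python
# Stand-in for a Core ML layer spec: each layer lists its input and
# output blob names, and every reference must stay consistent.
layers = [
    {"name": "innerproduct", "inputs": ["pool_out"],
     "outputs": ["307:labelProbabilityLayerName"]},
    {"name": "softmax", "inputs": ["307:labelProbabilityLayerName"],
     "outputs": ["probs"]},
]

def rename_blob(layers, old, new):
    """Rename a blob everywhere it appears: in the producer's outputs
    AND in every consumer's inputs. Renaming only one side leaves a
    dangling reference, which triggers exactly the validator error
    quoted above ('consumes an input ... not present in this network')."""
    for layer in layers:
        layer["inputs"] = [new if b == old else b for b in layer["inputs"]]
        layer["outputs"] = [new if b == old else b for b in layer["outputs"]]

rename_blob(layers, "307:labelProbabilityLayerName", "305")
print(layers)  # both the producer's output and the consumer's input now read "305"
```

So it is worth printing every layer's input and output names after the edits; if any layer still lists '307' (or '307:labelProbabilityLayerName') as an input, it must be updated along with the producer.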