I cannot get flexible shapes working with an ONNX model I am converting to an MLModel using coremltools 4.0. The source model comes from PyTorch, but I can't use the new unified converter because coremltools does not currently support the reflection_pad2d layer the model uses.
coremltools converts the model without any warnings or errors, and the resulting spec shows that flexible shapes are supported:
input {
name: "input"
type {
imageType {
width: 1024
height: 1024
colorSpace: BGR
imageSizeRange {
widthRange {
lowerBound: 256
upperBound: -1
}
heightRange {
lowerBound: 256
upperBound: -1
}
}
}
}
}
output {
name: "output"
type {
imageType {
width: 1024
height: 1024
colorSpace: RGB
imageSizeRange {
widthRange {
lowerBound: 256
upperBound: -1
}
heightRange {
lowerBound: 256
upperBound: -1
}
}
}
}
}
But running a prediction on the model fails with these messages:
MyApp[5773:4974761] [espresso] [Espresso::handle_ex_plan] exception=Invalid X-dimension 1/814 status=-7
MyApp[5773:4974761] [coreml] Error binding image input buffer input: -7
MyApp[5773:4974761] [coreml] Failure in bindInputsAndOutputs.
prediction error: Error Domain=com.apple.CoreML Code=0 "Error binding image input buffer input." UserInfo={NSLocalizedDescription=Error binding image input buffer input.}
Enumerated shapes do work with the model, but covering the size range I need would take something like 10k+ enumerated shapes, which doesn't seem like a real solution.
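To ground that estimate: even snapping sizes to multiples of 16 across the range I care about (256 to 4096; the step of 16 is just an assumption for illustration), the number of distinct width/height pairs explodes:

```python
# Rough count of enumerated (width, height) pairs needed to cover
# 256..4096 if sizes were snapped to multiples of 16 (illustrative step).
sizes = range(256, 4096 + 1, 16)
num_pairs = len(sizes) ** 2
print(num_pairs)  # 58081 distinct shapes
```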
The model is a fully convolutional network, it does not appear to use any fixed shapes (see the spec output below), and it handles varying input sizes in PyTorch, so it seems like flexible shapes should be achievable somehow.
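As a quick illustration of why I expect size-agnostic behavior, here is a hypothetical stand-in mirroring just the first block from the spec (ReflectionPad2d + 9x9 valid convolution, not the real model) that accepts arbitrary spatial sizes in PyTorch:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the model's first block: a 4-pixel reflection
# pad followed by a 9x9 valid convolution, which preserves spatial size.
block = nn.Sequential(
    nn.ReflectionPad2d(4),
    nn.Conv2d(3, 16, kernel_size=9, stride=1),
)

for size in (256, 512, 814):
    out = block(torch.randn(1, 3, size, size))
    # pad adds 8, the 9x9 valid conv removes 8: spatial size is unchanged
    assert out.shape[-2:] == (size, size)
```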
I've tried converting with flexible input shapes using image input/output:
import coremltools as ct

input_names = ['input']
output_names = ['output']
channels = 3
input_shape = ct.Shape(shape=(channels, ct.RangeDim(), ct.RangeDim()))
# also tried:
# input_shape = ct.Shape(shape=(channels, ct.RangeDim(256, 4096), ct.RangeDim(256, 4096)))
# and:
# input_shape = ct.Shape(shape=(channels, ct.RangeDim(256, -1), ct.RangeDim(256, -1)))
model_input = ct.TensorType(shape=input_shape)
mlmodel = ct.converters.onnx.convert('torch_model.onnx',
                                     [model_input],
                                     image_input_names=input_names,
                                     image_output_names=output_names,
                                     ...
                                     )
from coremltools.models.neural_network import flexible_shape_utils

def add_flexible_shapes(spec):
    img_size_ranges = flexible_shape_utils.NeuralNetworkImageSizeRange(height_range=(256, -1), width_range=(256, -1))
    # also tried:
    # img_size_ranges = flexible_shape_utils.NeuralNetworkImageSizeRange(height_range=(256, 4096), width_range=(256, 4096))
    flexible_shape_utils.update_image_size_range(spec, feature_name=input_names[0], size_range=img_size_ranges)
    flexible_shape_utils.update_image_size_range(spec, feature_name=output_names[0], size_range=img_size_ranges)
    return spec

spec = mlmodel.get_spec()
# tried with and without adding flexible shapes
spec = add_flexible_shapes(spec)
I also tried converting the model as a multiarray first, then converting the input/output to images, then adding flexible shapes:
import torch
import coremltools as ct
from coremltools.proto import FeatureTypes_pb2 as ft

torch.onnx.export(torch_model, example_input, 'torch_model.onnx', input_names=input_names, output_names=output_names, verbose=True)
mlmodel = ct.converters.onnx.convert(model='torch_model.onnx',
                                     ...
                                     )
spec = mlmodel.get_spec()
# rewrite the multiarray input/output descriptions as images
input = spec.description.input[0]
input.type.imageType.colorSpace = ft.ImageFeatureType.RGB
input.type.imageType.height = 1024
input.type.imageType.width = 1024
output = spec.description.output[0]
output.type.imageType.colorSpace = ft.ImageFeatureType.RGB
output.type.imageType.height = 1024
output.type.imageType.width = 1024
spec = add_flexible_shapes(spec)
I've looked at all the layers in the spec, and none of them use a fixed shape (other than the input/output descriptions):
specificationVersion: 4
description {
input {
name: "input"
type {
imageType {
width: 1024
height: 1024
colorSpace: RGB
}
}
}
output {
name: "output"
type {
imageType {
width: 1024
height: 1024
colorSpace: RGB
}
}
}
metadata {
userDefined {
key: "com.github.apple.coremltools.source"
value: "onnx==1.7.0"
}
userDefined {
key: "com.github.apple.coremltools.version"
value: "4.0"
}
}
}
neuralNetwork {
layers {
name: "Pad_0"
input: "input"
output: "63"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 4
endEdgeSize: 4
}
borderAmounts {
startEdgeSize: 4
endEdgeSize: 4
}
}
}
}
layers {
name: "Conv_1"
input: "63"
output: "64"
convolution {
outputChannels: 16
kernelChannels: 3
nGroups: 1
kernelSize: 9
kernelSize: 9
stride: 1
stride: 1
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_2"
input: "64"
output: "65"
batchnorm {
channels: 16
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Relu_3"
input: "65"
output: "66"
activation {
ReLU {
}
}
}
layers {
name: "Pad_4"
input: "66"
output: "67"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_5"
input: "67"
output: "68"
convolution {
outputChannels: 32
kernelChannels: 16
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 2
stride: 2
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_6"
input: "68"
output: "69"
batchnorm {
channels: 32
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Relu_7"
input: "69"
output: "70"
activation {
ReLU {
}
}
}
layers {
name: "Pad_8"
input: "70"
output: "71"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_9"
input: "71"
output: "72"
convolution {
outputChannels: 64
kernelChannels: 32
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 2
stride: 2
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_10"
input: "72"
output: "73"
batchnorm {
channels: 64
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Relu_11"
input: "73"
output: "74"
activation {
ReLU {
}
}
}
layers {
name: "Pad_12"
input: "74"
output: "75"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_13"
input: "75"
output: "76"
convolution {
outputChannels: 64
kernelChannels: 64
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 1
stride: 1
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_14"
input: "76"
output: "77"
batchnorm {
channels: 64
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Relu_15"
input: "77"
output: "78"
activation {
ReLU {
}
}
}
layers {
name: "Pad_16"
input: "78"
output: "79"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_17"
input: "79"
output: "80"
convolution {
outputChannels: 64
kernelChannels: 64
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 1
stride: 1
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_18"
input: "80"
output: "81"
batchnorm {
channels: 64
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Add_19"
input: "81"
input: "74"
output: "82"
addBroadcastable {
}
}
layers {
name: "Pad_20"
input: "82"
output: "83"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_21"
input: "83"
output: "84"
convolution {
outputChannels: 64
kernelChannels: 64
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 1
stride: 1
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_22"
input: "84"
output: "85"
batchnorm {
channels: 64
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Relu_23"
input: "85"
output: "86"
activation {
ReLU {
}
}
}
layers {
name: "Pad_24"
input: "86"
output: "87"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_25"
input: "87"
output: "88"
convolution {
outputChannels: 64
kernelChannels: 64
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 1
stride: 1
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_26"
input: "88"
output: "89"
batchnorm {
channels: 64
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Add_27"
input: "89"
input: "82"
output: "90"
addBroadcastable {
}
}
layers {
name: "Pad_28"
input: "90"
output: "91"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_29"
input: "91"
output: "92"
convolution {
outputChannels: 64
kernelChannels: 64
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 1
stride: 1
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_30"
input: "92"
output: "93"
batchnorm {
channels: 64
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Relu_31"
input: "93"
output: "94"
activation {
ReLU {
}
}
}
layers {
name: "Pad_32"
input: "94"
output: "95"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_33"
input: "95"
output: "96"
convolution {
outputChannels: 64
kernelChannels: 64
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 1
stride: 1
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_34"
input: "96"
output: "97"
batchnorm {
channels: 64
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Add_35"
input: "97"
input: "90"
output: "98"
addBroadcastable {
}
}
layers {
name: "Pad_36"
input: "98"
output: "99"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_37"
input: "99"
output: "100"
convolution {
outputChannels: 64
kernelChannels: 64
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 1
stride: 1
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_38"
input: "100"
output: "101"
batchnorm {
channels: 64
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Relu_39"
input: "101"
output: "102"
activation {
ReLU {
}
}
}
layers {
name: "Pad_40"
input: "102"
output: "103"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_41"
input: "103"
output: "104"
convolution {
outputChannels: 64
kernelChannels: 64
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 1
stride: 1
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_42"
input: "104"
output: "105"
batchnorm {
channels: 64
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Add_43"
input: "105"
input: "98"
output: "106"
addBroadcastable {
}
}
layers {
name: "Pad_44"
input: "106"
output: "107"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_45"
input: "107"
output: "108"
convolution {
outputChannels: 64
kernelChannels: 64
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 1
stride: 1
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_46"
input: "108"
output: "109"
batchnorm {
channels: 64
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Relu_47"
input: "109"
output: "110"
activation {
ReLU {
}
}
}
layers {
name: "Pad_48"
input: "110"
output: "111"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_49"
input: "111"
output: "112"
convolution {
outputChannels: 64
kernelChannels: 64
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 1
stride: 1
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_50"
input: "112"
output: "113"
batchnorm {
channels: 64
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Add_51"
input: "113"
input: "106"
output: "114"
addBroadcastable {
}
}
layers {
name: "Upsample_52"
input: "114"
output: "123"
upsample {
scalingFactor: 4
scalingFactor: 4
}
}
layers {
name: "Pad_53"
input: "123"
output: "124"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_54"
input: "124"
output: "125"
convolution {
outputChannels: 32
kernelChannels: 64
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 2
stride: 2
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_55"
input: "125"
output: "126"
batchnorm {
channels: 32
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Relu_56"
input: "126"
output: "127"
activation {
ReLU {
}
}
}
layers {
name: "Upsample_57"
input: "127"
output: "136"
upsample {
scalingFactor: 4
scalingFactor: 4
mode: BILINEAR
}
}
layers {
name: "Pad_58"
input: "136"
output: "137"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
borderAmounts {
startEdgeSize: 1
endEdgeSize: 1
}
}
}
}
layers {
name: "Conv_59"
input: "137"
output: "138"
convolution {
outputChannels: 16
kernelChannels: 32
nGroups: 1
kernelSize: 3
kernelSize: 3
stride: 2
stride: 2
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
layers {
name: "InstanceNormalization_60"
input: "138"
output: "139"
batchnorm {
channels: 16
computeMeanVar: true
instanceNormalization: true
epsilon: 9.999999747378752e-06
gamma {
}
beta {
}
}
}
layers {
name: "Relu_61"
input: "139"
output: "140"
activation {
ReLU {
}
}
}
layers {
name: "Pad_62"
input: "140"
output: "141"
padding {
reflection {
}
paddingAmounts {
borderAmounts {
startEdgeSize: 4
endEdgeSize: 4
}
borderAmounts {
startEdgeSize: 4
endEdgeSize: 4
}
}
}
}
layers {
name: "Conv_63"
input: "141"
output: "output"
convolution {
outputChannels: 3
kernelChannels: 16
nGroups: 1
kernelSize: 9
kernelSize: 9
stride: 1
stride: 1
dilationFactor: 1
dilationFactor: 1
valid {
paddingAmounts {
borderAmounts {
}
borderAmounts {
}
}
}
hasBias: true
weights {
}
bias {
}
}
}
arrayInputShapeMapping: EXACT_ARRAY_MAPPING
imageInputShapeMapping: RANK4_IMAGE_MAPPING
}
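For what it's worth, tracing one spatial dimension by hand through the layers above (reflection pads, valid convolutions, stride-2 downsampling, x4 upsampling) suggests an input size only maps back to the same output size when it is a multiple of 4; a size like the 814 from the error message comes out as 816. This is only a sanity-check sketch of that arithmetic, not a diagnosis of the binding failure:

```python
def trace_spatial(x):
    """Follow one spatial dimension through the layers in the spec above."""
    x = x + 8 - 9 + 1            # Pad_0 (4+4) + Conv_1 9x9 valid, stride 1
    x = (x + 2 - 3) // 2 + 1     # Pad_4 (1+1) + Conv_5 3x3 valid, stride 2
    x = (x + 2 - 3) // 2 + 1     # Pad_8 (1+1) + Conv_9 3x3 valid, stride 2
    # residual blocks (1-pixel pad + 3x3 valid conv, stride 1) preserve size
    x = x * 4                    # Upsample_52, scaling factor 4
    x = (x + 2 - 3) // 2 + 1     # Pad_53 + Conv_54 3x3 valid, stride 2
    x = x * 4                    # Upsample_57, scaling factor 4
    x = (x + 2 - 3) // 2 + 1     # Pad_58 + Conv_59 3x3 valid, stride 2
    x = x + 8 - 9 + 1            # Pad_62 (4+4) + Conv_63 9x9 valid, stride 1
    return x

print(trace_spatial(1024))  # 1024 -- round-trips to the input size
print(trace_spatial(814))   # 816  -- does not match the input size
```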