I'm having the same problem. I created a simple Transformer decoder in TensorFlow, and it works well. But when I convert it to a CoreML model, from some point on it starts outputting an MLMultiArray filled with NaN values. The strange thing is that if I reinitialize the model at every time step, CoreML never returns a NaN array during inference:
for i in 0..<30 {
    decoder = try! iOS_Deocder(configuration: config) // <- Like this!
    // prediction(...) is a throwing call, so it needs try
    let ddd = try! decoder.prediction(input_1: image_feature, input_2: tokens!).Identity
    
    // some additional code
}
To address this issue, I tried converting the TF model to a CoreML model with compute_precision=coremltools.precision.FLOAT32 and with compute_precision=coremltools.precision.FLOAT16. I also tried setting let config = MLModelConfiguration(); config.computeUnits = .cpuOnly. None of these worked.
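A conversion with an explicit precision would look roughly like the sketch below. Several things here are assumptions: `convert_decoder` is a hypothetical helper name, `tf_model` stands for the Keras decoder, and the `mlprogram` target is assumed because `compute_precision` is only honored for ML programs.

```python
def convert_decoder(tf_model, use_float32=True):
    """Convert a TF/Keras model to Core ML with an explicit compute precision.

    Hypothetical helper for illustration; `tf_model` is assumed to be the
    trained Keras decoder.
    """
    # Imported lazily so that defining the helper does not require coremltools.
    import coremltools as ct

    precision = ct.precision.FLOAT32 if use_float32 else ct.precision.FLOAT16
    return ct.convert(
        tf_model,
        convert_to="mlprogram",       # assumption: ML program target
        compute_precision=precision,  # FLOAT32 or FLOAT16
    )
```

Calling `convert_decoder(model, use_float32=True).save("Decoder.mlpackage")` would then write the model out; the file name is only a placeholder.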
But the strange thing is that a change in how I define the model in TF slightly improved it.
The final output layer in TF looked like this:
final_output = self.final_layer(seq_layer_output)
final_output = final_output + custom_bias
but removing the last line like this:
final_output = self.final_layer(seq_layer_output)
improved the CoreML model in the following way: previously the CoreML model started generating NaN arrays from the third time step of inference, and with this change it starts generating them from the 5th or 6th. Removing all the for loops over the decoder layers also improved it.
My guess is that during some inference step some inner state of the CoreML model is being stored, and that this affects inference at the next time step?
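To compare variants more precisely than "third step" versus "5th or 6th step", a small check like the one below can report the first step whose output contains NaN. It is only a sketch: `first_nan_step` is a hypothetical helper, and it assumes each step's output has been dumped and flattened into a plain list of floats.

```python
import math
from typing import Iterable, Optional, Sequence


def first_nan_step(step_outputs: Iterable[Sequence[float]]) -> Optional[int]:
    """Return the 0-based index of the first step whose output contains NaN.

    Returns None if no step contains a NaN value.
    """
    for step, values in enumerate(step_outputs):
        if any(math.isnan(v) for v in values):
            return step
    return None
```

Running it on the outputs of the reused model and on those of the per-step reinitialized model would show whether the first bad step stays fixed for a given input.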
I've spent 4 days on this but can't figure it out. Can anyone help me with this issue?