I am trying to understand why I obtain different metrics using `model.evaluate` vs `model.predict` followed by computing the metric myself.

I work on semantic segmentation and have an evaluation set of 24 images.

I use a custom Dice index metric:

```
from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred):
    # Flatten both masks and compute 2 * |intersection| / (|y_true| + |y_pred|)
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2 * intersection) / (K.sum(y_true_f) + K.sum(y_pred_f))
```
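
For context, the metric is registered at compile time so that `model.evaluate` reports it. A minimal sketch of that step, assuming a typical binary-segmentation setup (the optimizer and loss shown here are placeholders, not necessarily the ones I actually use):

```
# Hypothetical compile call: dice_coef is tracked as a metric by evaluate().
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[dice_coef])
```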

When I use `model.evaluate`, I obtain a Dice score of 0.9093835949897766.

When I use `model.predict` and then compute the metric myself, I obtain a Dice score of 0.9092264051238695.

To give more details: I set a batch size of 24 in `model.predict` as well as in `model.evaluate` to be sure the problem is not caused by the batch size. I also do not know what happens when the batch size (e.g. 32) is larger than the number of samples in the evaluation set…
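
Concretely, both calls process the whole evaluation set as a single batch of 24. A sketch of what I run, assuming `x` holds the 24 input images and `y` the ground-truth masks (`y` and `preds` are the same arrays used in the loop below):

```
# Built-in evaluation: returns the loss and the compiled metric.
eval_loss, eval_dice = model.evaluate(x, y, batch_size=24)

# Raw predictions, used to recompute the metric by hand afterwards.
preds = model.predict(x, batch_size=24)
```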

Finally, to compute the metric after `model.predict`, I run:

```
import tensorflow as tf

# Average the per-image Dice coefficients over the evaluation set.
dice_result = 0.0
with tf.Session() as sess:
    for y_i in range(len(y)):
        dice_result += sess.run(
            tf.cast(dice_coef(y[y_i], preds[y_i]), tf.float64))
dice_result /= len(y)
```
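
For comparison, here is a variant I can check against: computing the coefficient once over all 24 flattened images instead of averaging 24 per-image values (a sketch reusing `y` and `preds` from above; the explicit float casts are only there to avoid dtype mismatches):

```
# Single Dice value over the whole evaluation set rather than a per-image mean.
with tf.Session() as sess:
    whole_set_dice = sess.run(
        tf.cast(dice_coef(tf.cast(y, tf.float32),
                          tf.cast(preds, tf.float32)), tf.float64))
```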

I thought the `tf.float64` cast might be the cause of the difference. Do you have an explanation?

Thank you for your help.