I am trying to understand why I obtain different metric values with model.evaluate than with model.predict followed by model.metrics.

I work on semantic segmentation.

I have an evaluation set of 24 images.

I have a custom Dice index metric:

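from keras import backend as K  # assumed import: K is the Keras backend

def dice_coef(y_true, y_pred):
    # Flatten both tensors so the sums run over every element passed in
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    result = (2 * intersection) / (K.sum(y_true_f) + K.sum(y_pred_f))
    return result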

When I use model.evaluate, I obtain a dice score of 0.9093835949897766.

When I use model.predict and then model.metrics, I obtain a dice score of 0.9092264051238695.

To be more precise: I set a batch size of 24 in model.predict as well as in model.evaluate, to be sure the problem is not caused by the batch size. I do not know what happens when the batch size is larger (e.g. 32) than the number of samples in the evaluation set…
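On the batch size question, here is a minimal sketch of how an array of samples is typically cut into batches. It assumes (I have not checked this against the Keras source for my version) that the data is split into consecutive slices and that the last slice is simply shorter. The helper name batch_slices is hypothetical and is not a Keras function.

```python
def batch_slices(num_samples, batch_size):
    """Return (start, end) index pairs for consecutive batches.

    Hypothetical helper, not part of Keras: it only illustrates the
    assumed slicing scheme, where the last batch may be smaller.
    """
    return [
        (start, min(start + batch_size, num_samples))
        for start in range(0, num_samples, batch_size)
    ]


# Batch size larger than the evaluation set: one batch with all 24 samples.
print(batch_slices(24, 32))

# Batch size smaller than the evaluation set: the last batch is shorter.
print(batch_slices(24, 10))
```

Under that assumption, a batch size of 32 with 24 samples would behave like a batch size of 24: everything ends up in a single batch.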

Finally, to compute the metric after model.predict, I run:

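dice_result = 0

for y_i in range(len(y)):
    # Dice score of one image, cast to float64 and evaluated in a session
    dice_result += tf.Session().run(tf.cast(dice_coef(y[y_i], preds[y_i]), tf.float64))

# Average of the per-image Dice scores
dice_result /= len(y)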
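One thing that may be worth checking (a guess, not a confirmed diagnosis): the loop averages one Dice score per image, whereas dice_coef applied to a whole batch flattens all images of that batch together before summing. These two aggregations are not the same number in general. A minimal plain-Python sketch with two made-up 4-pixel "images" (the data is hypothetical, chosen only to make the arithmetic easy to follow):

```python
def dice(y_true, y_pred):
    """Same formula as dice_coef, on flat Python lists."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return 2.0 * intersection / (sum(y_true) + sum(y_pred))


# Hypothetical toy data: two images of four pixels each.
images_true = [[1, 1, 1, 1], [1, 0, 0, 0]]
images_pred = [[1, 1, 1, 0], [1, 1, 0, 0]]

# Aggregation 1: one Dice score per image, then the mean (as in the loop).
per_image_mean = sum(
    dice(t, p) for t, p in zip(images_true, images_pred)
) / len(images_true)

# Aggregation 2: flatten the whole batch first, then one Dice score.
flat_true = [v for img in images_true for v in img]
flat_pred = [v for img in images_pred for v in img]
batch_dice = dice(flat_true, flat_pred)

print(per_image_mean)  # 16/21, roughly 0.762
print(batch_dice)      # 0.8
```

If this is what happens here, calling dice_coef once on the full y and preds arrays should give a value closer to the one from model.evaluate than the per-image average does.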

I wondered whether the tf.float64 cast could be the cause of the difference?

Can you think of an explanation?

Thank you for your help.
