Model size reduction problem after quantization


I'm doing my project with TensorFlow 2 and the TF-TRT (TensorRT) module for deep learning acceleration.

I used TF-TRT to quantize a pretrained deep neural network (FP32 -> FP16), and the latency reduction is amazing.
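
For reference, the conversion code looked roughly like this (the SavedModel paths are placeholders):

```python
# TF-TRT FP16 conversion flow in TensorFlow 2 (paths are placeholders).
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_fp32",  # pretrained FP32 SavedModel
    conversion_params=params,
)
converter.convert()                # replaces supported subgraphs with TRTEngineOp nodes
converter.save("saved_model_trt")  # writes the converted SavedModel to disk
```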

As far as I know, quantizing from 32-bit to 16-bit floating point should cut the model size in half.

But here's the problem: the model size didn't shrink; it actually increased. (I have also used TensorFlow Lite to quantize the same network, and that works as expected, reducing the model size by half. A sketch of that path follows below.)
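
The TFLite float16 path that did halve the size for me looked roughly like this (again, file names are placeholders):

```python
# TFLite float16 quantization of the same SavedModel (file names are placeholders).
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_fp32")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights as FP16
tflite_model = converter.convert()

with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_model)  # output is roughly half the FP32 model size
```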

Does anyone know why the model size increased?

Thanks!
