Model size reduction problem after quantization


I'm doing my project with TensorFlow 2 and the TF-TRT (TensorRT) module for deep learning acceleration.

I used TF-TRT to quantize a pretrained deep neural network (FP32 -> FP16), and the latency reduction is amazing.
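
For reference, the conversion code looked roughly like this (the SavedModel paths are placeholders):

```python
# TF-TRT FP16 conversion flow in TensorFlow 2 (paths are placeholders).
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_fp32",  # pretrained FP32 SavedModel
    conversion_params=params,
)
converter.convert()                # replaces supported subgraphs with TRTEngineOp nodes
converter.save("saved_model_trt")  # writes the converted SavedModel to disk
```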

As far as I know, quantizing from 32-bit to 16-bit floating point should cut the model size in half.

But here's the problem: the model size didn't shrink; it actually increased. (I have also used TensorFlow Lite to quantize the same network, and that works as expected, reducing the model size by half. A sketch of that path follows below.)
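
The TFLite float16 path that did halve the size for me looked roughly like this (again, file names are placeholders):

```python
# TFLite float16 quantization of the same SavedModel (file names are placeholders).
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_fp32")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights as FP16
tflite_model = converter.convert()

with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_model)  # output is roughly half the FP32 model size
```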

Does anyone know why the model size increased?

Thanks!
