Quantization aware training in TensorFlow 2.2.0 producing higher inference time


I'm working on quantization in transfer learning with MobileNetV2 on a personal dataset. I have tried two approaches:

i.) Post-training quantization only: it works fine and gives an average inference time of 0.04 s over 60 images at 224x224.

ii.) Quantization aware training + post-training quantization: it gives better accuracy than post-training quantization alone, but a higher inference time of 0.55 s on the same 60 images (a rough conversion sketch for both approaches follows below).
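Roughly, the conversion for both approaches looks like this. This is only a simplified sketch, not my exact converter settings; model stands for the fine-tuned Keras MobileNetV2 classifier and the tensorflow_model_optimization package is assumed.

        import tensorflow as tf
        import tensorflow_model_optimization as tfmot

        # i.) post-training quantization only: convert the trained float model
        converter = tf.lite.TFLiteConverter.from_keras_model(model)
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        ptq_tflite = converter.convert()

        # ii.) quantization aware training first, then the same conversion step
        q_aware_model = tfmot.quantization.keras.quantize_model(model)
        q_aware_model.compile(optimizer='adam',
                              loss='categorical_crossentropy',
                              metrics=['accuracy'])
        # q_aware_model.fit(...)  # fine-tune for a few epochs on the dataset

        converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        qat_tflite = converter.convert()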

1.) The post-training-quantization-only model (.tflite) can be inferenced with:

        import cv2
        import numpy as np
        from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

        # `interpreter` is an already created and allocated tf.lite.Interpreter
        images_ = cv2.resize(cv2.cvtColor(cv2.imread(imagepath), cv2.COLOR_BGR2RGB), (224, 224))
        images = preprocess_input(images_)
        x = np.expand_dims(images, axis=0)  # add batch dimension
        interpreter.set_tensor(
            interpreter.get_input_details()[0]['index'], x)
        interpreter.invoke()
        classes = interpreter.get_tensor(
            interpreter.get_output_details()[0]['index'])

2.) The quantization aware training + post-training quantization model can be inferenced with the code below. The difference is that this one asks for float32 input.

        images_ = cv2.resize(cv2.cvtColor(cv2.imread(imagepath), cv2.COLOR_BGR2RGB), (224, 224))
        images = preprocess_input(images_)
        x = np.expand_dims(images, axis=0).astype(np.float32)  # this model expects float32 input
        interpreter.set_tensor(
            interpreter.get_input_details()[0]['index'], x)
        interpreter.invoke()
        classes = interpreter.get_tensor(
            interpreter.get_output_details()[0]['index'])
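A quick way to see what each model actually expects (and why only the second one needs the astype(np.float32) cast) is to print the interpreter's input and output details; the filename below is just a placeholder:

        import tensorflow as tf

        interpreter = tf.lite.Interpreter(model_path='model.tflite')  # placeholder path
        interpreter.allocate_tensors()

        inp = interpreter.get_input_details()[0]
        out = interpreter.get_output_details()[0]
        print(inp['dtype'], inp['shape'])   # e.g. <class 'numpy.float32'> [  1 224 224   3]
        print(out['dtype'], out['shape'])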

I have searched a lot but couldn't find an answer to this. If possible, please help me understand why the inference time is higher with quantization aware training + post-training quantization than with post-training quantization only.


There are 2 answers

Thaink

I don't think you should do quantization aware training + post-training quantization together.

According to https://www.tensorflow.org/model_optimization/guide/quantization/training_example, if you use quantization aware training, the conversion will already give you a model with int8 weights, so there is no point in doing post-training quantization on top of it.
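The conversion shown in that guide after quantization aware training is just the default one, roughly:

        import tensorflow as tf

        # q_aware_model is the Keras model wrapped with quantize_model and fine-tuned
        converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        quantized_tflite_model = converter.convert()  # weights are already int8 here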

Louis Yang

I think the part that converts the image from uint8 to float32 (.astype(np.float32)) is what makes it slower. Otherwise, the two should run at the same speed.
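One way to check that would be to time only interpreter.invoke() for both models, keeping the preprocessing (including the astype cast) outside the measured loop. A rough sketch, assuming interpreter and x are already set up as in the question:

        import time

        # warm-up run so one-time initialization is not counted
        interpreter.set_tensor(interpreter.get_input_details()[0]['index'], x)
        interpreter.invoke()

        start = time.perf_counter()
        for _ in range(60):
            interpreter.set_tensor(interpreter.get_input_details()[0]['index'], x)
            interpreter.invoke()
        print('average invoke time: %.4fs' % ((time.perf_counter() - start) / 60))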