I am in the process of quantizing a model to int8 so that it can run on the Coral Edge TPU. To do that I am using the TFLite converter. My code looks like this:
import os
import re
import subprocess
import sys
import time

import tensorflow as tf


class TensorFlowLiteConverter:
    def __init__(self,
                 original_network,
                 input_data_generator,
                 tflite_model_folder,
                 tflite_model_file='',
                 edgetpu_compiler_path='/bin/edgetpu_compiler'):
        """
        :param original_network: original Keras model.
        :param input_data_generator: should be a tf.lite.RepresentativeDataset adapted to generate the input for the specified model.
        :param tflite_model_folder: path to the folder where the TFLite model should be saved.
        :param tflite_model_file: label of the model added to the file name.
        :param edgetpu_compiler_path: path to the Edge TPU compiler, may change depending on the system you are running on.
        """
        self.original_model = original_network
        self.input_data_generator = input_data_generator
        self.tflite_model_folder = tflite_model_folder
        self.tflite_model_file = tflite_model_file
        self.edgetpu_compiler_path = edgetpu_compiler_path
        self.tflite_model = None  # Used afterwards to compile the model for the TPU

    def convert_model(self):
        """
        Conversion function from TensorFlow to TensorFlow Lite model.
        """
        converter = tf.lite.TFLiteConverter.from_keras_model(self.original_model)
        converter.experimental_new_converter = True
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        converter.representative_dataset = self.input_data_generator
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
        converter.inference_input_type = tf.uint8
        converter.inference_output_type = tf.uint8
        self.tflite_model = converter.convert()
        self._save_converted_model()
        return self.tflite_model

    def _save_converted_model(self):
        time_string = time.strftime('%Y%m%d-%H%M%S')  # timestamp so repeated conversions do not overwrite each other
        tflite_model_file = self.tflite_model_file + '_int' + time_string + '.tflite'
        with open(os.path.join(self.tflite_model_folder, tflite_model_file), 'wb') as tflite_out_file:
            tflite_out_file.write(self.tflite_model)
        self.tflite_model_file = tflite_model_file
        print('TFLite model saved as ', os.path.join(self.tflite_model_folder, self.tflite_model_file))
        return tflite_model_file

    def compile_to_edgetpu(self):
        """
        Function to call the Edge TPU compiler on the saved TFLite integer file.
        Log files in the model folder give information on the operations mapped to the TPU.
        """
        if not re.search(r'\.tflite$', self.tflite_model_file):
            print('Error, input file must end in .tflite')
            sys.exit(1)
        # -o: output folder, -m 13: minimum runtime version, -s: show the operation log
        command_arguments = (self.edgetpu_compiler_path, '-o', self.tflite_model_folder, '-m',
                             '13', '-s', os.path.join(self.tflite_model_folder,
                                                      self.tflite_model_file))
        log_file_name = re.sub(r'\.tflite$', '.log',
                               os.path.basename(self.tflite_model_file))
        assert log_file_name != os.path.basename(self.tflite_model_file)
        results = subprocess.run(command_arguments,
                                 capture_output=True,
                                 check=True)
        with open(os.path.join(self.tflite_model_folder, log_file_name), 'wt') as log_file:
            log_file.write(results.stderr.decode())
            log_file.write(results.stdout.decode())
        # The Edge TPU compiler appends '_edgetpu' to the output file name.
        edgetpu_file = self.tflite_model_file.replace('.tflite', '_edgetpu.tflite', 1)
        model_path = os.path.join(self.tflite_model_folder, edgetpu_file)
        print('TPU compiled model saved as ', model_path)
        return model_path
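For completeness, this is roughly how I use the class. The representative dataset is a generator over the training images, and its size is the parameter I have been varying; make_representative_dataset, keras_model and train_images below are placeholder names, not part of the class:

import numpy as np

def make_representative_dataset(training_images, num_samples=300):
    # Generator following the converter's representative_dataset contract:
    # it yields one list of input arrays per calibration sample.
    def generator():
        for image in training_images[:num_samples]:
            yield [np.expand_dims(image.astype(np.float32), axis=0)]
    return generator

converter = TensorFlowLiteConverter(
    original_network=keras_model,
    input_data_generator=make_representative_dataset(train_images, num_samples=300),
    tflite_model_folder='models',
    tflite_model_file='my_model')
converter.convert_model()
converter.compile_to_edgetpu()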
Then, I create the representative dataset using 300 elements from the training set, as in the sketch above. Here the problem arises: when I increase the number of samples in the representative dataset, the internal activation ranges of the model grow, and consequently so do the ranges of the quantized model. It seems to me that the TFLite converter does not cut the tails of the distributions generated by performing inference on the representative dataset, and this is an issue when dealing with outliers: if the integer model has to represent the whole observed range, the resolution left for the bulk of the values decreases.
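To see this, I look at the calibrated quantization parameters stored in the converted model; the scale (and therefore the representable range) of the activation tensors grows with the dataset size. A minimal sketch of the check, where 'model_int.tflite' stands for the file written by _save_converted_model:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='model_int.tflite')
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    scale, zero_point = detail['quantization']
    if scale != 0.0 and np.issubdtype(detail['dtype'], np.integer):
        # Affine quantization: real_value = scale * (quantized_value - zero_point)
        q = np.iinfo(detail['dtype'])
        low, high = scale * (q.min - zero_point), scale * (q.max - zero_point)
        print('%-40s scale=%.6f range=[%.3f, %.3f]' % (detail['name'], scale, low, high))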
For example, consider a trivial model with one layer, and say that I use 10 images as the representative dataset. Suppose that 99.9% of the output values are in the range [-1, 1] and a single one has value 100. It would be nice if the quantization process ignored that value so that the majority of the values could be represented more precisely.
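To put numbers on that example: with plain min/max calibration the single outlier dictates the int8 scale, and almost all quantization levels are wasted (the figures below are just the arithmetic for the made-up ranges above):

def int8_scale(range_min, range_max):
    # Affine int8 quantization: 256 levels, so 255 steps between min and max.
    return (range_max - range_min) / 255.0

print(int8_scale(-1.0, 100.0))  # ~0.396  -> only ~5 quantization steps cover [-1, 1]
print(int8_scale(-1.0, 1.0))    # ~0.0078 -> all 255 steps cover [-1, 1]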
Is there a way to control this behavior with the TensorFlow converters? I found the attributes quantized_input_stats and default_ranges_stats of the class TFLiteConverterBaseV1, but they do not seem to constrain the range:
quantized_input_stats: Map of input tensor names to a tuple of floats
representing the mean and standard deviation of the training data. (e.g.,
{"foo" : (0., 1.)}). Required if `inference_input_type` is tf.int8 or
tf.uint8. (default None)
I tried to quantize the model using the from_saved_model function and setting quantized_input_stats, but the range doesn't change.
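For reference, that attempt looked roughly like this, using the compat.v1 converter since those attributes belong to the V1 API ('saved_model_dir' and the input name 'input_1' are placeholders for my actual model):

import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_saved_model('saved_model_dir')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = make_representative_dataset(train_images)  # same generator as above
converter.inference_type = tf.uint8
converter.quantized_input_stats = {'input_1': (0., 1.)}  # {input name: (mean, std_dev)}
converter.default_ranges_stats = (-1, 1)                 # fallback (min, max) for tensors without a range
tflite_model = converter.convert()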
I have also noticed, several times, that if I decrease the size of the representative dataset, the range of the model output decreases and the results of the quantized model are better.