How to manually dequantize the output of a layer and requantize it for the next layer in PyTorch?


I am working on a school project that requires me to quantize each layer of a model manually. Specifically, I want to implement the following pipeline by hand:

quantized activation, combined with quantized weight A → layer A → quantized output → dequantized output → requantized output, combined with quantized weight B → layer B → ...

I know PyTorch already has built-in quantization, but it is limited to 8-bit dtypes. I would like to quantize at bit widths from 16 down to 2, and then compare their accuracy.
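For reference, this is the built-in API I mean. It only accepts quantized dtypes such as torch.qint8 and torch.quint8; the scale and zero point below are just illustrative values I picked:

import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])
# Built-in per-tensor affine quantization; dtype choices are limited
qx = torch.quantize_per_tensor(x, scale=8 / 127, zero_point=0, dtype=torch.qint8)
print(qx.int_repr())    # the stored integers
print(qx.dequantize())  # back to float: (q - zero_point) * scale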

The issue I encountered is that after quantization, the output of a layer is orders of magnitude too large (with bit = 16), and I don't know how to dequantize it back. I am quantizing with the same min and max across both the activation and the weight. Here is an example:

Activation = [1,2,3,4]
Weight = [5,6,7,8]
Min and max across activation and weight = 1, 8
Expected, non-quantized output = 70

Quantize with bit = 16
Quantized activation = [-32768, -23406, -14044, -4681]
Quantized weight = [4681, 14043, 23405, 32767]
Quantized output = -964159613
Dequantize output with min = 1, max = 8 = -102980

The calculation makes sense to me: since the output involves multiplying activations by weights, their scale factors get multiplied together as well. If I dequantize only once with the original min and max, it is no surprise that the output is far too large.
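To check this reasoning numerically (the formulas below are my own derivation, not PyTorch's): each quantized value satisfies x ≈ scale * (q - qmin) + min, so undoing the dot product requires dividing the integer accumulator by scale**2 and accounting for the cross terms introduced by the offset:

import numpy as np

bits = 16
qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1   # -32768, 32767
lo, hi = 1.0, 8.0                                      # shared min and max
scale = (hi - lo) / (qmax - qmin)                      # 7 / 65535

activation = np.array([1.0, 2.0, 3.0, 4.0])
weight = np.array([5.0, 6.0, 7.0, 8.0])
q_act = np.round((activation - lo) / scale).astype(np.int64) + qmin
q_wgt = np.round((weight - lo) / scale).astype(np.int64) + qmin
# q_act and q_wgt match the integers in the example above (up to rounding)

r_act, r_wgt = q_act - qmin, q_wgt - qmin   # remove the offset first
acc = np.dot(r_act, r_wgt)                  # pure integer accumulator

output = (scale ** 2 * acc
          + scale * lo * (r_act.sum() + r_wgt.sum())
          + len(activation) * lo ** 2)
print(output)                               # ~70.0, not -102980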

How does PyTorch handle dequantization? I tried to find PyTorch's quantization source code but could not locate it. How do I dequantize the output correctly?


1 Answer

Answered by deramos:

I think there may be an issue with your formula for calculating the dequantized output.

import numpy as np

# Original values
activation = np.array([1, 2, 3, 4])
weight = np.array([5, 6, 7, 8])

# Quantization parameters
bit = 16  # Desired bit precision
min_val = min(np.min(activation), np.min(weight))
max_val = max(np.max(activation), np.max(weight))

# Calculate scale factor (symmetric scheme: map the largest magnitude
# onto the signed integer maximum; no zero point is used)
scale_factor = (2 ** (bit - 1) - 1) / max(abs(min_val), abs(max_val))

# Quantize activation and weight values
quantized_activation = np.round(activation * scale_factor).astype(np.int16)
quantized_weight = np.round(weight * scale_factor).astype(np.int16)

# Dequantize activation and weight values
dequantized_activation = quantized_activation / scale_factor
dequantized_weight = quantized_weight / scale_factor

# Print values
print("Original activation:", activation)
print("Original weight:", weight)
print("Minimum value:", min_val)
print("Maximum value:", max_val)
print("Scale factor:", scale_factor)
print("Quantized activation:", quantized_activation)
print("Quantized weight:", quantized_weight)
print("Dequantized activation:", dequantized_activation)
print("Dequantized weight:", dequantized_weight)

---------------------------------------------------------

Original activation: [1 2 3 4]
Original weight: [5 6 7 8]
Minimum value: 1
Maximum value: 8
Scale factor: 4095.875
Quantized activation: [ 4096  8192 12288 16384]
Quantized weight: [20479 24575 28671 32767]
Dequantized activation: [1.00003052 2.00006104 3.00009156 4.00012207]
Dequantized weight: [4.99990844 5.99993896 6.99996948 8.        ]

Then compute the output from the dequantized values:

output = np.sum(dequantized_activation * dequantized_weight)
print("Dequantized output:", output) # 70.00183110125477