Modifying PyTorch code to call an INT8 GEMM


I would like to know how to use cublasGemmEx to run int8-quantized inference on a .pth model trained with PyTorch.

I tried torch.quantization.quantize_dynamic, but it does not seem to work on CUDA. I also tried converting the model to ONNX, but inference is very slow and the following warning is thrown: "Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance."
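
For reference, the arithmetic I expect from an INT8 GEMM can be sketched in plain Python. This is only a minimal sketch, assuming symmetric per-tensor quantization; the function names (quantize, int8_gemm, quantized_matmul) are hypothetical and are not part of PyTorch or cuBLAS. As far as I understand, cublasGemmEx with int8 inputs accumulates in int32, which Python integers stand in for here.

```python
def quantize(matrix):
    """Symmetric per-tensor quantization to the int8 range [-127, 127] (assumed scheme)."""
    max_abs = max(abs(v) for row in matrix for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]
    return q, scale


def int8_gemm(qa, qb):
    """Integer matrix product; Python ints play the role of the int32 accumulator."""
    cols_b = list(zip(*qb))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in qa]


def quantized_matmul(a, b):
    """Quantize both operands, multiply in integers, then rescale to floats."""
    qa, scale_a = quantize(a)
    qb, scale_b = quantize(b)
    acc = int8_gemm(qa, qb)
    return [[v * scale_a * scale_b for v in row] for row in acc]


a = [[0.5, -1.0], [2.0, 0.25]]
b = [[1.0, 0.0], [-0.5, 1.5]]

approx = quantized_matmul(a, b)
# Float reference product, for comparison with the quantized result.
exact = [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
```

Here approx should be close to exact, differing only by quantization error. What I am looking for is a way to get the int8_gemm step executed on the GPU from PyTorch.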

There are 0 answers