I would like to know how to use cublasGemmEx to run int8-quantized inference on a .pth model trained with PyTorch.
I tried torch.quantization.quantize_dynamic, but it does not seem to work on CUDA. I also tried converting the model to ONNX, but it runs very slowly and throws the warning "Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance."
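For reference, here is a minimal sketch (assuming PyTorch is installed; the toy model is hypothetical, not my actual network) of the dynamic-quantization attempt described above. quantize_dynamic does replace the Linear weights with int8, but the resulting quantized modules only ship CPU kernels, which is why this path does not run on CUDA:

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for the real .pth network
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Dynamically quantize all Linear layers to int8
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 16)
y = qmodel(x)  # runs fine on CPU; moving qmodel to CUDA fails

# The Linear layers have been swapped for dynamically quantized variants
print(type(qmodel[0]))
```

Calling `qmodel.to("cuda")` and running inference there is what fails, since the quantized linear ops have no CUDA backend.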