Torch cannot load cudnn_adv_train64_8.dll while building the TensorRT engine for trt-llm-rag-windows


I am trying to install trt-llm-rag-windows by following the guide in its GitHub repo. Building the TensorRT engine with the following command fails:

python build.py --model_dir <path to llama13_chat model> --quant_ckpt_path <path to model.pt> --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3000 --max_output_len 1024 --output_dir <TRT engine folder>

Stack trace:

Traceback (most recent call last):
  File "C:\Users\user\inference\TensorRT-LLM\examples\llama\build.py", line 22, in <module>
    import torch
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\__init__.py", line 129, in <module>
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\lib\cudnn_adv_train64_8.dll" or one of its dependencies.

Environment:

  • gpu: NVIDIA GeForce RTX 4070 Ti
  • python: 3.10
    • pip: 24.0
    • tensorrt: 9.2.0.post12.dev5
    • torch: 2.1.0+cu121
    • fsspec: 2023.5.0
  • TensorRT: TensorRT-9.1.0.4.Windows10.x86_64.cuda-12.2.llm.beta
  • CUDA: cuda_12.2.2_537.13_windows
  • cuDNN: cudnn-windows-x86_64-8.9.7.29_cuda12-archive
  • LLM: Llama-2-13b-chat-hf
  • TensorRT-LLM: release/0.5.0

I checked:

  • package folder for the missing dll -> it exists
  • import torch directly via python3.10 console -> works (including the "missing" dll)
  • import tensorrt_llm; print(tensorrt_llm._utils.trt_version()) -> works and returns "[TensorRT-LLM] TensorRT-LLM version: 0.8.09.2.0.post12.dev5"
  • force reinstall torch via "python3.10 -m pip install --force-reinstall torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121" -> the reinstall succeeds, but the error is unchanged
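
One further check that might narrow this down: WinError 127 means a DLL was found but a procedure it should export was not, so a mismatched copy of a cuDNN DLL being resolved from somewhere else is one possible cause. The sketch below is only a diagnostic under that assumption, not a confirmed fix. It lists every directory on PATH that contains a file matching cudnn*.dll, so stray copies (for example from the separately downloaded cuDNN archives) become visible. The function name find_dlls_on_path is hypothetical, and PATH is only one of the places Windows may search.

```python
import fnmatch
import os


def find_dlls_on_path(pattern="cudnn*.dll", path_env=None):
    """Return (directory, filename) pairs for files in PATH entries matching pattern.

    Hypothetical helper for diagnosis only; it inspects PATH and nothing else.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")

    matches = []
    seen = set()
    for entry in path_env.split(os.pathsep):
        entry = entry.strip().strip('"')
        if not entry or not os.path.isdir(entry):
            continue
        # Skip directories that appear on PATH more than once.
        key = os.path.normcase(os.path.abspath(entry))
        if key in seen:
            continue
        seen.add(key)
        try:
            names = sorted(os.listdir(entry))
        except OSError:
            # Unreadable directory: ignore it rather than abort the scan.
            continue
        for name in names:
            # Case-insensitive match, since Windows file names are.
            if fnmatch.fnmatch(name.lower(), pattern.lower()):
                matches.append((entry, name))
    return matches


if __name__ == "__main__":
    for directory, name in find_dlls_on_path():
        print(os.path.join(directory, name))
```

If this prints cuDNN DLLs outside of site-packages\torch\lib, comparing their versions with the ones bundled with torch would be the next thing to look at.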

Additional context: I ran into multiple version conflicts while following the guide, including having to download a newer version of cuDNN (cudnn64_9), but I was able to resolve all of them. What puzzles me is that the torch package itself does not seem to be broken, because it works when imported directly.

Has anyone encountered a similar issue and can shed some light on it?

Edit: Fixed formatting
