PyTorch ROCm fails to train even though it's set up correctly

Question

409 views Asked by Skarred At 24 October 2023 at 09:36

I installed PyTorch on my Arch machine using the recommended command from the official site:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6

I then set the following environment variables:

export PYTORCH_ROCM_ARCH="gfx1031"
export HSA_OVERRIDE_GFX_VERSION=10.3.1
export HIP_VISIBLE_DEVICES=0
export ROCM_PATH=/opt/rocm
export HIP_PLATFORM=amd
export HIP_DEVICE=0
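
Exports made in one shell are not necessarily visible to the process that runs the training script (for example, one launched from an IDE or a different terminal). As a minimal sketch, the snippet below prints the six variables as the Python interpreter sees them. The helper name `collect_rocm_env` is made up for illustration; only the variable names come from the exports above.

```python
import os

# Variable names copied from the exports above.
ROCM_VARS = (
    "PYTORCH_ROCM_ARCH",
    "HSA_OVERRIDE_GFX_VERSION",
    "HIP_VISIBLE_DEVICES",
    "ROCM_PATH",
    "HIP_PLATFORM",
    "HIP_DEVICE",
)


def collect_rocm_env(environ=None):
    """Return {name: value or None} for each variable listed above.

    Hypothetical helper: pass a dict to inspect something other than
    the current process environment.
    """
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in ROCM_VARS}


if __name__ == "__main__":
    for name, value in collect_rocm_env().items():
        print(f"{name}={value if value is not None else '<unset>'}")
```

Running this with the same interpreter and from the same shell as the training script shows whether every variable actually reaches the process.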

Running this test script then reports the following PyTorch status:

Checking ROCM support...
GOOD: ROCM devices found:  2
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
GOOD: The user enchilada is in RENDER and VIDEO groups.
GOOD: PyTorch ROCM support found.
Testing PyTorch ROCM support...
Everything fine! You can run PyTorch code inside of:
--->  Intel(R) Core(TM) i7-10700KF CPU @ 3.80GHz
--->  gfx1031

After all of that, if I try to train a model on the GPU, I get the following error:

File "/storage/Programs/PythonVenvs/transformers-rocm/lib/python3.11/site-packages/torch/nn/init.py", line 19, in _no_grad_normal_
    return tensor.normal_(mean, std)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: invalid device function
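
The traceback ends in `tensor.normal_(mean, std)`, so the failure can probably be reproduced without the finetune script. The sketch below is an assumption about a useful minimal test, not part of the original script: it fills a small GPU tensor with `normal_` and prints the GPU targets the installed wheel reports via `torch.cuda.get_arch_list()`. On ROCm builds that list is expected to contain `gfx...` names; if `gfx1031` is missing from it, that may be relevant to the "invalid device function" error. The function name `try_normal_on_gpu` is hypothetical.

```python
def try_normal_on_gpu():
    """Run the same in-place op as the traceback on a tiny GPU tensor.

    Returns a human-readable report instead of raising, so it can be
    pasted into a bug report as-is.
    """
    try:
        import torch
    except ImportError:
        return "torch is not installed in this interpreter"

    # ROCm builds of PyTorch expose the GPU through the torch.cuda API.
    if not torch.cuda.is_available():
        return "torch reports no usable GPU"

    lines = [
        f"torch {torch.__version__}, HIP {torch.version.hip}",
        f"device 0: {torch.cuda.get_device_name(0)}",
        f"compiled GPU targets: {torch.cuda.get_arch_list()}",
    ]
    try:
        t = torch.empty(4, device="cuda").normal_(0.0, 1.0)
        # Kernel launches are asynchronous; synchronize so that a
        # launch failure surfaces inside this try block.
        torch.cuda.synchronize()
        lines.append(f"normal_ succeeded, shape {tuple(t.shape)}")
    except RuntimeError as exc:
        lines.append(f"normal_ failed: {exc}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(try_normal_on_gpu())
```

If this small script fails with the same `RuntimeError`, the problem lies in the PyTorch/ROCm installation rather than in the TinyLlama script.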

I am using a slightly modified version of TinyLlama's finetune script, which runs perfectly on the CPU, so I believe my ROCm setup is at fault.

I would really appreciate any tips on how to solve this.

rocminfo output (I deleted the CPU agent section to make it shorter):

ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE

*******
Agent 2
*******
  Name:                    gfx1031
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon RX 6700 XT
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      3072(0xc00) KB
    L3:                      98304(0x18000) KB
  Chip ID:                 29663(0x73df)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2855
  BDFID:                   768
  Internal Node ID:        1
  Compute Unit:            40
  SIMDs per CU:            2
  Shader Engines:          2
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    12566528(0xbfc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1031
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
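
To compare the architecture that the driver reports with the value used in the environment variables, the GPU agent names can be pulled out of the `rocminfo` text. The following is a rough sketch that assumes the layout shown above (an `Agent N` header line followed by `Name:` and `Device Type:` fields); the function name `gpu_agent_names` is made up for illustration.

```python
import re


def gpu_agent_names(rocminfo_text):
    """Return the 'Name:' value of every agent whose Device Type is GPU."""
    names = []
    # Each agent section starts with a header line such as "Agent 2";
    # everything before the first header is the system preamble.
    blocks = re.split(r"^Agent \d+\s*$", rocminfo_text, flags=re.MULTILINE)[1:]
    for block in blocks:
        # The first "Name:" in a block is the agent name; later ones
        # (for example under "ISA Info") are ignored.
        name = re.search(r"^\s*Name:\s+(\S+)", block, flags=re.MULTILINE)
        kind = re.search(r"^\s*Device Type:\s+(\S+)", block, flags=re.MULTILINE)
        if name and kind and kind.group(1) == "GPU":
            names.append(name.group(1))
    return names
```

The input text can come from a saved copy of the output, or from running `rocminfo` through `subprocess.run` with `capture_output=True` and `text=True`.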

There are 0 answers

Related Questions in PYTORCH

Related Questions in LLAMA

Related Questions in AMD-ROCM

Popular Questions

Popular Tags

Trending Questions