How to run two work groups per compute unit on AMD GCN cards
My understanding is that a compute unit usually runs only one work group at a time. However, AMD's documentation says that more than one wavefront can run on the same compute unit. How can I make that happen? Is there an OpenCL function for it, or do I need to use assembly instructions? I want to do this because my work group size is 20 and I want to run 2 work groups per compute unit, so that each group can use 32 KiB of LDS. A CU has 64 KiB of LDS in total and each wavefront can use up to 32 KiB, so I want to run two wavefronts to use the full amount.
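To make the arithmetic in the question concrete, here is a minimal sketch in Python. It only estimates how many work groups could share one CU if LDS were the sole limit. The 64 KiB and 32 KiB figures come from the question; the wavefront width of 64 and the helper names are assumptions made for illustration. It does not show how to force the scheduler to co-locate groups, and it ignores other limits such as register usage and LDS allocation granularity.

```python
# Rough estimate only: how many work groups can share one compute unit (CU)
# when local memory (LDS) is the only limiting resource.

KIB = 1024
LDS_PER_CU = 64 * KIB          # total LDS per CU, as stated in the question
MAX_LDS_PER_GROUP = 32 * KIB   # per-work-group limit, as stated in the question
WAVEFRONT_SIZE = 64            # assumption: GCN wavefront width


def wavefronts_per_group(work_group_size: int) -> int:
    """Number of wavefronts needed to hold one work group (ceiling division)."""
    return -(-work_group_size // WAVEFRONT_SIZE)


def groups_per_cu_by_lds(lds_bytes_per_group: int) -> int:
    """Upper bound on resident work groups per CU imposed by LDS alone."""
    if not 0 < lds_bytes_per_group <= MAX_LDS_PER_GROUP:
        raise ValueError("LDS request must be in (0, 32 KiB]")
    return LDS_PER_CU // lds_bytes_per_group


if __name__ == "__main__":
    # A work group of 20 work-items fits in a single wavefront.
    print(wavefronts_per_group(20))
    # Two groups of 32 KiB each would exactly fill the 64 KiB of LDS.
    print(groups_per_cu_by_lds(32 * KIB))
```

Under these assumptions, a 20-item work group occupies one wavefront, and two such groups at 32 KiB each would exactly fill the CU's LDS.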
Asked by user1200759 (165 views)