GPGPU performance in high-level languages

345 views · Asked by Elliot Gorokhovsky

For my science fair project I have to write a computationally intensive algorithm that is well suited to parallelization. I have read about OpenCL and CUDA, and it seems they are mainly used from C/C++. While it would not be that difficult for me to pick up a bit of C to write a simple main, I was wondering how big the performance hit would be if I used Java or Python bindings for my GPU computation. Specifically, I am more interested in the performance hit with CUDA, because that is the framework I plan to use.

1 Answer
In general, every time you add an abstraction layer you lose some performance, but in the case of CUDA this is not entirely true: whether you drive it from Python or Java, you still end up writing your CUDA kernels in C (or Fortran), so performance on the GPU side will be the same as if you had used C or Fortran for the whole program (check some pyCUDA examples here). The binding overhead only affects the host side: device setup, memory transfers, and kernel launches.
The bad news is that Java and Python will never match the performance of compiled languages such as C on certain tasks; see this SO answer for a more detailed discussion of that topic. Here is a good discussion about C versus Java, also on SO. There are many questions and discussions comparing the performance of interpreted and compiled languages, so I encourage you to read some of them.
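To make the "kernels stay in C" point concrete, here is a minimal PyCUDA sketch. This is an illustrative example, not from the original post, and it assumes you have an NVIDIA GPU, the CUDA toolkit, and the pycuda package installed; it will not run without them. Note that the string passed to SourceModule is ordinary CUDA C, compiled by nvcc at runtime, so the GPU-side code is identical to what a C host program would launch; only the launch and transfer logic is Python.

```python
# Hypothetical PyCUDA sketch (requires an NVIDIA GPU + pycuda installed).
# The kernel below is plain CUDA C; Python is only the host-side glue.
import numpy as np
import pycuda.autoinit            # creates a CUDA context on the default device
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# Ordinary CUDA C, compiled with nvcc at runtime by SourceModule.
mod = SourceModule("""
__global__ void scale(float *dst, const float *src, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = src[i] * factor;
}
""")
scale = mod.get_function("scale")

n = 1 << 20
src = gpuarray.to_gpu(np.random.rand(n).astype(np.float32))
dst = gpuarray.empty_like(src)

# Host-side launch: this Python overhead is per-launch, not per-element,
# so for compute-heavy kernels it is negligible.
block = 256
grid = (n + block - 1) // block
scale(dst, src, np.float32(2.0), np.int32(n),
      block=(block, 1, 1), grid=(grid, 1))

assert np.allclose(dst.get(), src.get() * 2.0)
```

The design point: the interpreter cost is paid once per kernel launch and per transfer, so as long as each launch does substantial work on the GPU, the choice of host language barely shows up in the total runtime.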