I am trying to compare a simple addition task on both the CPU and the GPU, but the results I get are very strange.
First of all, let me explain how I managed to run the GPU task.
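Let's dive into the code. It simply adds two random float arrays element by element, once on the CPU with a plain loop and once on the GPU with an Aparapi kernel, and times both: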
```java
package gpu;

import com.aparapi.Kernel;
import com.aparapi.Range;

public class Try {
    public static void main(String[] args) {
        final int size = 512;
        final float[] a = new float[size];
        final float[] b = new float[size];
        for (int i = 0; i < size; i++) {
            a[i] = (float) (Math.random() * 100);
            b[i] = (float) (Math.random() * 100);
        }

        //##############CPU-TASK########################
        long start = System.nanoTime();
        final float[] sum = new float[size];
        for (int i = 0; i < size; i++) {
            sum[i] = a[i] + b[i];
        }
        long finish = System.nanoTime();
        long timeElapsed = finish - start;
        //######################################

        //##############GPU-TASK########################
        final float[] sum2 = new float[size];
        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int gid = getGlobalId();
                sum2[gid] = a[gid] + b[gid];
            }
        };
        long start1 = System.nanoTime();
        kernel.execute(Range.create(size));
        long finish2 = System.nanoTime();
        long timeElapsed2 = finish2 - start1;
        //##############GPU-TASK########################

        System.out.println("cpu" + timeElapsed);
        System.out.println("gpu" + timeElapsed2);
        kernel.dispose();
    }
}
```
My specs are:
Aparapi is running on an untested OpenCL platform version: OpenCL 3.0 CUDA 11.6.13
Intel Core i7 6850K @ 3.60GHz Broadwell-E/EP 14nm Technology
2047MB NVIDIA GeForce GTX 1060 6GB (ASUStek Computer Inc)
The results I get (in nanoseconds, as measured with System.nanoTime()) are:
cpu12000
gpu5732829900
My question is: why is the GPU so slow, and why does the CPU outperform it? I expected the GPU to be faster than the CPU. Are my calculations wrong, and is there any way to improve this?
This code measures the host-side execution time of the GPU task. That time includes the kernel's execution on the GPU, the time to copy the input data to the GPU, the time to read the results back from the GPU, and the overhead introduced by Aparapi itself. Moreover, according to the documentation for the Kernel class, Aparapi uses lazy initialization, so its one-time setup work happens during the first call to execute(), which is exactly the call your code times. Therefore, the host-side execution time of the GPU task cannot be compared with the execution time of the CPU task, because it includes additional work that is performed only once.
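A common way to keep one-time costs out of a comparison is to run the task once untimed and then average several timed runs. The sketch below is plain Java with no Aparapi dependency; the helper name averageNanosAfterWarmUp and the run count of 100 are illustrative choices, not part of Aparapi. It times the CPU loop from the question, and the commented-out line shows, as an assumption, how the kernel.execute(...) call could be passed in the same way.

```java
public class WarmUpTiming {

    /**
     * Runs the task once without timing it (so any lazy, one-time initialization
     * happens outside the measurement), then returns the average wall-clock time
     * in nanoseconds over `runs` timed calls.
     */
    static long averageNanosAfterWarmUp(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        task.run(); // warm-up call: pays one-time setup costs, not measured
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            total += System.nanoTime() - start;
        }
        return total / runs;
    }

    public static void main(String[] args) {
        final int size = 512;
        final float[] a = new float[size];
        final float[] b = new float[size];
        final float[] sum = new float[size];
        for (int i = 0; i < size; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }

        // Same element-wise addition as the CPU part of the question.
        Runnable cpuAdd = () -> {
            for (int i = 0; i < size; i++) {
                sum[i] = a[i] + b[i];
            }
        };
        System.out.println("cpu avg ns: " + averageNanosAfterWarmUp(cpuAdd, 100));

        // Hypothetical GPU usage, assuming `kernel` is the Aparapi kernel from the question:
        // System.out.println("gpu avg ns: "
        //         + averageNanosAfterWarmUp(() -> kernel.execute(Range.create(size)), 100));
    }
}
```

Even after the warm-up, each host-side GPU measurement still includes the data transfers and Aparapi's per-call overhead, so for an array of only 512 floats the CPU loop may well remain faster.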
In this case, use the getProfileInfo() call to get the execution time breakdown for the kernel. Also, please note that profiling must be enabled by setting the following property:
-Dcom.aparapi.enableProfiling=true. For more information, please see the "Profiling the Kernel" article and the implementation of the ProfileInfo class.
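Once per-step timings are available, the meaningful comparison is between the pure execution part and the CPU loop, not the whole host-side time. The sketch below assumes each profiler record can be reduced to a phase (write to device, execute, read from device) plus start and end timestamps in nanoseconds; the Entry class and Phase enum are hypothetical stand-ins, not Aparapi types, and mapping the fields of ProfileInfo onto them should be checked against its implementation. It simply totals the duration per phase.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class ProfileBreakdown {

    /** Hypothetical classification of the host-visible steps of one kernel launch. */
    enum Phase { WRITE_TO_DEVICE, EXECUTE, READ_FROM_DEVICE }

    /** Hypothetical stand-in for one profiler record: a phase plus start/end timestamps in ns. */
    static final class Entry {
        final Phase phase;
        final long startNs;
        final long endNs;

        Entry(Phase phase, long startNs, long endNs) {
            this.phase = phase;
            this.startNs = startNs;
            this.endNs = endNs;
        }

        long durationNs() {
            return endNs - startNs;
        }
    }

    /** Sums the durations of all entries per phase; phases with no entries report 0. */
    static Map<Phase, Long> totalsByPhase(List<Entry> entries) {
        Map<Phase, Long> totals = new EnumMap<>(Phase.class);
        for (Phase p : Phase.values()) {
            totals.put(p, 0L);
        }
        for (Entry e : entries) {
            totals.merge(e.phase, e.durationNs(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up timestamps purely to exercise the code; these are not measurements.
        List<Entry> entries = List.of(
                new Entry(Phase.WRITE_TO_DEVICE, 0, 40),
                new Entry(Phase.WRITE_TO_DEVICE, 40, 70),
                new Entry(Phase.EXECUTE, 70, 80),
                new Entry(Phase.READ_FROM_DEVICE, 80, 110));
        System.out.println(totalsByPhase(entries));
    }
}
```

With the made-up timestamps in main, this prints {WRITE_TO_DEVICE=70, EXECUTE=10, READ_FROM_DEVICE=30}, illustrating how transfers can dominate the total when the computation itself is tiny.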