I have recently been learning to write a prefix sum algorithm with CUDA. I had a stupid bug where I assigned a float value to an integer variable, losing precision:
// Phase 3: add the last element of the previous subsection to this subsection.
__shared__ float XY[SECTION_SIZE];
// ...
const int prev_sec_sum = ((subsection_idx > 0) ? XY[subsection_idx - 1] : 0.0f);
// ^ Should be float here.
for (size_t i = 0; i < COARSEN_FACTOR - 1; i++) {
    XY[subsection_idx + i] += prev_sec_sum;
}
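The precision loss is easy to reproduce outside a kernel, because device code follows the same C++ conversion rules as host code. The sketch below is plain C++ (it would also compile as host code in a `.cu` file); `kCoarsenFactor`, `phase3_int`, and `phase3_float` are hypothetical names standing in for `COARSEN_FACTOR` and the kernel's Phase 3, and the array contents are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for COARSEN_FACTOR from the snippet above.
constexpr std::size_t kCoarsenFactor = 4;

// Buggy version (mirrors the snippet): the float running sum is truncated
// toward zero when it is stored in an int, then converted back to float
// when it is added to each element.
void phase3_int(float* xy, std::size_t subsection_idx) {
    const int prev_sec_sum = (subsection_idx > 0) ? xy[subsection_idx - 1] : 0.0f;
    for (std::size_t i = 0; i < kCoarsenFactor - 1; i++) {
        xy[subsection_idx + i] += prev_sec_sum;
    }
}

// Fixed version: `auto` deduces float from the conditional expression,
// so the value is never converted to an integer.
void phase3_float(float* xy, std::size_t subsection_idx) {
    const auto prev_sec_sum = (subsection_idx > 0) ? xy[subsection_idx - 1] : 0.0f;
    for (std::size_t i = 0; i < kCoarsenFactor - 1; i++) {
        xy[subsection_idx + i] += prev_sec_sum;
    }
}
```

With `subsection_idx = 4` and `xy[3] == 2.5f`, the int version adds 2 instead of 2.5 to `xy[4]`, `xy[5]`, and `xy[6]`, while `xy[7]` is left untouched by both versions.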
I should have been more careful about this, but I am surprised that nvcc does not warn about the conversion. After some searching, I realized that the following compiler flags only apply to host code:
--compiler-options -Wall,-Wextra,-Wconversion
After searching through the nvcc documentation, the only relevant flag I could find is --Werror all-warnings, which does not make nvcc report this conversion. Does nvcc support this kind of conversion checking? For example, is there a -Wconversion-like flag for device code?
I tried:
nvcc -o bin/prefix_sum prefix_sum.cu --compiler-options -Wall,-Wextra,-Wconversion --Werror all-warnings --compiler-bindir /usr/bin/gcc-12 -arch=native
But it only checks for conversions in host code.
Thanks for your help.
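One flag-independent safeguard, sketched here as an assumption rather than a confirmed nvcc feature: the C++ standard makes a narrowing conversion (such as float to int) inside brace initialization ill-formed, so a conforming compiler must diagnose it instead of silently truncating. The function name `prev_section_sum` is hypothetical; in a kernel the same initializer could be written in device code, although the exact diagnostic text differs between compilers.

```cpp
#include <cassert>

// Sketch: brace initialization turns the float -> int bug into a
// compile-time diagnostic. prev_section_sum is a hypothetical helper.
float prev_section_sum(const float* xy, unsigned subsection_idx) {
    // const int bad{(subsection_idx > 0) ? xy[subsection_idx - 1] : 0.0f};
    //   ^ ill-formed: narrowing conversion from float to int
    const float sum{(subsection_idx > 0) ? xy[subsection_idx - 1] : 0.0f};
    return sum;
}
```

Uncommenting the `bad` line should make the build fail, which catches the mistake even when no warning flag is available for device code.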
Update:
Although not through nvcc, with help from @paleonix I can get a warning/error for it when building with clang. Please see my answer below.
I wonder whether this means only experts should use nvcc to build their programs, given that it lacks some basic checks. My guess is that, as NVIDIA's native CUDA toolkit, nvcc applies more optimizations and generates faster code than clang.
To answer my own question, and to provide a reference for anyone just starting to learn CUDA programming:
Thanks to help from @paleonix, I used clang to build my CUDA program.
With the build options -Weverything -Werror, I am now able to get the -Wimplicit-int-float-conversion error/warning from it. For this code snippet:
I used the following command:
This is the result:
which is exactly what I want.