I have the following start of a script:
import numpy as np
from pycuda import driver, gpuarray
from pycuda.compiler import SourceModule
import pycuda.autoinit
MATRIX_SIZE = 3
matrix_mul_kernel = """
__global__ void Matrix_Mul_Kernel(float *d_a, float *d_b, float *d_c)
{
int tx = threadIdx.x;
int ty = threadIdx.y;
float value = 0;
int s=5;
printf("X %d Y \\n",s);
for (int i = 0; i < %(MATRIX_SIZE)s; ++i) {
float d_a_element = d_a[ty * %(MATRIX_SIZE)s + i];
float d_b_element = d_b[i * %(MATRIX_SIZE)s + tx];
value += d_a_element * d_b_element;
}
d_c[ty * %(MATRIX_SIZE)s + tx] = value;
} """
matrix_mul = matrix_mul_kernel % {'MATRIX_SIZE': MATRIX_SIZE}
mod = SourceModule(matrix_mul)
The problem is the printf inside the kernel. printf("hello"); works fine, but when I try to print an integer (I wanted to print tx and ty, but any integer would do), I get this error:
Traceback (most recent call last):
File "/media/cbe421fe-1303-4821-9392-a849bfdd00e2/MyStudy/PyCuda/9_matrix_mul.py", line 26, in <module>
matrix_mul = matrix_mul_kernel % {'MATRIX_SIZE': MATRIX_SIZE}
TypeError: %d format: a number is required, not dict
Why is this code failing?
Previously, when no constant was substituted into the kernel source, I could print the thread x and y indices.
EDIT: Even stranger, when I do this:
printf("X %s Y \\n",5);
it does not fail, but prints this:
X {'MATRIX_SIZE': 3} Y
X {'MATRIX_SIZE': 3} Y
X {'MATRIX_SIZE': 3} Y
X {'MATRIX_SIZE': 3} Y
X {'MATRIX_SIZE': 3} Y
X {'MATRIX_SIZE': 3} Y
X {'MATRIX_SIZE': 3} Y
X {'MATRIX_SIZE': 3} Y
X {'MATRIX_SIZE': 3} Y
So apparently, no matter which variable I pass, it is always interpreted as the dictionary {'MATRIX_SIZE': 3}, hence the error. Why?
What is happening here?
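The issue is that your printf call uses the same format specifier syntax (%d) as Python's % string interpolation, so Python tries to fill it in before CUDA ever sees the source. Python's documentation for printf-style formatting says that when the right-hand operand is a dictionary, the specifiers in the string must include a parenthesised mapping key. A specifier without a key, like the %d or %s in your printf, is handed the whole dictionary: %s prints it, and %d rejects it with the TypeError you see.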
To avoid mixing Python's and CUDA's format specifiers, you can use Python's newer string formatting (str.format).
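Wherever you need MATRIX_SIZE, use
{MATRIX_SIZE}
and build the source with matrix_mul_kernel.format(MATRIX_SIZE=MATRIX_SIZE) instead of the % operator. Because str.format treats braces as field delimiters, the literal braces of the C code must be doubled ({{ and }}).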
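Applied to the kernel from the question, the str.format approach might look like the sketch below. It only builds the source string, so it runs without a GPU. The names kernel_template, kernel_source, percent_template and percent_source are illustrative, and the loop body is condensed from the original. The last lines show an alternative that is assumed to suit anyone who prefers to keep % interpolation: escaping printf's specifier as %%d.

```python
MATRIX_SIZE = 3

# Literal C braces are doubled ({{ and }}) so str.format leaves them alone;
# {MATRIX_SIZE} is the only replacement field. printf's %d needs no escaping
# because str.format does not interpret percent signs.
kernel_template = """
__global__ void Matrix_Mul_Kernel(float *d_a, float *d_b, float *d_c)
{{
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    float value = 0;
    printf("X %d Y %d\\n", tx, ty);
    for (int i = 0; i < {MATRIX_SIZE}; ++i) {{
        value += d_a[ty * {MATRIX_SIZE} + i] * d_b[i * {MATRIX_SIZE} + tx];
    }}
    d_c[ty * {MATRIX_SIZE} + tx] = value;
}}
"""

kernel_source = kernel_template.format(MATRIX_SIZE=MATRIX_SIZE)
print(kernel_source)

# Alternative that keeps % interpolation: write printf's specifier as %%d,
# which Python turns into a literal %d in the generated source.
percent_template = 'printf("X %%d Y \\n", s); // N = %(MATRIX_SIZE)s'
percent_source = percent_template % {'MATRIX_SIZE': MATRIX_SIZE}
print(percent_source)
```

The resulting kernel_source can then be passed to SourceModule exactly as matrix_mul is in the question.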