I want to write a simple OpenCL kernel, but when I read some of the data in the kernel, I get completely different values. The following kernel is a small reproduction.
__kernel void kernel( int N, __global float *a, __global float *b, __global float *out ) {
    int t_idx = get_global_id(0);
    size_t gridDim = get_num_groups(0);
    size_t blockDim = get_local_size(0);
    // Grid-stride loop: every work-item fills its share of out with 2.
    for (int t_i = t_idx; t_i < N; t_i += gridDim * blockDim) {
        out[t_i] = 2;
    }
    // Work-item 0 copies a[5] into out[5] so it can be inspected on the host.
    if (t_idx == 0) {
        out[5] = a[5];
    }
}
The array a is an npy array of double-precision values. Since my device does not support cl_khr_fp64, I have to cast it to 32-bit values before passing it to the kernel:
for (int i = 0; i < size_a; i++) a[i] = static_cast<float>(a[i]);
However, when the kernel above reads the 6th value of a (a[5]), the value I get back after the device-to-host transfer differs from the one on the host. When I read from c instead, a float array that I did not load or cast but allocated directly on the host side, everything works fine.
The host-to-device transfer:
err = clEnqueueWriteBuffer(commands, a_d, CL_TRUE, 0, sizeof(float) * size_a, a, 0, NULL, NULL);
if (err != CL_SUCCESS) return 0;
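To make the host side easier to check, here is a minimal host-only sketch of the two cases. It assumes a is stored as double on the host (an assumption based on the npy data type). The helper names to_float_buffer and reinterpret_prefix_as_float are hypothetical and not part of my actual code. The first helper converts into a separate float buffer. The second mimics what would be uploaded if the first sizeof(float) * size_a bytes of a double array were handed to clEnqueueWriteBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: convert n doubles into a separate buffer of floats.
// The result really holds n 32-bit values, so sizeof(float) * n bytes of it
// describe the whole array.
std::vector<float> to_float_buffer(const double* src, std::size_t n) {
    assert(src != nullptr || n == 0);
    std::vector<float> dst(n);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<float>(src[i]);
    }
    return dst;
}

// Hypothetical helper: copy the first sizeof(float) * n bytes of a double
// array and view them as floats. Casting the elements in place beforehand
// would not change this, because the storage stays 8 bytes per element.
std::vector<float> reinterpret_prefix_as_float(const double* src, std::size_t n) {
    assert(src != nullptr || n == 0);
    std::vector<float> raw(n);
    std::memcpy(raw.data(), src, n * sizeof(float));
    return raw;
}
```

Comparing element 5 of both results against the host value may help narrow down whether the problem is in the conversion or in the transfer.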
Unfortunately, I'm out of ideas as to why the values are wrong as soon as I pass them to the kernel. I'm thankful for any hints.