I want to write a simple OpenCL kernel, but when I read some of the data in the kernel, I get completely different values. The following kernel is a small reproduction.

__kernel void kernel(int N, __global float *a, __global float *b, __global float *out) {
    int t_idx = get_global_id(0);
    size_t gridDim = get_num_groups(0);
    size_t blockDim = get_local_size(0);
    for (int t_i = t_idx; t_i < N; t_i += gridDim * blockDim) {
        out[t_i] = 2;
    }

    if (t_idx == 0) {
        out[5] = a[5];
    }
}

The array a is an npy array with double-precision values. Since my device does not support cl_khr_int64, I have to cast it to 32-bit values before passing it to the kernel:

for (int i = 0; i < size_a; i++) a[i] = static_cast<float>(a[i]);

However, when I read the 6th value of a in the kernel above, I get a different value after the device-to-host data transfer. When I read values from c, a float array that I did not fetch (or cast) but allocated on the host side, everything works fine.

The host-to-device transfer:

err = clEnqueueWriteBuffer(commands, a_d, CL_TRUE, 0, sizeof(float) * size_a, a, 0, NULL, NULL);
if (err != CL_SUCCESS) return 0;

Unfortunately, I'm out of ideas as to why the values are wrong as soon as I pass them to the kernel. I'm thankful for any hints.

1 Answer

Two major issues I can spot right away:

1. Type system

The array a is an npy array with double-precision values. Since my device does not support cl_khr_int64, I have to cast it to 32-bit values before passing it to the kernel:

for (int i = 0; i < size_a; i++) a[i] = static_cast<float>(a[i]);

What you're describing does not match what you're doing in your code - it seems you might have a misunderstanding of C++'s type system. Assuming a is declared double* a;, the behaviour of the following statement is, roughly:

a[i] = static_cast<float>(a[i]);
  • Read the element at offset i from a, a double-precision floating-point number.
  • Convert it to a single-precision floating-point number (the static_cast).
  • To store the result back into a[i], convert the single-precision value back into a double, because double is the element type that a points to.

The array therefore still holds 8-byte doubles, merely rounded to float precision, yet you're apparently treating the memory at a as an array of floats in your kernel. This almost certainly won't do what you want.
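As a minimal sketch of one way around this, assuming a really is a double* on the host: convert into a separate float buffer instead of writing back into a. The helper names to_float and read_as_float are hypothetical and only serve as illustration. The second function shows what the kernel effectively sees when double storage is read as if it held floats.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy n doubles into a new, separately allocated
// float buffer. Each element is narrowed exactly once and stays a float.
std::vector<float> to_float(const double* src, std::size_t n) {
    std::vector<float> dst(n);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<float>(src[i]);
    }
    return dst;
}

// Hypothetical helper, for illustration only: read the 4 bytes at float
// offset float_index from memory that actually stores doubles. The caller
// must make sure the offset lies inside the array.
float read_as_float(const double* src, std::size_t float_index) {
    float value;
    std::memcpy(&value,
                reinterpret_cast<const unsigned char*>(src)
                    + float_index * sizeof(float),
                sizeof(float));
    return value;
}
```

With a buffer such as std::vector<float> a_f = to_float(a, size_a);, the existing clEnqueueWriteBuffer call would then take a_f.data() as its source pointer, so that sizeof(float) * size_a matches the bytes actually being copied. By contrast, read_as_float(a, 5) returns half of the bit pattern of the double a[2], which is why the kernel reports an unrelated value for a[5].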

2. Concurrent writes

Given the kernel code

    for (int t_i = t_idx; t_i < N; t_i += gridDim * blockDim) {
        out[t_i] = 2;
    }

    if (t_idx == 0) {
        out[5] = a[5];
    }

So, for t_idx == 0:

out[0] = 2; 
…
out[5] = a[5];

and for t_idx == 5 (a work-item that exists whenever the global size is larger than 5):

out[5] = 2; 

Two threads are writing to the same memory location out[5], with nothing to serialise the writes. The behaviour of this is undefined.
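One possible way to avoid the race, sketched under assumptions: fold the special case into the loop, so that each element of out is written by exactly one work-item. In the kernel this would amount to replacing the loop body with out[t_i] = (t_i == 5) ? a[5] : 2; and dropping the separate if block. The C++ below is only a serial emulation of that idea, not OpenCL code. The names work_item and run_all are hypothetical, and a is assumed to hold at least 6 elements whenever N is larger than 5.

```cpp
#include <cstddef>
#include <vector>

// Emulates one work-item of the modified kernel. gid plays the role of
// get_global_id(0), global_size that of gridDim * blockDim.
void work_item(std::size_t gid, std::size_t global_size, std::size_t N,
               const float* a, float* out) {
    for (std::size_t i = gid; i < N; i += global_size) {
        // Index 5 is written only by the work-item whose stride reaches it.
        out[i] = (i == 5) ? a[5] : 2.0f;
    }
}

// Runs every emulated work-item, in ascending or descending order, to
// check that the result does not depend on execution order.
std::vector<float> run_all(std::size_t global_size, std::size_t N,
                           const std::vector<float>& a, bool reverse_order) {
    std::vector<float> out(N, 0.0f);
    for (std::size_t k = 0; k < global_size; ++k) {
        std::size_t gid = reverse_order ? global_size - 1 - k : k;
        work_item(gid, global_size, N, a.data(), out.data());
    }
    return out;
}
```

Because no two emulated work-items touch the same index, running them in either order produces the same out array. The original kernel cannot guarantee that for out[5].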
