I'm testing an OpenCL 2.0 reduction kernel that should compute the maximum value of each 2D work-group.
I call:
//get global max
float max_v;
max_v = work_group_reduce_max(hsv_fi.z);
if(get_local_id(0) == 0 && get_local_id(1) == 0){
result[get_group_id(0)] = 1337;// max_v;
}
at the end of my kernel. The 1337 is for test purposes. I expect an array, sized to the number of work-groups, full of 1337 values.
But I somehow get only one value. Is there anything obviously wrong in my kernel?
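One thing worth checking in the kernel snippet above: with a 2D launch, get_group_id(0) only identifies the column of work-groups, so every group in the same column writes to the same result slot. Below is a minimal host-side sketch, assuming a row-major layout of the result buffer. flatGroupIndex and distinctSlots are hypothetical helpers, and their parameters stand in for the kernel built-ins get_group_id(0), get_group_id(1) and get_num_groups(0).

```cpp
#include <cstddef>
#include <set>

// Hypothetical helper: row-major slot for work-group (gx, gy).
// Inside the kernel this would correspond to
//   get_group_id(1) * get_num_groups(0) + get_group_id(0)
std::size_t flatGroupIndex(std::size_t gx, std::size_t gy, std::size_t numGroupsX) {
    return gy * numGroupsX + gx;
}

// Counts how many distinct result slots a grid of work-groups touches,
// indexing either by gx alone (as in the question) or by the flat index.
std::size_t distinctSlots(std::size_t numGroupsX, std::size_t numGroupsY, bool useFlatIndex) {
    std::set<std::size_t> slots;
    for (std::size_t gy = 0; gy < numGroupsY; ++gy) {
        for (std::size_t gx = 0; gx < numGroupsX; ++gx) {
            slots.insert(useFlatIndex ? flatGroupIndex(gx, gy, numGroupsX) : gx);
        }
    }
    return slots.size();
}
```

For an assumed 4 x 3 grid of groups, distinctSlots(4, 3, false) yields 4 slots, while distinctSlots(4, 3, true) yields all 12.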
I try to read it via:
int errcode;
float* resultData = (float*)oclEnvironment._commandQueue.enqueueMapBuffer(resultBuffer, true, CL_MAP_READ, 0, size, 0, 0, &errcode);
std::copy((float*)resultData, (float*)(resultData + size), (float*)dest);
My dest pointer points into a:
std::vector<float> resultArray;
which I reserve with the work-group size, and I pass the pointer to its storage (dest) via:
resultArray.data()
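Two details in this read-back are easy to get wrong. First, reserve changes a vector's capacity but not its size. Second, the byte count passed to enqueueMapBuffer is not the element count that pointer arithmetic on a float* expects. Below is a minimal sketch, assuming size in the code above is a byte count. copyMapped is a hypothetical helper, and the mapped pointer is replaced by a plain array so the example runs without an OpenCL device.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies sizeBytes bytes worth of floats from a mapped
// region into a vector that is sized (not merely reserved) to hold them.
std::vector<float> copyMapped(const float* mapped, std::size_t sizeBytes) {
    const std::size_t count = sizeBytes / sizeof(float); // bytes -> elements
    std::vector<float> out(count);                       // size() == count
    std::copy(mapped, mapped + count, out.begin());      // advance by elements
    return out;
}
```

With a real mapping, the region would normally be released again with enqueueUnmapMemObject once the copy is done.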
Any suggestions? I feel like it's something minor that I'm missing.
Thanks!
EDIT:
cl::NDRange global(width, height);
//Intel HD 530 can have a max. workgroup size of 256.
int dim1 = 16;
int dim2 = 16;
cl::NDRange local(dim1, dim2);
//Calculate the number of workgroups
int numberOfWorkgroups = ceil((width * height) / (float)(dim1 * dim2));
//Each workgroup reduces its data to a single element. These elements are then reduced on the host in the final reduction step.
oclEnvironment._commandQueue.enqueueNDRangeKernel(_kernel, cl::NullRange, global, local);
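The group count above is derived from the total number of work-items, but work-groups tile each dimension separately, so the count is more likely the product of the per-dimension counts. Below is a sketch with the hypothetical helpers ceilDiv, roundUp and countWorkGroups, assuming the global size is rounded up to a multiple of the local size in each dimension.

```cpp
#include <cstddef>

// Integer division rounded up, e.g. ceilDiv(360, 16) is 23.
std::size_t ceilDiv(std::size_t n, std::size_t d) {
    return (n + d - 1) / d;
}

// Smallest multiple of `multiple` that is >= n; usable as a padded global size.
std::size_t roundUp(std::size_t n, std::size_t multiple) {
    return ceilDiv(n, multiple) * multiple;
}

// Number of work-groups in a 2D launch: groups along x times groups along y.
std::size_t countWorkGroups(std::size_t width, std::size_t height,
                            std::size_t dim1, std::size_t dim2) {
    return ceilDiv(width, dim1) * ceilDiv(height, dim2);
}
```

For a 480 x 360 range with 16 x 16 groups this gives 30 * 23 = 690 groups rather than 675, because 360 is not a multiple of 16 and the last row of groups is only partly filled.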
EDIT 2:
I still don't understand what OpenCL is doing. I have a 480x360 image that I want to pass as a buffer with 3 bytes per element (RGB). My maximum work-group size is (16,16,1), which gives the 675 work-groups I calculated. I now get about 30 valid maxima in the list, not the 675 I expected; the other entries are zeros. I tried padding the input to 480x368 so that both dimensions are multiples of 16, but the result is still the same. It would be fine if those 30 values covered my whole data set, so that the max of the list I receive is the global max. But since I don't know what is happening, I can't be sure that is the case.
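Whether the 30 values cover the whole data set depends on how the result slots are indexed. If several work-groups write to the same slot, only one write survives and a maximum can be lost. Below is a small host-side emulation, assuming that groups in the same column overwrite each other. The write order is fixed here for the sake of the example; on a device it is not defined. reduceOnHost and the grid values are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// groupMax[gy][gx] holds the maximum found by work-group (gx, gy).
// Emulates the per-group writes, then performs the final reduction on the host.
float reduceOnHost(const std::vector<std::vector<float>>& groupMax, bool useFlatIndex) {
    const std::size_t ny = groupMax.size();
    const std::size_t nx = groupMax.front().size();
    std::vector<float> result(useFlatIndex ? nx * ny : nx, 0.0f);
    for (std::size_t gy = 0; gy < ny; ++gy) {
        for (std::size_t gx = 0; gx < nx; ++gx) {
            const std::size_t slot = useFlatIndex ? gy * nx + gx : gx;
            result[slot] = groupMax[gy][gx]; // a later write replaces an earlier one
        }
    }
    return *std::max_element(result.begin(), result.end());
}
```

For the grid {{1, 9}, {2, 3}}, indexing by column alone leaves {2, 3} in the result and reports 3, while one slot per group keeps all four values and reports 9.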
What are the (float*) casts in std::copy for? Is size the size in bytes or the number of floats? The usage seems inconsistent. enqueueMapBuffer takes the size in number of bytes, not number of items.