0

I'm trying to learn OpenCL on a Mac, whose implementation appears to differ in places from the OpenCL book I'm reading. I want to dynamically allocate local memory on the GPU. From what I've read, I need to use the clSetKernelArg function, but I can't get that to work in Xcode 6.4. Here's the code as it stands (never mind that it's a pointless program; I'm just trying to learn the syntax for local memory, which CUDA calls shared memory). In Xcode, the kernel is written as a stand-alone .cl file, similar to CUDA, so that's a separate file.

add.cl:

kernel void add(int a, int b, global int* c, local int* d)
{
    d[0] = a;
    d[1] = b;
    *c = d[0] + d[1];
}

main.c:

#include <stdio.h>
#include <OpenCL/opencl.h>
#include "add.cl.h"

int main(int argc, const char * argv[]) {

    int a = 3;
    int b = 5;
    int c;
    int* cptr = &c;

    dispatch_queue_t queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_GPU, NULL);

    void* dev_c = gcl_malloc(sizeof(cl_int), NULL, CL_MEM_WRITE_ONLY);

    // attempt to create local memory buffer
    void* dev_d = gcl_malloc(2*sizeof(cl_int), NULL, CL_MEM_READ_WRITE); 
    // clSetKernelArg(add_kernel, 3, 2*sizeof(cl_int), NULL);

    dispatch_sync(queue, ^{

        cl_ndrange range = { 1, {0, 0, 0}, {1, 0, 0}, {1, 0, 0} };

        // This gives a warning: 
        // Warning: Incompatible pointer to integer conversion passing 'cl_int *' 
        //     (aka 'int *') to parameter of type 'size_t' (aka 'unsigned long')
        add_kernel(&range, a, b, (cl_int*)dev_c, (cl_int*)dev_d);

        gcl_memcpy((void*)cptr, dev_c, sizeof(cl_int));

    });

    printf("%d + %d = %d\n", a, b, c);

    gcl_free(dev_c);    
    dispatch_release(queue);
    return 0;
}

I've tried putting clSetKernelArg where indicated and it doesn't like the first argument:

Error: Passing 'void (^)(const cl_ndrange *, cl_int, cl_int, cl_int *, size_t)' to parameter of incompatible type 'cl_kernel' (aka 'struct _cl_kernel *')

I've looked and looked but can't find any examples illustrating this point within the Xcode environment. Can you point me in the right direction?

3
  • I should clarify, the warning in the add_kernel call refers to the last argument (dev_d). Commented Sep 11, 2015 at 22:50
  • Where is add_kernel defined? My guess would be in add.cl.h, but you haven't shown us that. Also, .cl files are not Xcode-specific; you could use them with any compiler/IDE. You could also write the OpenCL kernel code inline if you wanted to, as it's done in the Hello World Example. Commented Sep 11, 2015 at 23:06
  • add_kernel is defined by the Xcode cl compiler. The code is in add.cl and the compiler creates the add_kernel function by appending the "_kernel". Commented Sep 14, 2015 at 1:16

2 Answers

1

Managed to solve this by ditching Apple's extensions and using standard OpenCL 1.2 calls. That means replacing gcl_malloc with clCreateBuffer, replacing the dispatch_sync block with clEnqueueNDRangeKernel, and most importantly, calling clSetKernelArg for each local parameter with the size of the local buffer in bytes and NULL as the last argument. Works like a charm.

Here's the new version:

char kernel_add[1024] =
"kernel void add(int a, int b, global int* c, local int* d) \
{\
    d[0] = a;\
    d[1] = b;\
    *c = d[0] + d[1];\
}";

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <OpenCL/opencl.h>

int main(int argc, const char * argv[]) {

    int a = 3;
    int b = 5;
    int c;

    cl_device_id device_id;
    int err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);

    cl_context context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);    
    cl_command_queue queue = clCreateCommandQueue(context, device_id, 0, &err);

    // Point at the kernel source string defined above (kernel_add).
    const char* srccode = kernel_add;
    cl_program program = clCreateProgramWithSource(context, 1, &srccode, NULL, &err);

    err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
    // The name must match the kernel function in the source, which is "add".
    cl_kernel kernel = clCreateKernel(program, "add", &err);

    cl_mem dev_c = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(int), NULL, NULL);

    err = clSetKernelArg(kernel, 0, sizeof(int), &a);
    err |= clSetKernelArg(kernel, 1, sizeof(int), &b);
    err |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &dev_c);
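    // Local memory: pass the size in bytes and a NULL pointer.
    // The kernel writes d[0] and d[1], so reserve room for two ints.
    err |= clSetKernelArg(kernel, 3, 2 * sizeof(int), NULL);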

    size_t one = 1;
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &one, NULL, 0, NULL, NULL);
    clFinish(queue);

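    // CL_TRUE makes this a blocking read, so c is valid when the call returns.
    err = clEnqueueReadBuffer(queue, dev_c, CL_TRUE, 0, sizeof(int), &c, 0, NULL, NULL);

    printf("%d + %d = %d\n", a, b, c);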

    clReleaseMemObject(dev_c);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseCommandQueue(queue);
    clReleaseContext(context);

    return 0;
}

0

In regular OpenCL, for a kernel parameter declared as a local pointer, you don't allocate a buffer on the host side and pass it in (like you're doing with dev_d). Instead you call clSetKernelArg with the size in bytes of the desired local storage but a NULL pointer (like this: clSetKernelArg(kernel, 2, sizeof(cl_int) * local_work_size[0], NULL)). The second argument is the index of the local parameter, which would be 3 for d in your kernel. You'll have to translate that into the Xcode way if you insist on being platform-specific.
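
The size passed to clSetKernelArg for a local parameter is a plain byte count, so it is easy to get wrong (for instance, by reserving one int when the kernel touches two). As a minimal sketch, the hypothetical helper below computes that byte count and rejects requests that overflow or exceed a caller-supplied limit. The name local_arg_bytes and its parameters are assumptions made for illustration and are not part of OpenCL; the limit is meant to be the value a host program would query with CL_DEVICE_LOCAL_MEM_SIZE.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an OpenCL API): bytes to request for a
 * `local T*` kernel parameter that holds `elems_per_item` elements of
 * `elem_size` bytes for each of `work_group_size` work-items.
 * Returns 0 if any input is 0, if the product overflows size_t, or if
 * the total exceeds `device_local_limit`.
 */
size_t local_arg_bytes(size_t elem_size, size_t elems_per_item,
                       size_t work_group_size, size_t device_local_limit)
{
    if (elem_size == 0 || elems_per_item == 0 || work_group_size == 0)
        return 0;

    /* Guard each multiplication against overflow before performing it. */
    if (elems_per_item > SIZE_MAX / elem_size)
        return 0;
    size_t per_item = elem_size * elems_per_item;

    if (work_group_size > SIZE_MAX / per_item)
        return 0;
    size_t total = per_item * work_group_size;

    return total <= device_local_limit ? total : 0;
}

/*
 * Intended use on the host (requires the OpenCL headers and a built kernel;
 * local_mem_size is assumed to come from CL_DEVICE_LOCAL_MEM_SIZE):
 *
 *   size_t bytes = local_arg_bytes(sizeof(cl_int), 2, 1, local_mem_size);
 *   if (bytes != 0)
 *       err = clSetKernelArg(kernel, 3, bytes, NULL);  // NULL marks a local arg
 */
```

For the add kernel in the question, which stores two ints and runs with a single work-item, the helper would yield 2 * sizeof(cl_int) bytes for argument index 3.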

1 Comment

I tried this, but I don't know where to put this line. I tried a few places, but none of them worked because it didn't recognize the kernel argument. I may have to give up on the Apple implementation and go to the standard one if I want to explicitly use local memory.
