Newest 'opencl' Questions

1 vote

0 answers

39 views

OnnxRuntime with ACL Execution Provider on RK3588 (Mali-G610): Nodes assigned to ACL but GPU load remains 0%

[Goal & Problem] I am trying to accelerate ONNX model inference on an RK3588 (Orange Pi 5) board using the Mali-G610 GPU. I have built OnnxRuntime (ORT) with the ACL (Arm Compute Library) Execution ...

이호연

11

asked Nov 18 at 9:10

0 votes

0 answers

49 views

How to use OpenCL on Exynos 2400 in Termux?

I want to compile and run OpenCL programs to do some parallel computing on my mobile device, an S24 FE (Exynos2400e). I tried to compile clinfo, but it always reports 0 devices. I tried various ...

Lakshit Karsoliya

25

asked Oct 25 at 3:24

1 vote

1 answer

96 views

Local atomics cause GPU to crash

I am writing an OpenCL kernel that uses atomics. As I only need to synchronize groups of 192 threads, I figured using local atomics would be ideal. However, the change from global to local atomics ...

Edward Murphy

69

asked Sep 16 at 2:05

0 votes

0 answers

17 views

Can clEnqueueSVMMap be used with a sub-region of an SVM memory region?

Suppose I've allocated a region of memory with clSVMAlloc(). Looking at the clEnqueueSVMMap() function, we are told that it will "allow the host to update a region of a SVM buffer". Does ...

einpoklum

137k

asked Jul 14 at 10:35

0 votes

0 answers

16 views

How can I determine why clSVMAlloc failed?

Most OpenCL API calls return a status/error value, either directly or via an out-parameter (example: clCreateBuffer()). While that is not as informative as a long-form string description, it can tell ...

einpoklum

137k

asked Jun 30 at 19:02

0 votes

1 answer

36 views

Why is clEnqueueWaitForEvents deprecated? It seems indispensable

I'm looking at the clEnqueueWaitForEvents() OpenCL API function. As I see it, this is a real boon. You see, almost all clEnqueueXXX functions take an array-of-events, and the size of that array, to ...

einpoklum

137k

asked Jun 9 at 22:52

1 vote

1 answer

39 views

Why can't I create a kernel (CL_INVALID_PROGRAM_EXECUTABLE) after successfully compiling an OpenCL program?

In the following program, I compile a kernel for the first device on the first platform: const char* kernel_source_code = R"( __kernel void vectorAdd( __global float * __restrict C, ...

einpoklum

137k

asked Jun 2 at 17:15

0 votes

0 answers

42 views

SIGV on clGetPlatformIDs

There is a SIGV while calling clGetPlatformIDs, which is sometimes a fatal SIGSEGV. The minimal example I found that produces it: #include <stdio.h> #include <stdlib.h> #include "CL/cl.h"...

LentilesGR

1

asked May 17 at 19:00

0 votes

1 answer

43 views

What happens when you set the same OpenCL callback more than once on the same object?

OpenCL has several API functions to set callback functions - for events, for buffers/memory objects, for contexts and maybe more. What happens if you invoke one of these functions, more than once, on ...

einpoklum

137k

asked May 5 at 11:33

0 votes

0 answers

65 views

Why is my OpenCL optimized convolution kernel slower than the naive version at higher workgroup sizes?

I'm working on a GPU-accelerated 2D convolution in OpenCL for a 2048x2048 image using a 3x3 Sobel filter. I implemented two versions of the kernel: A naive version that uses only global memory. An ...

Mxneeb

19

asked May 1 at 23:07

0 votes

1 answer

44 views

Does pyopencl transfer arrays to host memory implicitly?

I have an AMD GPU. I'm using pyopencl. I have a context and a queue. Then I created an array: import pyopencl import pyopencl.array ctx = pyopencl.create_some_context(interactive=False) queue = pyopencl....

haael

1,059

asked Apr 9 at 18:21

0 votes

1 answer

49 views

OpenCL 2.0 full profile, without atomic_store & atomic_load? Is this possible?

I use the OpenCL.NET C# wrapper for OpenCL. According to GPU-Z, my GPU is an AMD Radeon Barcelo, and specifically for OpenCL: Platform Version: OpenCL 2.1 AMD-APP (3570.0) Device Name: gfx90c Device Profile: ...

Chameleon

2,239

asked Mar 10 at 19:37

2 votes

1 answer

68 views

OpenCL segfault on clBuildProgram

I'm building my first OpenCL program in C in order to quickly compute the Mandelbrot Set, and I'm getting a segfault on clBuildProgram. Below is the relevant code: The kernel code (though the program ...

Lemma

143

asked Mar 5 at 20:19

0 votes

1 answer

29 views

Why does clLinkProgram take a context handle?

In OpenCL, the clLinkProgram() function takes (among other things) A cl_context context handle; An array of cl_program handles of program objects. Now, a cl_program is always created in a context; ...

einpoklum

137k

asked Feb 24 at 22:24

1 vote

0 answers

26 views

Is MPI necessary for invoking OpenCL devices across multiple compute nodes?

What is the typical way of invoking multiple OpenCL devices across multiple compute nodes that use job schedulers such as SLURM or PBS? Let's say I requested 64 GPUs in total where each compute node is ...

Redshoe

301

asked Feb 15 at 19:01

0 votes

0 answers

37 views

What is the guaranteed relation of the per-binary and overall return status of clCreateProgramWithBinary?

The OpenCL API has the following function: cl_program clCreateProgramWithBinary( cl_context context, cl_uint num_devices, const cl_device_id* device_list, const size_t* lengths, ...

einpoklum

137k

asked Feb 15 at 10:47

0 votes

4 answers

1k views

How do I use OpenCL in a Docker container?

I have successfully used OpenCL on my local Windows PC, and I would now like to get my program working in a container. First attempt: FROM ubuntu:latest RUN apt-get update #Done to make install non-...

sav

2,170

asked Feb 6 at 6:58

0 votes

0 answers

6 views

Why does clCompileProgram take "headers" wrapped in cl_program's?

In OpenCL, when you want to compile (not link) a kernel for some target devices, you call: cl_int clCompileProgram( cl_program program, cl_uint num_devices, const cl_device_id* device_list,...

einpoklum

137k

asked Jan 20 at 15:25

2 votes

1 answer

63 views

What happens to set kernel arguments after launch? Must I reset them?

In CUDA, launching a kernel means specifying its arguments, marshaled via an array of pointers: CUresult cuLaunchKernel ( CUfunction f, /* launch config stuff */, void** kernelParams, ...

einpoklum

137k

asked Jan 18 at 14:39

0 votes

0 answers

16 views

In OpenCL, do contexts keep subdevices alive?

In OpenCL (let's say v3.0), I know one can create contexts using sub-devices. But - what happens if you release all references to a sub-device while the context is not released (i.e. has positive ...

einpoklum

137k

asked Jan 16 at 16:02

1 vote

0 answers

14 views

In OpenCL, can we obtain the default queue for a device (in a context)?

The OpenCL API defines such a thing as the "default queue", for a given context and device in that context. Indeed, when we clCreateCommandQueueWithProperties, one of the properties we ...

einpoklum

137k

asked Jan 7 at 15:32

0 votes

0 answers

66 views

Can I write OpenCL kernels using some kind of C++, to run on NVIDIA GPUs, in 2024?

OpenCL has had a bumpy ride over the years w.r.t. the prospects of using C++ to write kernels. First there was "OpenCL C++ kernel language", standardized with OpenCL v2.1 - but that did ...

einpoklum

137k

asked Dec 26, 2024 at 11:57

1 vote

0 answers

38 views

OpenCL: Kernel only reading first pixel

Grayscale kernel only reading first pixel. The following is my grayscale.cl kernel implementation. The problem that I am facing is that the kernel seems to perform the grayscale calculation only on the ...

Arief Kurniawan

53

asked Dec 15, 2024 at 21:56

0 votes

1 answer

84 views

Effect of distance between CUDA threads in block?

I have a naive question about GPU programming. (ChatGPT and Claude didn't really give me a convincing answer. Maybe I'm prompting badly.) GPU programming languages like CUDA and OpenCL organise ...

Martin Berger

1,128

asked Dec 6, 2024 at 18:30

1 vote

1 answer

128 views

Ubuntu OpenCL can't find Intel GPU on double GPU device

I'm trying to code an OpenCL C++ application on an old Ubuntu laptop. It has two GPUs, which are shown when I run lspci | grep VGA: 00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core ...

Turgut

859

asked Oct 31, 2024 at 10:11
