5,781 questions
1
vote
0
answers
39
views
OnnxRuntime with ACL Execution Provider on RK3588 (Mali-G610): Nodes assigned to ACL but GPU load remains 0%
[Goal & Problem]
I am trying to accelerate ONNX model inference on an RK3588 (Orange Pi 5) board using the Mali-G610 GPU. I have built OnnxRuntime (ORT) with the ACL (Arm Compute Library) Execution ...
0
votes
0
answers
49
views
How to use OpenCL on Exynos 2400 in Termux?
I want to compile and run OpenCL programs to do some parallel computing on my mobile device, an S24 FE (Exynos 2400e). I tried to compile clinfo, but it always reports 0 devices.
I tried various ...
1
vote
1
answer
96
views
Local atomics causes GPU to crash
I am writing an OpenCL kernel that uses atomics. As I only need to synchronize groups of 192 threads, I figured using local atomics would be ideal. However, the change from global to local atomics ...
0
votes
0
answers
17
views
Can clEnqueueSVMMap be used with a sub-region of an SVM memory region?
Suppose I've allocated a region of memory with clSVMAlloc(). Looking at the clEnqueueSVMMap() function, we are told that it will "allow the host to update a region of a SVM buffer".
Does ...
0
votes
0
answers
16
views
How can I determine why clSVMAlloc failed?
Most OpenCL API calls return a status/error value, either directly or via an out-parameter (example: clCreateBuffer()). While that is not as informative as a long-form string description, it can tell ...
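A minimal sketch of the "status value to readable name" idea mentioned in the excerpt above. The helper name `cl_status_name` and the table are hypothetical, not part of the OpenCL API. The numeric values are copied from `CL/cl.h` so the snippet compiles without an OpenCL SDK, and only a handful of codes are covered. It does not by itself answer the question for `clSVMAlloc()`, which returns a pointer rather than a status.

```c
#include <stddef.h>

/* Status values below mirror constants from CL/cl.h. They are repeated
   here as plain integers so this sketch builds without an OpenCL SDK. */
typedef struct {
    int code;
    const char *name;
} status_entry;

static const status_entry status_table[] = {
    {   0, "CL_SUCCESS" },
    {  -1, "CL_DEVICE_NOT_FOUND" },
    {  -4, "CL_MEM_OBJECT_ALLOCATION_FAILURE" },
    {  -5, "CL_OUT_OF_RESOURCES" },
    {  -6, "CL_OUT_OF_HOST_MEMORY" },
    { -30, "CL_INVALID_VALUE" },
    { -34, "CL_INVALID_CONTEXT" },
    { -61, "CL_INVALID_BUFFER_SIZE" },
};

/* Hypothetical helper (not an OpenCL function): look up the symbolic
   name of a status code, with a fallback for codes not in the table. */
const char *cl_status_name(int status)
{
    size_t n = sizeof status_table / sizeof status_table[0];
    for (size_t i = 0; i < n; ++i) {
        if (status_table[i].code == status)
            return status_table[i].name;
    }
    return "unknown OpenCL status";
}
```

With a call that does report a status, such as the out-parameter of `clCreateBuffer()`, one could pass the returned value to `cl_status_name()` and print the result.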
0
votes
1
answer
36
views
Why is clEnqueueWaitForEvents deprecated? It seems indispensable
I'm looking at the clEnqueueWaitForEvents() OpenCL API function.
As I see it, this is a real boon. You see, almost all clEnqueueXXX functions take an array-of-events, and the size of that array, to ...
1
vote
1
answer
39
views
Why can't I create a kernel (CL_INVALID_PROGRAM_EXECUTABLE) after successfully compiling an OpenCL program?
In the following program, I compile a kernel for the first device on the first platform:
const char* kernel_source_code = R"(
__kernel void vectorAdd(
__global float * __restrict C,
...
0
votes
0
answers
42
views
SIGV on clGetPlatformIDs
There is a SIGV while calling clGetPlatformIDs, which is sometimes a fatal SIGSEGV.
The minimal example I found that reproduces it:
#include <stdio.h>
#include <stdlib.h>
#include "CL/cl.h"...
0
votes
1
answer
43
views
What happens when you set the same OpenCL callback more than once on the same object?
OpenCL has several API functions to set callback functions - for events, for buffers/memory objects, for contexts and maybe more.
What happens if you invoke one of these functions, more than once, on ...
0
votes
0
answers
65
views
Why is my OpenCL optimized convolution kernel slower than the naive version at higher workgroup sizes?
I'm working on a GPU-accelerated 2D convolution in OpenCL for a 2048x2048 image using a 3x3 Sobel filter. I implemented two versions of the kernel:
A naive version that uses only global memory.
An ...
0
votes
1
answer
44
views
Does pyopencl transfer arrays to host memory implicitly?
I have an AMD GPU and I'm using pyopencl. I have a context and a queue. Then I created an array:
import pyopencl
import pyopencl.array
ctx = pyopencl.create_some_context(interactive=False)
queue = pyopencl....
0
votes
1
answer
49
views
OpenCL 2.0 full profile, without atomic_store & atomic_load? Is this possible?
I use the OpenCL.NET C# wrapper for OpenCL.
According to GPU-Z, my GPU is an AMD Radeon Barcelo, and the OpenCL-specific details are:
Platform Version: OpenCL 2.1 AMD-APP (3570.0)
Device Name: gfx90c
Device Profile: ...
2
votes
1
answer
68
views
OpenCL segfault on clBuildProgram
I'm building my first OpenCL program in C in order to quickly compute the Mandelbrot Set, and I'm getting a segfault on clBuildProgram. Below is the relevant code:
The kernel code (though the program ...
0
votes
1
answer
29
views
Why does clLinkProgram take a context handle?
In OpenCL, the clLinkProgram() function takes (among other things)
A cl_context context handle;
An array of cl_program handles of program objects.
Now, a cl_program is always created in a context; ...
1
vote
0
answers
26
views
Is MPI necessary for invoking OpenCL devices across multiple compute nodes?
What is the typical way of invoking multiple OpenCL devices across multiple compute nodes managed by job schedulers such as SLURM or PBS? Let's say I requested 64 GPUs in total, where each compute node is ...
0
votes
0
answers
37
views
What is the guaranteed relation between the per-binary and overall return status of clCreateProgramWithBinary?
The OpenCL API has the following function:
cl_program clCreateProgramWithBinary(
cl_context context,
cl_uint num_devices,
const cl_device_id* device_list,
const size_t* lengths,
...
0
votes
4
answers
1k
views
How do I use OpenCL in a Docker container?
I have successfully used OpenCL on my local Windows PC, and I would now like to get my program working in a container.
First attempt
FROM ubuntu:latest
RUN apt-get update
#Done to make install non-...
0
votes
0
answers
6
views
Why does clCompileProgram take "headers" wrapped in cl_program's?
In OpenCL, when you want to compile (not link) a kernel for some target devices, you call:
cl_int clCompileProgram(
cl_program program,
cl_uint num_devices,
const cl_device_id* device_list,...
2
votes
1
answer
63
views
What happens to set kernel arguments after launch? Must I reset them?
In CUDA, launching a kernel means specifying its arguments, marshaled via an array of pointers:
CUresult cuLaunchKernel (
CUfunction f,
/* launch config stuff */,
void** kernelParams,
...
0
votes
0
answers
16
views
In OpenCL, do contexts keep subdevices alive?
In OpenCL (let's say v3.0), I know one can create contexts using sub-devices. But - what happens if you release all references to a sub-device while the context is not released (i.e. has positive ...
1
vote
0
answers
14
views
In OpenCL, can we obtain the default queue for a device (in a context)?
The OpenCL API defines such a thing as the "default queue", for a given context and device in that context. Indeed, when we call clCreateCommandQueueWithProperties(), one of the properties we ...
0
votes
0
answers
66
views
Can I write OpenCL kernels using some kind of C++, to run on NVIDIA GPUs, in 2024?
OpenCL has had a bumpy ride over the years w.r.t. the prospects of using C++ to write kernels. First there was "OpenCL C++ kernel language", standardized with OpenCL v2.1 - but that did ...
1
vote
0
answers
38
views
OpenCL: Kernel only reading first pixel
Grayscale kernel only reading first pixel
The following is my grayscale.cl kernel implementation. The problem that I am facing is that the kernel seems to perform the grayscale calculation only on the ...
0
votes
1
answer
84
views
Effect of distance between CUDA threads in block?
I have a naive question about GPU programming. (ChatGPT and Claude didn't really give me a convincing answer. Maybe I'm prompting badly.)
GPU programming languages like CUDA and OpenCL organise ...
1
vote
1
answer
128
views
Ubuntu OpenCL can't find Intel GPU on dual-GPU device
I'm trying to code an OpenCL C++ application on an old Ubuntu laptop. It has two GPUs, which are shown when I run lspci | grep VGA:
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core ...