385 questions
-3
votes
1
answer
67
views
How to debug cuda kernels in python, using vscode (linux)
I use CuPy to call CUDA kernels, but I don't know how to debug the CUDA code. Here is my wrapper file:
wrapper.py
import math
from pathlib import Path
import cupy as cp
import numpy as np
with open(Path(...
0
votes
0
answers
86
views
Filter data into binary classes on GPU
I have an ML problem where I want to leverage the power of Support Vector Classifiers (SVC) or any other 2-class classifier and compare them to my NN models.
The problem is that binary classifiers are ...
1
vote
1
answer
67
views
CuPy memory management
I am trying to understand how CuPy handles memory. Specifically the difference between used_bytes and total_bytes as shown here
I have a simple code that either directly allocates an array on device ...
0
votes
0
answers
149
views
How to speed up cupy code with frequent data transfer & writing
I have a computational code written largely with cupy. The computation I'm running now requires high-frequency data transfer (from GPU to CPU) and writing to an h5 file.
Here is a sketch of what I'...
-2
votes
1
answer
499
views
How to GPU-accelerate PDE solvers in Python?
I have been working on a small Python package to solve a class of PDEs using scipy.integrate.solve_ivp. As discretizations are made finer, runtime becomes a bottleneck—especially when I need to solve ...
2
votes
1
answer
148
views
CuPy ndimage convolution in a nested for-loop seems fast but the next execution is stalled
I am trying to write code that convolves a 3D image with a 3D wavelet kernel that can be described using three independent parameters. I want to analyze the results of the convolution for all ...
0
votes
0
answers
68
views
Cupy creating array using other variables
I am trying to make a numpy/cupy interchange script, similar to this backend implementation, such that by using something like from Util.Backend import backend as bd, I can create bd.array() that can ...
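A minimal sketch of how such an interchange layer might look, assuming a simple "use CuPy if it works, otherwise NumPy" policy. The helper name get_backend is hypothetical and is not taken from the question or from the backend implementation it links to.

```python
import numpy as np


def get_backend(prefer_gpu=True):
    """Return the cupy module if it is usable, otherwise numpy (hypothetical helper)."""
    if prefer_gpu:
        try:
            import cupy as cp
            # Importing can succeed on a machine without a usable GPU,
            # so also check that at least one device is visible.
            if cp.cuda.runtime.getDeviceCount() > 0:
                return cp
        except Exception:
            # CuPy missing or CUDA unavailable: fall through to NumPy.
            pass
    return np


bd = get_backend()
x = bd.array([1.0, 2.0, 3.0])   # numpy.ndarray or cupy.ndarray, depending on the backend
total = float(x.sum())
```

Because both modules expose largely the same array API, code written against bd can often stay unchanged; any function that exists in only one of the two libraries would still need its own handling.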
0
votes
1
answer
34
views
cupy.nanargmax throwing exception
I have a 2D array allocated on the GPU and I need to use CuPy's nanargmax() function to find the maximum value's index in each row. Some of the values could be NaN. Since the 2D array is quite large (...
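The excerpt does not show which exception is raised, so the following is only a sketch of one workaround that is sometimes useful: computing the per-row NaN-ignoring argmax by hand, with a sentinel for rows that contain no valid value (NumPy's own nanargmax raises ValueError for an all-NaN slice). The helper name row_nanargmax and the -1 sentinel are assumptions made for illustration; the xp parameter is meant to accept either numpy or cupy.

```python
import numpy as np


def row_nanargmax(a, xp=np):
    """Per-row index of the maximum, ignoring NaN; -1 for rows that are all NaN."""
    nan_mask = xp.isnan(a)
    # Replace NaN with -inf so it can never win the argmax.
    filled = xp.where(nan_mask, -xp.inf, a)
    idx = filled.argmax(axis=1)
    # Rows with no valid value get a sentinel instead of raising.
    all_nan = nan_mask.all(axis=1)
    return xp.where(all_nan, -1, idx)


a = np.array([[1.0, np.nan, 3.0],
              [np.nan, np.nan, np.nan],
              [5.0, 2.0, np.nan]])
result = row_nanargmax(a)
```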
0
votes
1
answer
162
views
CompileException occurs when compiling a .cu file with cupy
I have a .cu file with these headers:
#include </usr/include/features.h>
#include </usr/include/assert.h>
#include </usr/include/stdio.h>
When I use the nvcc command to compile this file, ...
1
vote
1
answer
261
views
Understanding the permutation test
I'm attempting to optimize the performance of the permutation test implemented in scipy.stats. My dataset consists of 500,000 observations, each associated with 2,000 binary covariates. I've applied ...
3
votes
1
answer
167
views
Why is (x / y)[i] faster than x[i] / y[i]?
I'm new to CuPy and CUDA/GPU computing. Can someone explain why (x / y)[i] is faster than x[i] / y[i]?
When taking advantage of GPU accelerated computations, are there any guidelines that would allow me ...
1
vote
0
answers
45
views
Manual indexing with multidimensional cupy ndarray in user defined kernels
In the cupy docs on user defined kernels (https://docs.cupy.dev/en/stable/user_guide/kernel.html), there is a section defining certain variables that are predefined, like _ind.size() and i for things ...
0
votes
1
answer
343
views
cuSPARSELt not found by CuPy
I have a hard time getting CuPy to detect and use, where applicable, the cuSPARSELt library in Windows. I tried installing versions 0.2.0 (as mentioned by CuPy's installation guide) and 0.6.2 (the ...
2
votes
0
answers
59
views
cupy.linalg.solve for positive definite matrix?
It seems that cupy.linalg.solve has no option for solving the linear system Ax = b under the assumption that A is positive definite.
I am looking for something like scipy.linalg.solve where one can actually tell ...
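For context, a positive definite solve is usually built on a Cholesky factorization. The sketch below shows the idea with NumPy only; the helper name solve_spd is hypothetical, and the xp parameter is an assumption meant to let the same code take the cupy module, which also provides linalg.cholesky and linalg.solve. It is not the implementation behind scipy.linalg.solve.

```python
import numpy as np


def solve_spd(A, b, xp=np):
    """Solve A x = b for symmetric positive definite A via Cholesky (sketch)."""
    L = xp.linalg.cholesky(A)          # A = L @ L.T with L lower triangular
    # A dedicated triangular solver would exploit the structure of L;
    # the general solver is used here only to keep the sketch dependency-free.
    y = xp.linalg.solve(L, b)          # solves L y = b
    return xp.linalg.solve(L.T, y)     # solves L.T x = y


rng = np.random.default_rng(0)
M = rng.random((4, 4))
A = M @ M.T + 4 * np.eye(4)            # symmetric positive definite by construction
b = rng.random(4)
x = solve_spd(A, b)
```

On the GPU, the two general solves could be swapped for a triangular solver such as cupyx.scipy.linalg.solve_triangular to actually benefit from the factorization.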
0
votes
0
answers
81
views
Python app using cupy and cupyx fails with cl.exe not found: how to package it to work with no cl.exe on the target machine
The app is built in Python on Windows 10 and makes heavy use of cupy and cupyx.scipy.ndimage, and a few other cupyx libraries. It is distributable and it works. It now needs to go to a more secure ...
0
votes
1
answer
370
views
Cupy copy numpy array to existing device array
I would like to copy a numpy array into an existing, pre-allocated GPU array.
I've seen that cupy offers the functions copy and copyto; however, the former does not allow specifying the destination ...
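One option that may fit here is cupy.ndarray.set, which copies a host array into device memory that has already been allocated (shape and dtype have to match). The sketch below wraps it in a hypothetical helper, copy_into, with a NumPy fallback so that it also runs on a machine without a GPU.

```python
import numpy as np


def copy_into(dst, src):
    """Copy host array `src` into the pre-allocated array `dst` in place."""
    if hasattr(dst, "set"):
        # cupy.ndarray.set copies a host array into existing device memory.
        dst.set(src)
    else:
        # CPU fallback so the sketch also runs without a GPU.
        np.copyto(dst, src)
    return dst


host = np.arange(6, dtype=np.float32).reshape(2, 3)
dest = np.empty_like(host)      # with CuPy this would be cp.empty(host.shape, host.dtype)
copy_into(dest, host)
```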
0
votes
1
answer
542
views
CuPy takes more time to preprocess the image?
import cv2
import numpy as np
import cupy as cp
import time
def op_image(image):
    start_time = time.time()
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (640, 480)...
1
vote
0
answers
95
views
Ensure same seed generates same random numbers when using numpy and cupy
I import numpy or cupy as follows:
import numpy as np
# import cupy as np
Then I generate X as follows:
np.random.seed(0)
X = np.random.rand(4, 3)
I get two very different matrices depending on ...
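The two libraries are not guaranteed to use the same generator, so an identical seed need not produce an identical stream. A sketch of one workaround, assuming it is acceptable to draw the numbers on the host and then move them to the active backend: the helper name seeded_rand is hypothetical, and xp stands for either numpy or cupy.

```python
import numpy as np


def seeded_rand(shape, seed, xp=np):
    """Draw on the host with NumPy's legacy generator, then move to the backend."""
    # RandomState(seed).rand(...) yields the same values as
    # np.random.seed(seed) followed by np.random.rand(...).
    host = np.random.RandomState(seed).rand(*shape)
    return xp.asarray(host)


X1 = seeded_rand((4, 3), seed=0)
X2 = seeded_rand((4, 3), seed=0)
```

The cost is a host-to-device copy for every draw, which may matter for large arrays.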
0
votes
0
answers
58
views
How can an application using CuPy be deployed? (VC++ dependency)
I've had no problems using the given documentation to install CuPy and develop with it on my own machine. But I'm seeing a roadblock in deploying applications using CuPy in a commercial setting to ...
1
vote
1
answer
78
views
Runtime Error occurs when using torchsummary
My code is as follows.
import numpy as np
import torch
import torch.nn as nn
import cupy as cp
from torchviz import make_dot
from torchinfo import summary
from torchsummary import summary as summary_
...
1
vote
1
answer
1k
views
How to get all available devices for CuPy?
How can I get all available devices for CuPy? I'm looking to write a version of this for CuPy:
if is_torch(xp):
    devices = ['cpu']
    import torch  # type: ignore[import]
    num_cuda = torch.cuda....
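A sketch of what a CuPy counterpart could look like, using cupy.cuda.runtime.getDeviceCount() to count visible devices. The function name available_devices and the 'cuda:<i>' labels are assumptions chosen for illustration; the truncated torch snippet does not show which naming it uses.

```python
def available_devices():
    """List 'cpu' plus one 'cuda:<i>' entry per device CuPy can see (sketch)."""
    devices = ["cpu"]
    try:
        import cupy as cp
        num_cuda = cp.cuda.runtime.getDeviceCount()
    except Exception:
        # CuPy is missing, or no usable CUDA driver/device was found.
        num_cuda = 0
    devices.extend(f"cuda:{i}" for i in range(num_cuda))
    return devices


devices = available_devices()
```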
2
votes
2
answers
120
views
Making masks based on euclidean distance with pyopencl, arrayfire or another python opencl library
I am creating 2D or 3D binary masks around given coordinates and then identifying them as labels with scipy.ndimage.label.
I have a cupy solution and a numpy solution. Cupy is fast, numpy is very slow, ...
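To make the masking step concrete, here is a broadcasting-based sketch in NumPy (not OpenCL, which is what the question ultimately asks about). The helper name ball_mask is hypothetical, and xp is meant to accept either numpy or cupy.

```python
import numpy as np


def ball_mask(shape, center, radius, xp=np):
    """Boolean mask of grid points within `radius` of `center` (2D or 3D)."""
    dist_sq = 0
    for axis, (n, c) in enumerate(zip(shape, center)):
        # Coordinates along this axis, shaped to broadcast against the others.
        idx_shape = [1] * len(shape)
        idx_shape[axis] = n
        coords = xp.arange(n).reshape(idx_shape)
        dist_sq = dist_sq + (coords - c) ** 2
    # Comparing squared distances avoids a square root per grid point.
    return dist_sq <= radius ** 2


mask = ball_mask((7, 7), center=(3, 3), radius=2)
```

Masks for several centers can be combined with a logical OR before they are passed to a labelling routine.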
1
vote
1
answer
295
views
Using CuPy on Maxwell GPU
Anyone here trying to use cupy on a Maxwell GPU? I am trying to do a simple array.mean() operation and getting the message below. Is there a way I can get around this? Do I need to install a different ...
-2
votes
2
answers
148
views
How do I parallelize a set of matrix multiplications
Consider the following operation, where I take 20 x 20 slices of a larger matrix and dot product them with another 20 x 20 matrix:
import numpy as np
a = np.random.rand(10, 20)
b = np.random.rand(20, ...
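The snippet is cut off, so the exact slicing is unknown. The sketch below assumes non-overlapping 20 x 20 column blocks of a hypothetical 20 x 1000 matrix and shows how they can all be multiplied by one 20 x 20 matrix in a single batched call; the array names and sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
big = rng.random((20, 1000))                 # hypothetical larger matrix
w = rng.random((20, 20))                     # the fixed 20 x 20 matrix

# View the larger matrix as 50 non-overlapping 20 x 20 slices: shape (50, 20, 20).
slices = big.reshape(20, 50, 20).transpose(1, 0, 2)

# matmul broadcasts over the leading axis, so all products run in one call.
batched = slices @ w                         # shape (50, 20, 20)

# Reference loop for comparison.
looped = np.stack([big[:, 20 * k:20 * (k + 1)] @ w for k in range(50)])
```

The same expression should carry over to CuPy arrays, where the batched product runs on the GPU instead of in a Python loop.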
0
votes
3
answers
161
views
More efficient way of looping over a multidimensional numpy array other than numpy.where
I have a nested array of shape: [200, 500, 1000].
Each index represents a coordinate of an image, e.g. array[1, 2, 3] would give me the value of the array at x=1, y=2, and z=3 in coordinate space. I ...