
I pass large scipy.sparse arrays to parallel processes through shared memory on a single compute node. The array is not modified during a round of parallel jobs, so I want to pass it with zero copy.

While this is possible with multiprocessing.RawArray() and numpy.sharedmem (see here), I am wondering how ray's put() works.
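For reference, a minimal sketch of the RawArray route mentioned above (assuming a Linux/fork start method; `init_worker` and `row_sum` are illustrative names, not from any library):

```python
import multiprocessing as mp
import numpy as np

_shared = {}

def init_worker(raw, shape):
    # Runs once per worker process: wrap the shared buffer in an
    # ndarray view. np.frombuffer does not copy the underlying data.
    _shared["arr"] = np.frombuffer(raw, dtype=np.float64).reshape(shape)

def row_sum(i):
    # Reads go straight to the shared memory segment.
    return float(_shared["arr"][i].sum())

if __name__ == "__main__":
    shape = (4, 5)
    raw = mp.RawArray("d", shape[0] * shape[1])   # one shared allocation
    master = np.frombuffer(raw, dtype=np.float64).reshape(shape)
    master[:] = 1.0                               # fill via the numpy view
    with mp.Pool(2, initializer=init_worker, initargs=(raw, shape)) as pool:
        print(pool.map(row_sum, range(shape[0])))  # [5.0, 5.0, 5.0, 5.0]
```

Note the buffer is handed to workers through `initargs` at pool creation (raw ctypes arrays cannot be sent through `pool.map` arguments after the workers have started).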

As far as I understood (see memory management, [1], [2]), ray's put() copies the object once and for all (serialize, then deserialize) into the object store, which is available to all processes.

Question:

I am not sure I understood it correctly: is it a deep copy of the entire array into the object store, or just a reference to it? Is there a way to not copy the object at all, and instead pass only the address/reference of the existing scipy array? Basically, a true shallow copy without the overhead of copying the entire array.


Ubuntu 16.04, Python 3.7.6, Ray 0.8.5.

  • This can also be helpful to understand how zero-copy read works: docs.ray.io/en/master/… Commented May 22, 2020 at 22:33
  • A scipy.sparse matrix is not an ndarray subclass. It's a custom Python class, or rather classes. Different formats have different classes and data storage attributes. One is actually a dictionary subclass; the others store the data (and indices) in several ndarrays. Commented May 23, 2020 at 3:33
  • @hpaulj that's no problem; I can pass the components of a scipy sparse array, such as the non-zero data and their indices, as ndarrays. Commented May 23, 2020 at 4:15
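Following up on the comments, one workable sketch (the helper names are mine) splits a CSR matrix into its three backing ndarrays, which are the pieces that can be placed in shared memory or the object store, and rebuilds the matrix on the other side without copying:

```python
import numpy as np
import scipy.sparse as sp

def csr_to_parts(m):
    # A CSR matrix is three ndarrays plus a shape; the ndarrays are
    # what can be shared zero-copy (via Ray's put() or a RawArray).
    return m.data, m.indices, m.indptr, m.shape

def parts_to_csr(data, indices, indptr, shape):
    # The CSR constructor wraps the given arrays rather than copying
    # them (copy=False is the default).
    return sp.csr_matrix((data, indices, indptr), shape=shape)

m = sp.random(100, 80, density=0.05, format="csr", random_state=0)
parts = csr_to_parts(m)
m2 = parts_to_csr(*parts)
assert np.allclose(m.toarray(), m2.toarray())   # same matrix
assert np.shares_memory(m2.data, parts[0])      # data array reused
```

One caveat: arrays retrieved from Ray's object store are read-only, so the reconstructed matrix must only be read, not modified in place.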
