
I've recently been learning Python multiprocessing and have run into a roadblock. I have a large sparse SciPy array (CSC format) that I need to share, read-only, between 5 worker processes. I've read this and this (numpy-shared), but these seem to apply only to dense types.

How would I share a scipy.sparse.csc_matrix() between 5 multiprocessing Process objects without copying it (or with minimal copying)? Even the numpy-shared method seems to require copying the entire array, and even then, I can't just convert a scipy.sparse matrix into an mp.Array(). Could anyone point me in the right direction?

Thanks!

1 Answer


I cannot help you with the multiprocessing part of your question, but a CSC sparse matrix is little more than three NumPy arrays. You can instantiate another sparse matrix, b, that shares the same underlying arrays as an existing sparse matrix, a, by doing:

import scipy.sparse as sps

# a is an existing csc_matrix; copy=False reuses its three arrays instead of copying them
b = sps.csc_matrix((a.data, a.indices, a.indptr), shape=a.shape, copy=False)
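For a concrete picture of those three arrays, here is a tiny matrix small enough to check by hand (an illustrative example, not the matrix from the question):

```python
import numpy as np
import scipy.sparse as sps

dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 6]])
a = sps.csc_matrix(dense)

print(a.data)     # [1 4 5 2 3 6]  nonzero values, column by column
print(a.indices)  # [0 2 2 0 1 2]  row index of each stored value
print(a.indptr)   # [0 2 3 6]      column j is data[indptr[j]:indptr[j + 1]]
```

Everything else in the matrix object (shape, format flags) is small metadata, so sharing these three arrays is enough to rebuild it.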

The three NumPy arrays you need to share between your processes are a.data, a.indices and a.indptr. If you can do that, instantiating a sparse matrix in each process will be an inexpensive operation.
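One way to share them, sketched here under the assumption of Python 3.8+ (which added multiprocessing.shared_memory), is to copy each array once into a shared-memory block in the parent and let every worker wrap those blocks with copy=False. The helper names to_shared, attach and worker are hypothetical, and the column-sum task is only a placeholder for real work. Nothing enforces read-only access; the workers simply must not write to the arrays.

```python
# Sketch: share a CSC matrix's three arrays through multiprocessing.shared_memory.
# Assumes Python 3.8+; to_shared, attach and worker are hypothetical helper names.
import numpy as np
import scipy.sparse as sps
from multiprocessing import Process, Queue, shared_memory


def to_shared(arr):
    """Copy arr once into a new shared-memory block.

    Returns the block (the parent must keep it open) and picklable
    metadata that other processes can use to attach to it.
    """
    # SharedMemory rejects size=0, so allocate at least one byte.
    shm = shared_memory.SharedMemory(create=True, size=max(arr.nbytes, 1))
    view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    view[:] = arr  # the only full copy of the data
    return shm, (shm.name, arr.shape, arr.dtype)


def attach(meta):
    """Open an existing block by name and view it as an ndarray (no copy)."""
    name, shape, dtype = meta
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)


def worker(metas, shape, col, out):
    """Rebuild the matrix from the shared blocks and report one column sum."""
    handles, arrays = zip(*(attach(meta) for meta in metas))
    # Same idea as the answer: wrap the existing (data, indices, indptr), copy=False.
    m = sps.csc_matrix(arrays, shape=shape, copy=False)
    out.put((col, float(m[:, col].sum())))
    # Drop every view of the blocks first; close() raises BufferError otherwise.
    del m, arrays
    for shm in handles:
        shm.close()


def main():
    # Small stand-in for the large matrix: column c sums to 4*c + 30.
    a = sps.csc_matrix(np.arange(20.0).reshape(4, 5))
    blocks, metas = zip(*(to_shared(x) for x in (a.data, a.indices, a.indptr)))
    out = Queue()
    procs = [Process(target=worker, args=(metas, a.shape, c, out)) for c in range(5)]
    for p in procs:
        p.start()
    # Drain the queue before join(); the timeout keeps a crashed worker from hanging us.
    results = dict(out.get(timeout=60) for _ in procs)
    for p in procs:
        p.join()
    for shm in blocks:
        shm.close()
        shm.unlink()  # free the block once no worker needs it
    print([results[c] for c in range(5)])  # [30.0, 34.0, 38.0, 42.0, 46.0]


if __name__ == "__main__":
    main()
```

Only the parent's to_shared call copies the data; each worker maps the same blocks and builds its csc_matrix around them, so the per-process cost is a few small objects rather than a copy of the matrix.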


2 Comments

Ah, cool. It won't make copies during the csc_matrix() construction?
Unless you specify copy=True, it shouldn't.
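A quick way to check this is np.shares_memory. The sketch below uses a small stand-in matrix; a copy may still happen if SciPy has to convert the dtype of the data or index arrays, so it is worth checking on your real matrix too. It also shows why the workers should treat the arrays as read-only: a write through b is visible in a.

```python
import numpy as np
import scipy.sparse as sps

# Small stand-in for the real matrix.
a = sps.csc_matrix(np.arange(20.0).reshape(4, 5))

# copy=False: b wraps a's arrays instead of duplicating them.
b = sps.csc_matrix((a.data, a.indices, a.indptr), shape=a.shape, copy=False)
# copy=True: c gets its own private copies.
c = sps.csc_matrix((a.data, a.indices, a.indptr), shape=a.shape, copy=True)

print(np.shares_memory(a.data, b.data))  # True
print(np.shares_memory(a.data, c.data))  # False

# Because the memory is shared, writing through b changes a as well.
b.data[0] = -1.0
print(a.data[0])  # -1.0
```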
