I saved a scipy csr matrix using np.save('X', X). When I load it with np.load('X.npy'), I get this signature:

array(<240760x110493 sparse matrix of type '<class 'numpy.float64'>' with 20618831 stored elements in Compressed Sparse Row format>, dtype=object)

However, I cannot index into this data: X[0,0], X[:10,:10], and X[0] all raise IndexError: too many indices for array, and .shape returns ().

Is there a way to access this data, or is it corrupt now?

Edit.

Since there are three ways to save and load a matrix, I ran a speed comparison to see which works best for my sparse matrix:

Writing a sparse matrix:

%timeit -n1 scipy.io.savemat('tt', {'t': X})
1 loops, best of 3: 66.3 ms per loop

%timeit -n1 scipy.io.mmwrite('tt_mm', X)
1 loops, best of 3: 7.55 s per loop

%timeit -n1 np.save('tt_np', X)
1 loops, best of 3: 188 ms per loop

Reading a sparse matrix:

%timeit -n1 scipy.io.loadmat('tt')
1 loops, best of 3: 9.78 ms per loop

%timeit -n1 scipy.io.mmread('tt_mm')
1 loops, best of 3: 5.72 s per loop

%timeit -n1 np.load('tt_np.npy')
1 loops, best of 3: 150 ms per loop

The results: mmread/mmwrite are incredibly slow (tens to hundreds of times slower than the other two options), and savemat/loadmat is roughly 3 to 15 times faster than np.save/np.load.
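
For anyone who wants to repeat a comparison like this outside IPython, here is a rough sketch. The matrix size, the file names, and the `timed` helper are illustrative assumptions, not the setup behind the numbers above, and a single timed call per function is only a coarse measurement. Recent NumPy releases also require allow_pickle=True when loading an object array.

```python
import os
import tempfile
import time

import numpy as np
import scipy.io
import scipy.sparse as sp


def timed(fn):
    """Return (elapsed seconds, result) for a single call of fn."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


# Small random stand-in for the real matrix (hypothetical size and density).
X = sp.random(2000, 1000, density=0.01, format="csr", random_state=0)
timings = {}

with tempfile.TemporaryDirectory() as tmp:
    mat = os.path.join(tmp, "tt.mat")
    mm = os.path.join(tmp, "tt_mm.mtx")
    npy = os.path.join(tmp, "tt_np.npy")

    # Writing
    timings["savemat"], _ = timed(lambda: scipy.io.savemat(mat, {"t": X}))
    timings["mmwrite"], _ = timed(lambda: scipy.io.mmwrite(mm, X))
    timings["np.save"], _ = timed(lambda: np.save(npy, X))

    # Reading
    timings["loadmat"], from_mat = timed(lambda: scipy.io.loadmat(mat)["t"])
    timings["mmread"], from_mm = timed(lambda: scipy.io.mmread(mm))
    timings["np.load"], from_np = timed(
        lambda: np.load(npy, allow_pickle=True).item())

for name, seconds in timings.items():
    print(f"{name:8s} {seconds * 1e3:8.2f} ms")
```

Note that the three loaders hand back different sparse formats, so a conversion with .tocsr() may be needed afterwards.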

1 Answer

Let's pay attention to all the clues in the printed output:

array(<240760x110493 sparse matrix of type '<class 'numpy.float64'>'
     with 20618831 stored elements in Compressed Sparse Row format>, dtype=object)

Outermost:

array(....,dtype=object)

A sparse matrix is not a regular array; to np.save, it is just a Python object. So np.save wrapped it in a dtype=object array and saved that. The result is a 0d array (hence the () shape), so all the indexing attempts fail. Try instead

M=arr.item() # or
M=arr[()]
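
To see the whole round trip in one place, here is a minimal sketch on a small random matrix standing in for the large one in the question. The matrix size and file location are made up for illustration. One thing that has changed since the question was asked: recent NumPy releases refuse to unpickle object arrays by default, so np.load needs allow_pickle=True.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# Small stand-in for the large matrix in the question (hypothetical size).
X = sp.random(50, 40, density=0.1, format="csr", random_state=0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X.npy")
    np.save(path, X)  # pickles the sparse matrix inside a 0-d object array

    # allow_pickle=True is required by recent NumPy for object arrays.
    arr = np.load(path, allow_pickle=True)

print(arr.shape, arr.dtype)  # shape and dtype of the 0-d wrapper array
M = arr.item()               # unwrap the single stored object
print(type(M).__name__, M.shape)
```

Since this relies on pickle, only load files from sources you trust.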

Now M should display as:

sparse matrix of type '<class 'numpy.float64'>'
     with 20618831 stored elements in Compressed Sparse Row format

with attributes like M.shape. M.A would give the dense form, though the matrix is too large for that to be useful.

5 Comments

Thanks! That worked. I never thought of a Python object as a 0-d array.
Yes, 0d arrays are a bit of a stretch for all of us. Good thing they didn't try to implement negative dimensions. :)
scipy.io.savemat/loadmat know how to save and load sparse matrices - in a MATLAB compatible format.
I tried scipy.io.mmread/mmwrite and it was really slow compared to np.save/np.load. Is scipy.io.savemat/loadmat faster than scipy.io.mmread/mmwrite?
I've never used the mmread/mmwrite functions, and I haven't timed savemat. I can imagine saving a csr matrix with np.savez by storing three arrays (the data, indices, and indptr attributes) plus the shape (the dtype is carried by data). That may be the fastest and most compact method, but I don't know if anyone has implemented it.
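
The savez idea from the last comment could look roughly like the sketch below. The helper names save_csr and load_csr and the key names inside the .npz file are invented for this illustration; they are not an existing API. SciPy later added scipy.sparse.save_npz and scipy.sparse.load_npz, which cover the same need and are probably the better choice where available.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp


def save_csr(path, M):
    """Store the three CSR arrays plus the shape in one .npz file."""
    np.savez(path, data=M.data, indices=M.indices, indptr=M.indptr,
             shape=np.array(M.shape))


def load_csr(path):
    """Rebuild a csr_matrix from a file written by save_csr."""
    with np.load(path) as f:
        return sp.csr_matrix((f["data"], f["indices"], f["indptr"]),
                             shape=tuple(f["shape"]))


# Small stand-in matrix (hypothetical size and density).
X = sp.random(50, 40, density=0.1, format="csr", random_state=0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X_parts.npz")
    save_csr(path, X)
    Y = load_csr(path)

print(Y.shape, Y.nnz)
```

Because only plain numeric arrays are stored, no pickling is involved and the loaded object is a real csr_matrix rather than a 0-d wrapper.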
