
I am using Python and the SciPy library to create a sparse matrix, specifically a csr_matrix (Compressed Sparse Row matrix). The matrix is rather big, about 70000 x 70000 elements. I build the matrix as a 2D array and then construct the csr_matrix, passing the 2D array as an argument. Constructing a very sparse matrix of this size works without any issues.

The problem arises when I pass a denser 2D array (far fewer zero elements); the process is interrupted with an error:

ValueError: unrecognized csr_matrix constructor usage

I tried building a dense matrix in the interactive Python environment with the same size and got exactly the same error.

from scipy import sparse
a = [[10 for i in range(70000)] for j in range(70000)]
mat = sparse.csr_matrix(a)

So my questions are:

- Does constructing the csr_matrix depend on how sparse the 2D array is? What is the limit?

- How can I be sure the program won't be interrupted in the middle of processing by such errors?

- Are there any alternative solutions?

Thanks in advance

2 Answers


With smaller numbers your method works:

In [20]: a=[[10 for i in range(1000)] for j in range(1000)]
In [21]: M=sparse.csr_matrix(a)
In [22]: M
Out[22]: 
<1000x1000 sparse matrix of type '<class 'numpy.int32'>'
    with 1000000 stored elements in Compressed Sparse Row format>

Density is not the issue. Size probably is. I can't reproduce your error because when I try larger sizes my machine slows to a crawl and I have to interrupt the process.

As the documentation describes, csr_matrix takes several kinds of input and tells them apart partly by the number of items in the argument. I'd have to look at the code to remember the exact logic, but one form expects a tuple of 3 arrays or lists, another a tuple of 2 items with the second being another tuple, and a third takes a numpy array. Your case, a list of lists, doesn't fit any of those, but the constructor probably tries to turn it into an array:

a = np.array([[10 for i in range(M)] for j in range(N)])
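For reference, here is a small sketch of the three input forms mentioned above, following the pattern of the examples in the csr_matrix documentation. The 3x3 values and the variable names are made up for illustration.

```python
import numpy as np
from scipy import sparse

# Illustrative 3x3 matrix (values chosen only for this example).
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 6]])

# 1. Dense input: a numpy array (a list of lists is converted to one first).
m_dense = sparse.csr_matrix(dense)

# 2. Tuple of 2 items, the second being another tuple: (data, (row, col)).
data = np.array([1, 2, 3, 4, 5, 6])
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
m_coo = sparse.csr_matrix((data, (row, col)), shape=(3, 3))

# 3. Tuple of 3 arrays: the raw CSR arrays (data, indices, indptr).
indices = np.array([0, 2, 2, 0, 1, 2])  # column index of each stored value
indptr = np.array([0, 2, 3, 6])         # where each row starts in data/indices
m_csr = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
```

All three calls describe the same matrix; only the first one needs the full dense array in memory.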

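Most likely your error message is the result of some sort of memory error: you are trying to make too large a matrix. A dense 70000 x 70000 matrix is big (at least on some machines), and a sparse one representing the same matrix will be even larger, because it has to store index information alongside every value. In coordinate terms each element is stored 3 times, once for the value and twice for its coordinates (CSR compresses the row coordinates into a row-pointer array, but still keeps a column index next to every value).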

A truly sparse matrix of that size works because the sparse representation is much smaller, roughly proportional to 3x the number of nonzero elements.
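To put rough numbers on this, here is a back-of-the-envelope sketch. The helper names are hypothetical, and it assumes int32 values; the index width is passed in explicitly because the index arrays may need 64-bit integers once the number of stored elements exceeds the int32 range. Treat the results as estimates of array storage only, not of peak memory during construction.

```python
import numpy as np

def dense_bytes(n_rows, n_cols, dtype=np.int32):
    """Bytes needed for a dense array of the given shape (hypothetical helper)."""
    return n_rows * n_cols * np.dtype(dtype).itemsize

def csr_bytes(n_rows, nnz, dtype=np.int32, index_dtype=np.int32):
    """Bytes for the three CSR arrays: data, indices, indptr (hypothetical helper)."""
    value = np.dtype(dtype).itemsize
    index = np.dtype(index_dtype).itemsize
    # One value and one column index per stored element, plus n_rows + 1 row pointers.
    return nnz * (value + index) + (n_rows + 1) * index

n = 70000
full = n * n  # every element stored, as in the question's all-10 matrix

dense_gb = dense_bytes(n, n) / 1e9
# Assumption: 64-bit indices, since 4.9e9 stored elements do not fit in int32.
csr_full_gb = csr_bytes(n, full, index_dtype=np.int64) / 1e9
# A matrix with 0.1% of its elements nonzero, for comparison.
csr_sparse_gb = csr_bytes(n, full // 1000) / 1e9

print(f"dense int32:          {dense_gb:.1f} GB")
print(f"CSR, all stored:      {csr_full_gb:.1f} GB")
print(f"CSR, 0.1% nonzero:    {csr_sparse_gb:.3f} GB")
```

The dense int32 array alone comes to 19.6 GB, before the list of lists it is built from is counted.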


In scipy/sparse/compressed.py

class _cs_matrix(...):
    """base matrix class for compressed row and column oriented matrices"""
    def __init__(self, arg1, ...):
        <is arg1 a sparse matrix>
        <is arg1 a tuple>
        else:
            # must be dense
            try:
                arg1 = np.asarray(arg1)
            except:
                raise ValueError("unrecognized %s_matrix constructor usage" % self.format)

My guess is that it tries:

np.asarray([[10 for i in range(70000)] for j in range(70000)])

and that results in some sort of error, most likely 'too large' or 'memory'. That error is caught and re-raised with this 'unrecognized ...' message.
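The masking effect can be illustrated with a minimal stand-in. This is not SciPy's code: `build_masked` and `failing_convert` are hypothetical names, the MemoryError is simulated, and the `from exc` chaining is added here only so the original cause can be inspected.

```python
def build_masked(arg, convert, fmt="csr"):
    """Hypothetical stand-in for the try/except pattern quoted above."""
    try:
        return convert(arg)
    except Exception as exc:
        # Whatever went wrong, the caller only sees this generic message.
        raise ValueError("unrecognized %s_matrix constructor usage" % fmt) from exc

def failing_convert(arg):
    """Simulates a dense conversion that runs out of memory."""
    raise MemoryError("simulated: not enough memory for the dense array")

try:
    build_masked([[10, 10], [10, 10]], failing_convert)
except ValueError as err:
    message = str(err)     # the generic text the user sees
    cause = err.__cause__  # the underlying error that was hidden
    print(message)
    print(type(cause).__name__, "-", cause)
```

The message the caller sees says nothing about memory, which matches the confusing error in the question.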

Try

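import numpy as np
from scipy import sparse

A = np.array(a)  # do the dense conversion yourself, so any error is not masked
M = sparse.csr_matrix(A)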

I suspect it will give you a more informative error message.
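If the memory for the dense intermediate is the real obstacle, one possible alternative is to skip it and feed only the nonzero entries to the (data, (row, col)) form. The sketch below assumes you can enumerate the nonzero cells; `iter_nonzero_entries` is a hypothetical placeholder for however your data is produced, and the size is scaled down for illustration.

```python
import numpy as np
from scipy import sparse

def iter_nonzero_entries(n):
    """Hypothetical data source: yields (row, col, value) for nonzero cells only."""
    for i in range(n):
        yield i, i, 10          # main diagonal
        if i + 1 < n:
            yield i, i + 1, 5   # first superdiagonal

n = 1000  # scaled down; the question uses 70000

rows, cols, vals = [], [], []
for r, c, v in iter_nonzero_entries(n):
    rows.append(r)
    cols.append(c)
    vals.append(v)

# Build directly from coordinate triplets; no n x n dense array is ever created.
mat = sparse.csr_matrix(
    (np.array(vals), (np.array(rows), np.array(cols))),
    shape=(n, n),
)
print(mat.shape, mat.nnz)
```

Memory use then scales with the number of nonzero entries rather than with n squared, so this only helps if the matrix really is sparse.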


1 Comment

What you said is convincing. Building A as a NumPy array would consume much more memory, so I couldn't test your suggested code afterwards (to get the exact message), but the previous error is clearer to me now. I will try to solve the memory problem. Thanks!

Check out the last two examples on creating sparse matrices:
http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csr_matrix.html

You can probably find the answers to your other questions in the documentation as well.

4 Comments

The last two examples use another type of instantiation, csr_matrix((data, indices, indptr), [shape=(M, N)]). Well, I don't have those three arrays (data, indices and indptr); in fact, they are exactly what I want to obtain. Besides, I don't need to see a representation of a dense matrix, I already have it. My question is whether the number of nonzero elements causes a problem in constructing a matrix or not.
@SaraJavadzadeh, we are talking about sparse matrices, so if you have a matrix with very few zeros in it, then it is not sparse, despite being represented in a sparse matrix data structure...
That's right. But two questions still remain. Firstly, the error message is strange: it doesn't include any information about density. Secondly, I want to know how much sparsity is enough for csr_matrix.
@SaraJavadzadeh, have you tried any of the other sparse matrix types in scipy? I think the error you get is because your matrix is not in fact sparse.
