
I am working with code that involves creating a large, relatively sparse matrix and then solving a least-squares minimization problem with it. However, I have been getting memory errors when I run my code, even though the matrix does not seem large enough to strain my system (roughly 27000 by 2100 in my test cases).

I have created a simplified script that has the same storage requirements as my test case and also produces a memory error (note that the "sparse" matrix is not actually very sparse, since I am testing with a smaller-scale problem than the intended dataset):

import numpy as np
from scipy import sparse

# one column per 30x30x30 window, flattened to 27000 values
BM = sparse.lil_matrix((27000, 3000))
for i in range(3000):
    local_mat = np.random.rand(30, 30, 30)
    local_mat[local_mat < 0.1] = 0        # zero out roughly 10% of the entries
    vals = local_mat.ravel()
    nonzero = vals.nonzero()[0]
    BM[nonzero, i] = vals[nonzero]        # store the nonzeros in column i

If I change the parameters so that the sparse matrix has more zero entries, I still get a memory error from scipy.optimize.lsq_linear after filling the matrix and running the minimization with it.
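For reference, the solve step looks roughly like this (b here is just a placeholder for the real measurement vector):

from scipy.optimize import lsq_linear

b = np.random.rand(27000)           # placeholder right-hand side
result = lsq_linear(BM.tocsr(), b)  # convert to CSR up front rather than letting the solver do it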

It goes without saying that I get a memory error if I use a dense matrix as well.

I have tried increasing the size of my paging file to 2-4 GB, but that hasn't helped, though it seems this should not be that memory-intensive regardless.

  • When I loop 30 times, BM.nnz is 729012, and the last iteration set 24182 of a possible 27000 elements. This is not sparse at all! Commented Jun 2, 2020 at 4:15
  • Yes, that's more or less a quirk of the specific test case I'm using, since the code is oriented around particle measurements in 3D space, but I've been using 30x30x30 densely packed windows before moving on to large 1000x1000x100 data sets. Commented Jun 2, 2020 at 4:46
  • I don't want to intentionally create a memory error or otherwise hang my computer. You'll have to sort out these issues yourself. Keep a watch on the matrix sparsity, and watch out for any inadvertent conversion to dense. If the memory error produces a traceback, examine that, and maybe even post it. Commented Jun 2, 2020 at 4:51
  • Alright. In general, do you know of any alternative solutions? Even calling an operation like nonzero() on the dense version of the matrix causes a memory error. Commented Jun 2, 2020 at 5:00
  • A useful sparse matrix will be 90% zeros (or higher). Commented Jun 2, 2020 at 10:50

1 Answer


You've created a memory bonfire. Let's recall how a lil_matrix works:

Each row of the matrix is kept as a pair of Python lists: one list for the data values and one for the column indices. Every non-zero element has one entry in each list.

Taking your example 27000 x 3000 matrix, let's add this up. Each entry in a Python list is a boxed object that burns 16 bytes of overhead; for the data list, the float value itself is another 8 bytes. So the non-zero values alone eat up about 2 GB. Now look at the indices: another set of lists, each entry of which burns 16 bytes of overhead. Poof - there's another 1.5 GB gone. A bit more memory goes to the lists themselves and the array that holds them, but that's not much compared to what's already gone.
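You can sanity-check that per-entry cost on your own interpreter; the number below is what a typical 64-bit CPython reports:

import sys

print(sys.getsizeof(1.0))  # 24 bytes per boxed float: 16 bytes of object overhead + 8 bytes of data
# on top of that, each element in a Python list costs an 8-byte pointer to the boxed object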

Compared to a 27000 x 3000 dense array, which eats up about 650 MB of memory, you've managed to use about five times as much, while also taking on several major disadvantages compared to a NumPy array. A standard sparse format like CSR would be a much better choice: it would use only about 900 MB for your 90%-dense example.
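You can check that CSR figure empirically on a scaled-down version of the problem (scipy.sparse.random is used here just to generate a roughly 90%-dense test matrix):

from scipy import sparse

A = sparse.random(2700, 300, density=0.9, format='csr')
nbytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
print(nbytes / A.nnz)  # roughly 12 bytes per stored value: 8 for the float64 data, 4 for the int32 index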

EDIT: Also, you're calling a function that requires a CSR matrix, so it casts the horrible lil_matrix you've created into a csr_matrix, forcing you to keep two full copies of the data in memory at once.
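One way to avoid the lil_matrix entirely, sketched here with the same test parameters as the question, is to accumulate COO-style triplets and build the CSR matrix in a single call:

import numpy as np
from scipy import sparse

rows, cols, vals = [], [], []
for i in range(3000):
    local_mat = np.random.rand(30, 30, 30)
    local_mat[local_mat < 0.1] = 0
    flat = local_mat.ravel()
    nz = flat.nonzero()[0]
    rows.append(nz)                   # row index of each nonzero
    cols.append(np.full(nz.size, i))  # they all belong to column i
    vals.append(flat[nz])

BM = sparse.csr_matrix(
    (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
    shape=(27000, 3000),
)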


7 Comments

I assume he's using lil because he's building the matrix iteratively, block by block; csr objects to setting nonzero values. The time-honored method is to construct the coo-style inputs iteratively and create the matrix from them with one call. The bmat (and hstack and vstack) functions do that.
This was enlightening, thank you for the answers. Regarding the density of the matrix: that's more or less a quirk of the specific test case I'm using, since the code is oriented around particle measurements in 3D space, but I've been using 30x30x30 densely packed windows (with Np = 2100 particles) before moving on to large 1000x1000x100 data sets. Consequently, though, I have to construct this matrix as sparse from the ground up, so I was wondering if there was a way to do so efficiently with csr matrices. Otherwise I may have to implement a "sliding window" type approach.
The most efficient easy way to build a CSR matrix column-wise is probably to hstack a list of CSC or COO matrices which each represent a single column (see the sketch after these comments). The absolute most memory-efficient way is two-pass: run through the problem once to figure out how much memory you need to allocate, allocate that memory, and then run through the problem again to populate it with values. You won't need any intermediate data structures that way.
Thank you. What is the best way to preallocate? I tried to simply hstack a list of Nx1 csc matrices, but I run into problems when hstacking.
Also, if I increase the size of the array to 100000 by ... instead of 27000 by ..., I run into memory errors in the larger case but not the smaller one, even though I keep the number of nonzero entries the same.
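A minimal sketch of the column-wise construction described in the comments above, assuming each 30x30x30 window can be produced one at a time:

import numpy as np
from scipy import sparse

columns = []
for i in range(3000):
    local_mat = np.random.rand(30, 30, 30)
    local_mat[local_mat < 0.1] = 0
    # each window becomes one 27000 x 1 sparse column; only the nonzeros are stored
    columns.append(sparse.csc_matrix(local_mat.reshape(-1, 1)))

BM = sparse.hstack(columns, format='csr')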
