I have a sparse 2D matrix saved on disk (.npz extension) that I created in a preprocessing step with scipy.sparse.csr_matrix. It is a long piano-roll sequence (a numerical form of MIDI representation) stored as a 1-channel image. I cannot convert the whole matrix to a dense representation because it will not fit in memory.

How do I create mini-batches with predefined sizes from the sparse matrix?

I've tried converting CSR representation to COO and creating batches of data from it.

import numpy as np
import scipy.sparse

sparse_matrix = scipy.sparse.load_npz(file_name)
coo_matrix = sparse_matrix.tocoo()
for batch_index in range(num_batches):
    start_index = batch_index * num_samples
    end_index = (batch_index + 1) * num_samples

    batch_data = coo_matrix.data[start_index:end_index]
    batch_row = coo_matrix.row[start_index:end_index]
    batch_col = coo_matrix.col[start_index:end_index]

    batch_sparse_matrix = scipy.sparse.coo_matrix(
        (batch_data, (batch_row, batch_col)),
        shape=(batch_size, image_width*image_height)
    )

but I got errors like: row index exceeds matrix dimensions, which means I have too much data for the shape I defined: the row and column indices fall outside the shape boundaries.

I've tried something like this, to get the right amount of data, but it's very slow.


non_zero_indices = np.where((coo_matrix.row >= start_index) & (coo_matrix.row < end_index))[0]

start_index = non_zero_indices[0]
end_index = non_zero_indices[-1] + 1
  • Do you understand the indptr, indices and data attributes of a csr? indptr can be used to index rows. But you can also use indexing: M[10:100] returns a 90-row slice (copy) of M. Row indexing is relatively efficient. docs.scipy.org/doc/scipy/reference/generated/… Commented May 16, 2023 at 2:33
  • Yep, that worked. CSR was more suitable for this problem. This is my solution: rows_to_extract = np.arange(start_index, end_index); batch_sparse_matrix = self.sparse_matrix[rows_to_extract, :] Commented May 16, 2023 at 18:59
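
For anyone landing here, the row-slicing approach from the comments could be sketched roughly as below. This is only an illustration under assumptions: the generator name iter_row_batches and the small demo matrix are hypothetical stand-ins (the real matrix would come from scipy.sparse.load_npz), and it assumes each row of the matrix is one sample, so that only one batch is densified at a time.

```python
import numpy as np
import scipy.sparse as sp


def iter_row_batches(csr, batch_size):
    """Yield dense arrays of at most `batch_size` consecutive rows (hypothetical helper)."""
    num_rows = csr.shape[0]
    for start in range(0, num_rows, batch_size):
        stop = min(start + batch_size, num_rows)
        # Slicing rows of a CSR matrix copies only the rows in [start, stop).
        batch = csr[start:stop]
        # Only this small batch is converted to dense, never the whole matrix.
        yield batch.toarray()


# Small deterministic matrix standing in for the piano-roll loaded with load_npz.
dense = np.arange(60).reshape(10, 6) % 3
demo = sp.csr_matrix(dense)

batches = list(iter_row_batches(demo, batch_size=4))
print([b.shape for b in batches])
```

The last batch is simply shorter when the number of rows is not a multiple of batch_size. A contiguous slice such as csr[start:stop] selects the same rows as indexing with np.arange(start, stop).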
