I have a NumPy array of numpy.float32 values that takes about 26 GiB when saved as an uncompressed '*.npz' file. numpy.savez() fails with:

OSError: Failed to write to /tmp/tmpl9v3xsmf-numpy.npy: 6998400000 requested and 3456146404 written

I thought saving it compressed might save the day, but numpy.savez_compressed() fails as well:

OSError: Failed to write to /tmp/tmp591cum2r-numpy.npy: 6998400000 requested and 3456157668 written

because numpy.savez_compressed() first writes the array uncompressed to a temporary file.

I do not consider the obvious "use additional storage" an answer. ;)

[EDIT]

The tag low-memory refers to disk space, not RAM.
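As a quick sanity check on the size quoted above, here is a minimal sketch of the arithmetic. It assumes the 6998400000 "requested" figure in the error message is the number of array elements; the variable names are only illustrative.

```python
import numpy as np

# Assumption: the "requested" figure in the OSError is the element count.
n_items = 6_998_400_000
bytes_per_item = np.dtype(np.float32).itemsize  # 4 bytes per float32

total_bytes = n_items * bytes_per_item
size_gib = total_bytes / 2**30

print(f"{total_bytes} bytes is about {size_gib:.2f} GiB")
```

The result agrees with the "about 26 GiB" estimate, not counting the small .npy header and zip overhead.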

  • What kind of data are you storing in that array? Commented Feb 28, 2018 at 12:13
  • @Kasramvd floats, specifically numpy.float32 Commented Feb 28, 2018 at 12:18
  • Can't you use a lighter format like float16, int8, uint8, etc.? Commented Feb 28, 2018 at 12:24
  • If you have such a big array and need that precision, that is how much space it takes to store it. The only way you could really reduce it (besides generic compression) is if there are known patterns in the data, e.g. is it a sparse array, or are there repeated or derived values? If all the values have about the same exponent, maybe storing only the mantissa in int16/uint16 could be enough? Also, do you know what your file system is? It may limit the size of the files that you can store. Commented Feb 28, 2018 at 12:47
  • @CharlesDuffy well, I suppose then you have to save it to a BytesIO object first, then compress that. Which, due to memory demand, is probably no solution either... Commented Feb 28, 2018 at 13:36
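Whether a lighter dtype such as float16 is acceptable depends on how much precision the data can lose. A minimal sketch of one way to check this on a sample of the data follows; the helper name max_roundtrip_error and the linspace sample are hypothetical stand-ins, not part of the question.

```python
import numpy as np


def max_roundtrip_error(values, dtype):
    """Largest absolute error after casting to `dtype` and back again."""
    restored = values.astype(dtype).astype(values.dtype)
    return float(np.max(np.abs(restored - values)))


# Hypothetical sample; with real data, a slice of the large array would be used.
sample = np.linspace(0.0, 1.0, 1001, dtype=np.float32)

error_f16 = max_roundtrip_error(sample, np.float16)  # lossy for most values
error_f32 = max_roundtrip_error(sample, np.float32)  # same dtype, no loss

print(error_f16, error_f32)
```

If the float16 error is tolerable for the application, the array would take half the space before any compression is applied.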

2 Answers

With the addition of ZipFile.open(..., mode='w') in Python 3.6, you can do better:

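import numpy as np
import zipfile

def saveCompressed(fh, **namedict):
    with zipfile.ZipFile(fh, mode="w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for k, v in namedict.items():
            # zf.open(..., 'w') streams the member into the archive;
            # force_zip64=True is needed for members larger than 2 GiB.
            with zf.open(k + '.npy', 'w', force_zip64=True) as buf:
                # np.lib.format is the public location of write_array
                # (np.lib.npyio.format is not available in NumPy 2.x).
                np.lib.format.write_array(buf,
                                          np.asanyarray(v),
                                          allow_pickle=False)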
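A small round trip can confirm that an archive written this way is readable by np.load(). The sketch below is self-contained and uses a tiny array as a stand-in for the real data; the name save_compressed_streaming and the temporary file path are hypothetical.

```python
import os
import tempfile
import zipfile

import numpy as np


def save_compressed_streaming(path, **arrays):
    """Write each array as a deflated .npy member without a temporary copy."""
    with zipfile.ZipFile(path, mode="w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for name, value in arrays.items():
            # The writable member handle receives the array in chunks.
            with zf.open(name + ".npy", mode="w", force_zip64=True) as member:
                np.lib.format.write_array(member, np.asanyarray(value),
                                          allow_pickle=False)


small = np.arange(1000, dtype=np.float32)  # stand-in for the large array

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "small.npz")
    save_compressed_streaming(target, A=small)
    with np.load(target) as archive:
        restored = archive["A"]  # read fully before the file is removed

round_trip_ok = restored.dtype == np.float32 and np.array_equal(restored, small)
print(round_trip_ok)
```

Because the zip member is not a regular file object, write_array() falls back to writing the array in buffered chunks, which is what keeps memory usage bounded.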

4 Comments

This looks almost exactly like the implementation I am testing right now, except for with zf.open(k + '.npy', mode='w', force_zip64=True) as buf:
Using zf.open() is the key difference, since it allows the file created inside the zip to be written incrementally (thus, with a sane ZipFile implementation, with bounded memory usage).
I mean the force_zip64=True part.
Your code ends with ValueError: Can't close the ZIP file while there is an open writing handle on it. Close the writing handle before closing the zip. I guess it is caused by the size of the array. Would you mind including the force_zip64=True part in your answer?

Note: I would be more than happy to accept a more RAM-efficient solution.

I have browsed the numpy.savez_compressed() code and decided to reimplement part of its functionality:

import numpy as np
import zipfile
import io

def saveCompressed(fh, **namedict):
    with zipfile.ZipFile(fh,
                         mode="w",
                         compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for k, v in namedict.items():
            # The whole serialized array is held in RAM before compression.
            buf = io.BytesIO()
            # np.lib.format is the public location of write_array
            # (np.lib.npyio.format is not available in NumPy 2.x).
            np.lib.format.write_array(buf,
                                      np.asanyarray(v),
                                      allow_pickle=False)
            zf.writestr(k + '.npy',
                        buf.getvalue())

It causes my system to swap, but at least I am able to store my data (dummy data used in the example):

>>> A = np.ones(12 * 6 * 6 * 1 * 6 * 6 * 10000 * 5 * 9, dtype=np.float32)
>>> with open('test.npz', 'wb') as fh:
...     saveCompressed(fh, A=A)
...
>>> A = np.load('test.npz')['A']
>>> A.shape
(6998400000,)
>>> (A == 1).all()
True
