I don't know the correct terminology (it may be called a page file). I need a way to use an on-disk file as a buffer, like a bytearray. It should support operations such as a = buffer[100:200] and buffer[33] = 127 without the calling code having to know that it is reading from and writing to a file in the background.

Basically, I need the opposite of BytesIO, which exposes memory through a file interface: I need a file exposed through a memory-buffer interface. Ideally it would not write to the file every time the data changes (but it's OK if it does).

I need this because I use packages that expect data to be in a buffer object, but I only have 4 MB of memory available, so loading the files into memory is impossible. I need an object that acts like a bytearray, for example, but reads and writes its data directly in a file rather than in memory.

For my use case I need a MicroPython module, but a standard Python module might work as well. Are there any modules that do what I need?

  • You might need to use some low level file.seeking for that Commented Nov 30, 2022 at 19:36
  • mmap ? Commented Nov 30, 2022 at 19:53
  • @jvx8ss "without the code having to be aware that it's reading from and writing to a file in the background" "The reason I need this functionality is because I use packages that expect data to be in a buffer object" Commented Nov 30, 2022 at 20:20
  • @0x0fba mmap does not exist for MicroPython, so it's not an option. It also does not do what I need: it copies the full mapping into memory, so it's not possible to map a 100 MB file with mmap while using only 1 MB of cache memory. It's also OS-specific functionality, not part of CPython itself. Commented Nov 30, 2022 at 20:25
  • @uzumaki Maybe if you make a class that internally uses file.seek and use __getitem__ and __setitem__ to achieve the buffer[100:200], buffer[33] = 127 that you want? Commented Nov 30, 2022 at 20:42
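
For readers who can run CPython, the mmap suggestion from the comments can be sketched as follows. This does not address the MicroPython constraint raised above, and memory use is left to the operating system rather than controlled by Python code. The function name demo_mmap and the scratch file are made up for this illustration.

```python
import mmap
import os
import tempfile


def demo_mmap(path):
    """Edit a file in place through an mmap object (CPython only)."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[33] = 127                        # single-byte write takes an int
            mm[100:104] = b"\xde\xad\xbe\xef"   # slice write must keep the length
            chunk = mm[100:200]                 # slice read returns bytes
            mm.flush()                          # push the changes to the file
            return mm[33], chunk[:4], len(chunk)


# Hypothetical scratch file of 256 zero bytes, created only for this demo.
fd, path = tempfile.mkstemp()
os.write(fd, bytes(256))
os.close(fd)
result = demo_mmap(path)
os.remove(path)
print(result)
```

The mmap object supports the indexing from the question directly, which is why it is often the first suggestion on desktop Python.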

1 Answer

Can something like this work for you?

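class Memfile:
    """Index and slice access on top of a file opened in binary read/write mode."""

    def __init__(self, file):
        self.file = file

    def __getitem__(self, key):
        if isinstance(key, int):
            # A single index returns a 1-byte bytes object
            # (a real bytearray would return an int here).
            self.file.seek(key)
            return self.file.read(1)
        if isinstance(key, slice):
            # Only slices with an explicit start and stop (and no step) are handled.
            self.file.seek(key.start)
            return self.file.read(key.stop - key.start)
        raise TypeError("key must be an int or a slice")

    def __setitem__(self, key, val):
        assert isinstance(val, (bytes, bytearray))
        if isinstance(key, slice):
            # The new data must have exactly the length of the slice.
            assert key.stop - key.start == len(val)
            self.file.seek(key.start)
            self.file.write(val)
        elif isinstance(key, int):
            # A single byte is passed as a 1-byte bytes object, not as an int.
            assert len(val) == 1
            self.file.seek(key)
            self.file.write(val)
        else:
            raise TypeError("key must be an int or a slice")

    def close(self):
        self.file.close()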


if __name__ == "__main__":
    mf = Memfile(open("data", "r+b"))  # Assumes the file 'data' exists and has at least 10 bytes
    mf[0:10] = b'\x00'*10
    print(mf[0:10]) # b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
    mf[0:2] = b'\xff\xff'
    print(mf[0:10]) # b'\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00'
    print(mf[2]) # b'\x00'
    print(mf[1]) # b'\xff'
    mf[0:4] = b'\xde\xad\xbe\xef'
    print(mf[0:4]) # b'\xde\xad\xbe\xef'
    mf.close()

Note that if this solution fits your needs, you will still need to test it thoroughly.
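
The question asks for buffer[33] = 127, which the class above does not accept because it expects bytes values. A variant that follows bytearray conventions more closely might look like the sketch below. The name FileBuffer is made up here, and the sketch assumes standard Python (slice.indices may be missing on some MicroPython builds). It only imitates indexing; code that needs a real buffer object, for example via memoryview, may still reject it.

```python
import os
import tempfile


class FileBuffer:
    """Hypothetical variant of Memfile that mimics bytearray indexing."""

    def __init__(self, file):
        self.file = file

    def __len__(self):
        # Seeking to the end (whence=2) returns the file size.
        return self.file.seek(0, 2)

    def _index(self, key):
        # Normalize a possibly negative integer index and check its range.
        size = len(self)
        if key < 0:
            key += size
        if not 0 <= key < size:
            raise IndexError("index out of range")
        return key

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self))
            if step != 1:
                raise ValueError("only step 1 is supported")
            self.file.seek(start)
            return self.file.read(max(0, stop - start))
        self.file.seek(self._index(key))
        return self.file.read(1)[0]          # an int, like bytearray

    def __setitem__(self, key, val):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self))
            if step != 1 or stop - start != len(val):
                raise ValueError("slice and value must have the same length")
            self.file.seek(start)
            self.file.write(bytes(val))
        else:
            self.file.seek(self._index(key))
            self.file.write(bytes([val]))    # val must be an int in range(256)


# Hypothetical scratch file of 256 zero bytes, created only for this demo.
fd, path = tempfile.mkstemp()
os.write(fd, bytes(256))
os.close(fd)
with open(path, "r+b") as f:
    buf = FileBuffer(f)
    buf[33] = 127
    buf[100:104] = b"\xde\xad\xbe\xef"
    demo = (buf[33], buf[100:104], len(buf[100:200]), len(buf))
os.remove(path)
print(demo)
```

Every access still goes straight to the file, so this trades speed for a very small memory footprint.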


2 Comments

Yes, this would be the basic implementation of what I need. Is there a caching package that could be patched into the class? One that would read from the cache if possible and flush the cache to disk when it's full. Also, shouldn't it be self.file.read(key.stop - key.start)?
You're right, it should be self.file.read(key.stop - key.start). I don't know about the caching though, sorry.
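
No caching package is named in this thread, but the behavior asked for in the comment (serve reads from a cache when possible, write back to disk when the cache is full) can be sketched by hand. The class below is a rough illustration, not a tested library: PagedFile, page_size, and max_pages are made-up names, it handles integer indices only, and it relies on OrderedDict.move_to_end from standard Python, which may not be available on MicroPython.

```python
import os
import tempfile
from collections import OrderedDict


class PagedFile:
    """Hypothetical write-back page cache over a binary file."""

    def __init__(self, file, page_size=256, max_pages=4):
        self.file = file
        self.page_size = page_size
        self.max_pages = max_pages
        self.pages = OrderedDict()   # page number -> bytearray, oldest first
        self.dirty = set()           # page numbers modified since loading

    def _write_back(self, number):
        self.file.seek(number * self.page_size)
        self.file.write(self.pages[number])
        self.dirty.discard(number)

    def _page(self, number):
        if number in self.pages:
            self.pages.move_to_end(number)      # mark as most recently used
            return self.pages[number]
        if len(self.pages) >= self.max_pages:   # cache full: evict the oldest page
            oldest = next(iter(self.pages))
            if oldest in self.dirty:
                self._write_back(oldest)
            del self.pages[oldest]
        self.file.seek(number * self.page_size)
        self.pages[number] = bytearray(self.file.read(self.page_size))
        return self.pages[number]

    def __getitem__(self, index):
        return self._page(index // self.page_size)[index % self.page_size]

    def __setitem__(self, index, value):
        number = index // self.page_size
        self._page(number)[index % self.page_size] = value
        self.dirty.add(number)

    def flush(self):
        # sorted() copies the set, so _write_back can safely modify it.
        for number in sorted(self.dirty):
            self._write_back(number)
        self.file.flush()


# Hypothetical scratch file of 2048 zero bytes, created only for this demo.
fd, path = tempfile.mkstemp()
os.write(fd, bytes(2048))
os.close(fd)
with open(path, "r+b") as f:
    buf = PagedFile(f, page_size=256, max_pages=2)
    buf[33] = 127        # loads page 0 and marks it dirty
    buf[300] = 1         # loads page 1
    buf[1000] = 2        # loads page 3, evicting page 0 and writing it to disk
    cached = sorted(buf.pages)
    buf.flush()
with open(path, "rb") as f:
    data = f.read()
os.remove(path)
on_disk = (data[33], data[300], data[1000], cached)
print(on_disk)
```

Memory use is bounded by roughly page_size * max_pages bytes, so both values would need tuning against the 4 MB limit mentioned in the question. Slice access could be layered on top by looping over the pages a slice touches.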
