
Questions about memory errors in Python have been asked before, but I want to ask one that is more specific to my situation. I am new to programming and Python.

When parsing a large text file (~8GB), the line

mylist = [line.strip('\n').split('|') for line in f]

resulted in "MemoryError".

I am running the 64-bit version of Python [MSC v.1500 64 bit (AMD64)] on Windows XP 64-bit with 12GB of RAM. How can I handle this MemoryError other than by installing more RAM?
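For a sense of why an 8GB file can exceed 12GB of RAM once parsed, here is a rough, CPython-specific measurement of a single parsed line using sys.getsizeof. The sample line and the in_memory_size helper are made up for illustration, and exact byte counts vary with Python version and platform. The takeaway is only that the list and string objects cost much more than the raw characters.

```python
import sys

def in_memory_size(line):
    """Hypothetical helper: rough CPython estimate of the memory used by one
    parsed line, i.e. the list object plus every string field it holds."""
    fields = line.strip('\n').split('|')
    return sys.getsizeof(fields) + sum(sys.getsizeof(s) for s in fields)

# A made-up line of pipe-delimited data.
sample = "12345|foo|bar|2012-11-20|3.14\n"
raw = len(sample)               # characters in the file
parsed = in_memory_size(sample)  # bytes of Python objects after parsing
print("raw characters: %d, parsed objects: ~%d bytes" % (raw, parsed))
```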

4 Answers


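The MemoryError occurs because you're trying to store the whole file in a list, which lives entirely in memory. Each line becomes a list of separate string objects, each with its own per-object overhead, so the parsed data can take up considerably more RAM than the file does on disk. Instead, work on each line as you read it rather than storing it: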

with open('somefile') as f:  # 'somefile' is a placeholder path
    for line in f:
        data = line.strip('\n').split('|')
        # do something here with data
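Here is a minimal sketch of what "do something here with data" might look like. The hypothetical count_first_field function tallies the values in the first column while reading line by line, so memory grows with the number of distinct values rather than with the number of lines. io.StringIO stands in for the real file.

```python
from collections import Counter
import io

def count_first_field(lines):
    """Hypothetical example: count how often each value appears in the first
    '|'-separated column, keeping only running totals, not the lines."""
    counts = Counter()
    for line in lines:
        data = line.strip('\n').split('|')
        counts[data[0]] += 1
    return counts

# Stand-in for a real file; with a real file you would write
#     with open('somefile') as f: counts = count_first_field(f)
demo = io.StringIO(u"a|1\nb|2\na|3\n")
counts = count_first_field(demo)
print(counts["a"], counts["b"])  # 2 1
```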

It depends what you want to do with your list.

If you want to work on a line-by-line basis, you can probably get the job done using a generator expression instead of a list comprehension, which looks like this:

myiterator = (line.strip('\n').split('|') for line in f)

(note that I replaced [...] with (...)). This returns a generator, a lazy iterator, instead of a list, and since for line in f also doesn't create a list, only one line is loaded at a time.
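Here is a small sketch of that laziness. io.StringIO stands in for the file, and split_line is a hypothetical helper that records each call. Nothing is read or split until the generator is advanced with next().

```python
import io

calls = []

def split_line(line):
    """Hypothetical helper: records each call, then splits the line on '|'."""
    calls.append(line)
    return line.strip('\n').split('|')

f = io.StringIO(u"a|b\nc|d\ne|f\n")  # stand-in for a real file object
myiterator = (split_line(line) for line in f)

print(len(calls))  # 0: creating the generator reads and splits nothing
first = next(myiterator)
print(first)       # ['a', 'b']
print(len(calls))  # 1: exactly one line has been read and split
```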

If you need to work on all lines at once, you will probably have to combine this with another technique to avoid using up all your memory.
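One such technique, assuming the work can be done in pieces (for example, bulk inserts into a database), is to pull fixed-size batches from the generator with itertools.islice so that at most one batch is held in memory. The batches helper is a hypothetical name, not a standard-library function, and io.StringIO stands in for the 8GB file.

```python
import io
from itertools import islice

def batches(iterable, size):
    """Hypothetical helper: yield lists of at most `size` items, so only
    one batch is in memory at a time."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

f = io.StringIO(u"1|a\n2|b\n3|c\n4|d\n5|e\n")  # stand-in for the real file
rows = (line.strip('\n').split('|') for line in f)
for batch in batches(rows, 2):
    print(len(batch))  # prints 2, then 2, then 1
```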

2 Comments

@user1839897 Note that creating a generator expression does not actually execute the code inside the parentheses. That won't happen until you loop over the generator, for instance with a for loop: for line_parts in myiterator:.
Exactly, this is also why such a generator is cheap to create: the computation only happens when you actually need it. It also lets you carry the computation around as a value (the generator), so it is easy to chain other computations onto it. It can be iterated like a list, although there are some differences (no len(), no slicing, no indexing, and it can only be consumed once).
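To illustrate those differences: indexing or slicing a generator raises TypeError, but itertools.islice gives a lazy slice instead, and once items have been consumed they are gone. io.StringIO stands in for the file in this sketch.

```python
import io
from itertools import islice

f = io.StringIO(u"a|1\nb|2\nc|3\nd|4\n")  # stand-in for a real file
rows = (line.strip('\n').split('|') for line in f)

# rows[1:3] would raise TypeError; islice() takes a lazy slice instead.
middle = list(islice(rows, 1, 3))
print(middle)      # [['b', '2'], ['c', '3']]

# The generator is consumed as it goes: only the unread rows remain.
print(list(rows))  # [['d', '4']]
print(list(rows))  # [] (exhausted; it cannot be restarted)
```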

You should definitely use a lazy generator to parse such a huge file one line at a time, or divide the file into smaller chunks.

One possibility:

import sys

def lazy_reader(path):
    """Reads a file one line at a time."""
    try:
        f = open(path, 'r')  # open outside the reading loop so f is always bound
    except IOError:
        sys.stderr.write("error while opening file at %s\n" % path)
        sys.exit(2)
    try:
        while True:
            line = f.readline()
            if not line:
                break
            yield line  # "outputs" the line from the generator
    finally:
        f.close()       # runs when iteration ends or the generator is closed

Then you can consume the generator like this:

for line in lazy_reader("path/to/your/file"):
    do_something_with(line)
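As the comments below point out, a file object is already a lazy iterator over its lines, so lazy_reader can be reduced to a with block and a for loop. This is a sketch of that shorter variant. Unlike the version above, it lets a failed open() raise IOError (OSError in Python 3) instead of exiting, and the temporary file exists only for the demo.

```python
import os
import tempfile

def lazy_reader(path):
    """Yield the lines of the file at `path` one at a time.

    The file object itself is a lazy line iterator; `with` closes it when
    iteration finishes, on an exception, or when the generator is closed."""
    with open(path, 'r') as f:
        for line in f:
            yield line

# Demo: a temporary file stands in for the real 8GB file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'w') as tmp:
    tmp.write("a|1\nb|2\n")

for line in lazy_reader(path):
    print(line.strip('\n').split('|'))

os.remove(path)
```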

EDIT: you can also combine generators in a neat "pipelined" way:

def parse(generator):
    for line in generator:
        yield line.strip('\n').split('|')

for data in parse(lazy_reader("path/to/your/file")):
    do_something_with_splitted_array(data)
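Extending the same idea, further stages can be chained onto parse. The where and pick generators below are hypothetical examples of a filter stage and a column-selection stage. Each pulls one row at a time from the stage before it, so the whole pipeline still processes one line at a time. io.StringIO stands in for lazy_reader(...).

```python
import io

def parse(lines):
    """Split each line on '|' (the same stage as parse above)."""
    for line in lines:
        yield line.strip('\n').split('|')

def where(rows, column, value):
    """Hypothetical filter stage: keep rows whose `column` equals `value`."""
    for row in rows:
        if row[column] == value:
            yield row

def pick(rows, *columns):
    """Hypothetical projection stage: keep only the requested columns."""
    for row in rows:
        yield [row[i] for i in columns]

f = io.StringIO(u"1|x|foo\n2|y|bar\n3|x|baz\n")  # stand-in for lazy_reader(...)
for data in pick(where(parse(f), 1, 'x'), 0, 2):
    print(data)  # ['1', 'foo'], then ['3', 'baz']
```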

5 Comments

@l4mpi yes, but reading it all into a list with [line for line in f] isn't
The list comprehension in the question could be replaced by a generator expression, just by swapping [] for ()
@katrielalex of course not, but why write this lazy_reader function then if one could just use the built-in file iterator which does the same?
The answer from Ashwini Chaudhary does the same in 2 lines instead of ... dozens? Mine is also much more concise and just as valid. THIS is not very pythonic (at all).
I didn't know about the () idiom, actually. Thanks for sharing. I adapted this answer from some code where I had to parse numeric binary data, where it makes sense to have this function.

My take on it: use with so that the file is closed even if an error occurs, a generator expression to define what each line should look like, then iterate over that:

with open('somefile') as fin:
    lines = (line.strip('\n').split('|') for line in fin)
    for line in lines:
        pass # do something with line
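As an alternative to splitting by hand, the standard csv module can do the splitting with delimiter='|', and csv.reader is itself a lazy iterator. This sketch is not something the answers use. quoting=csv.QUOTE_NONE is set so that double quotes in the data are kept literally, as they are with split('|'), and csv also strips the line ending for you. With a real file, Python 3 wants open('somefile', newline=''), and Python 2 wants open('somefile', 'rb').

```python
import csv
import io

def read_rows(f):
    """Lazily yield each line of `f` as a list of fields split on '|'.

    csv.reader pulls from `f` as it is iterated rather than loading it all;
    QUOTE_NONE turns off special handling of quote characters."""
    return csv.reader(f, delimiter='|', quoting=csv.QUOTE_NONE)

# io.StringIO stands in for the real file.
demo = io.StringIO(u"a|b|c\nd|e|f\n")
for row in read_rows(demo):
    print(row)  # ['a', 'b', 'c'], then ['d', 'e', 'f']
```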
