
I got a MemoryError when processing a 1.45 GB .xml file. I tried running the code on a smaller file and it works, so there shouldn't be any bugs in the code. The code opens an XML file, does some processing on its contents, and saves the result to a new txt file. I run Win7 x86, 2 GB RAM, Python 2.6.

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    openfile('ukwiki-latest-pages-articles.xml')
  File "C:\Users\Vof Freeman\Desktop\Python\test.py", line 7, in openfile
    contents = F.read()
  File "C:\Python26\lib\codecs.py", line 666, in read
    return self.reader.read(size)
  File "C:\Python26\lib\codecs.py", line 466, in read
    newdata = self.stream.read()
MemoryError
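For context, the failing line `contents = F.read()` slurps the entire 1.45 GB file into memory at once, which is what triggers the MemoryError on a 2 GB machine. A minimal sketch of reading and writing in bounded-size chunks instead (the file names and the pass-through "transform" are placeholders, since the asker's actual processing code isn't shown):

```python
import io

# Create a small stand-in for the real 1.45 GB dump so the sketch runs.
with io.open("sample.xml", "w", encoding="utf-8") as f:
    f.write(u"<root>hello</root>")

with io.open("sample.xml", encoding="utf-8") as src, \
     io.open("output.txt", "w", encoding="utf-8") as dst:
    while True:
        chunk = src.read(64 * 1024)  # 64 KiB at a time, never the whole file
        if not chunk:
            break
        dst.write(chunk)  # real code would transform the chunk here
```

This keeps peak memory at roughly one chunk's worth regardless of the input file size, though chunked text reads only suit processing that doesn't need the whole XML tree at once.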

2 Answers


Since building an in-memory tree is not desirable (and in your case not practical either, given the amount of physical memory you have), there are two techniques you can use with lxml:

  • Supplying a target parser class
  • Using the iterparse method

Refer to the lxml parsing documentation to see how this can be done.
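To give a sense of the iterparse approach: a minimal sketch using the stdlib `xml.etree.ElementTree`, whose `iterparse` interface `lxml.etree` mirrors. The `<page>`/`<title>` element names here are hypothetical stand-ins for the dump's actual schema:

```python
import io
import xml.etree.ElementTree as ET

# A tiny in-memory document standing in for the huge file on disk.
xml_data = b"""<root>
  <page><title>First</title></page>
  <page><title>Second</title></page>
</root>"""

titles = []
for event, elem in ET.iterparse(io.BytesIO(xml_data), events=("end",)):
    if elem.tag == "page":
        titles.append(elem.findtext("title"))
        elem.clear()  # discard the processed subtree to keep memory flat

print(titles)
```

Because each `<page>` element is cleared as soon as it has been handled, memory use stays proportional to one element rather than to the whole document.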




Simply put, you don't have enough RAM to read this whole file into memory at once. You should split it up into smaller XML files and process them one at a time.

The fact that it worked on a smaller file tells me that there's nothing wrong with your code, it's just your hardware that can't handle it.

2 Comments

If I split it, how do I get a single txt file on the output then?
Keep the same file object open while you're reading each XML file, and continue writing to it throughout the program rather than closing the file and opening up a new one.
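A sketch of that pattern, with hypothetical chunk file names and a made-up `extract_titles` helper standing in for whatever processing the asker's code does:

```python
import xml.etree.ElementTree as ET

# Create two tiny stand-in chunk files so the sketch is runnable.
for name, title in [("chunk1.xml", "Alpha"), ("chunk2.xml", "Beta")]:
    with open(name, "w") as f:
        f.write("<root><page><title>%s</title></page></root>" % title)

def extract_titles(xml_path, out_file):
    # iterparse keeps memory use flat even for large inputs
    for event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "page":
            out_file.write((elem.findtext("title") or "") + "\n")
            elem.clear()

# Open the output once and keep writing to it across every chunk.
with open("combined.txt", "w") as out:
    for chunk in ["chunk1.xml", "chunk2.xml"]:
        extract_titles(chunk, out)
```

The single `with open(...)` around the loop is the key point: every chunk appends to the same open file object, so the output ends up in one txt file.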
