
I have a 5 GB text file and I am trying to read it line by line. The file is in the format: Reviewerid<\t>pid<\t>date<\t>title<\t>body<\n>. This is my code:

o = open('mproducts.txt','w')
with open('reviewsNew.txt','rb') as f1:
    for line in f1:
        line = line.strip()
        line2 = line.split('\t')
        o.write(str(line))
        o.write("\n")

But I get a MemoryError when I try to run it. I have 8 GB of RAM and 1 TB of disk space, so why am I getting this error? I tried reading the file in blocks, but I still get the error.

  • How long is the longest line in that file? Commented Oct 17, 2016 at 21:56
  • @FranciscoCouzo I don't know. But when I try to open that file in EmEditor, a pop-up window says it "contains some very large lines. Do you want to open it in binary format?" Choosing the binary option displays the file correctly. Commented Oct 17, 2016 at 21:59
  • What is o in o.write()? If you are keeping everything that you read in memory, I am not surprised that you are getting a memory error. Commented Oct 17, 2016 at 22:00
  • Mode 'rb' opens the file in binary mode. Try 'r+'. See docs.python.org/2/tutorial/inputoutput.html Commented Oct 17, 2016 at 22:00
  • Use for i, line in enumerate(f1): and print i on each iteration. The last one you see printed should be the last good line (a sketch of this idea follows below). Commented Oct 17, 2016 at 22:52
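The two diagnostic comments above can be combined into one streaming check. This is a minimal sketch, not from the original thread, that prints a progress marker every million lines rather than on every iteration; it reports how far the script gets and how long the longest line is, while holding only one line in memory at a time:

# Diagnostic sketch: stream the file once and track progress.
# If the script dies with MemoryError, the last printed count marks
# the last successfully read line; 'longest' hints at oversized lines.
longest = 0
count = 0
with open('reviewsNew.txt', 'rb') as f:
    for count, line in enumerate(f, 1):
        longest = max(longest, len(line))
        if count % 1000000 == 0:
            print(count, 'lines read; longest line so far:', longest, 'bytes')
print('done:', count, 'lines; longest line:', longest, 'bytes')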

1 Answer


Update:

Installing 64-bit Python solves the issue.

The OP was using 32-bit Python, which is why they were running into the memory limitation.
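For reference, here is a quick way to check whether an interpreter is a 32-bit or 64-bit build; this standard-library snippet is added for illustration and is not part of the original thread:

import struct
import sys

# A pointer is 4 bytes on a 32-bit build and 8 bytes on a 64-bit build.
print(struct.calcsize("P") * 8, "bit Python")
print("64-bit:", sys.maxsize > 2**32)  # True only on a 64-bit build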


Having read through the comments, I think this can help you:

  • You can't read the file in fixed-size byte chunks (e.g. 1024 bytes), because a chunk boundary can cut a record in half before you process it.
  • Instead, read the file in chunks of lines, i.e. N lines at a time.
  • You can use the yield keyword and itertools in Python to achieve this (see the sample code below).

Summary: get N lines at a time, process them, and then write them out.

Sample code:

from itertools import islice

def get_lines(file_handle, num_of_lines=10):
    # Yield lists of up to num_of_lines lines until the file is exhausted.
    # You can change num_of_lines to tune how many lines each batch holds.
    while True:
        next_n_lines = list(islice(file_handle, num_of_lines))
        if not next_n_lines:
            break
        yield next_n_lines


# 'with' guarantees both files are closed even if an error occurs.
with open('mproducts.txt', 'w') as o, open('reviewsNew.txt', 'r') as f1:
    for data_lines in get_lines(f1):
        for line in data_lines:
            line = line.strip()
            fields = line.split('\t')  # Reviewerid, pid, date, title, body
            o.write(line)
            o.write("\n")
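To see the batching behaviour of get_lines in isolation, here is a small usage sketch; io.StringIO stands in for the real file and is an illustration only:

from io import StringIO

sample = StringIO("a\tb\nc\td\ne\tf\n")  # three tiny fake records
for batch in get_lines(sample, num_of_lines=2):
    print(batch)
# ['a\tb\n', 'c\td\n']
# ['e\tf\n']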

Comments

You were so right. I installed 64-bit Python and it worked. Thank you so much ;)
I would upvote the answer, but I am not allowed to do that since my reputation is less than 15 :P. But thank you so much :)
No issues. The good point is that it works and you learned about the memory limitation of 32-bit applications :) :)
Yes, since it was 32-bit it could use only 4 GB. But my code was reading line by line, so it shouldn't have needed that much memory. So why was I getting that error?
Iterating with for line in somefile reads lazily, one line at a time, but a "line" only ends at the next \n. Given the very long lines EmEditor warned about, a single line (plus the copies made by strip() and split()) can exceed what a 32-bit process can allocate.
