I am trying to count the number of words whose length is between 1 and 5. The file size is around 4 GB, and I am getting a memory error.
import os
files = os.listdir('C:/Users/rram/Desktop/')
for file_name in files:
    file_path = "C:/Users/rram/Desktop/" + file_name
    f = open(file_path, 'r')
    text = f.readlines()
    update_text = ''
    wordcount = {}
    for line in text:
        arr = line.split("|")
        word = arr[13]
        if 1 <= len(word) < 6:
            if word not in wordcount:
                wordcount[word] = 1
            else:
                wordcount[word] += 1
        update_text += '|'.join(arr)
    print (wordcount) #print update_text
    print 'closing', file_path, '\t', 'total files' , '\n\n'
    f.close()
At the end I get a MemoryError on this line: text = f.readlines()
Can you please help me optimize it?
Instead of `text = f.readlines()`, you can iterate over the file handle directly: `for line in f:`. That way you don't overload your memory by reading the whole file at once.
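To make the suggestion concrete, here is a minimal sketch of the same count done by streaming the file one line at a time. The function names `count_short_words` and `count_folder`, their parameters, and the choice to skip lines with fewer than 14 fields are assumptions added for illustration; they are not in the original code. The sketch also drops `update_text`, because concatenating every line into one string rebuilds the whole file in memory just as `readlines()` does.

```python
import os
from collections import Counter


def count_short_words(file_path, column=13, min_len=1, max_len=5):
    """Count the values of one '|'-separated column whose length is min_len..max_len."""
    wordcount = Counter()
    with open(file_path, 'r') as f:
        # Iterating over the file object yields one line at a time,
        # so only the current line is held in memory.
        for line in f:
            # Strip the newline so it is not counted if the column is the last field.
            arr = line.rstrip('\n').split('|')
            if len(arr) <= column:
                continue  # assumption: silently skip lines with too few fields
            word = arr[column]
            if min_len <= len(word) <= max_len:
                wordcount[word] += 1
    return wordcount


def count_folder(folder):
    """Print the counts for every regular file in folder (hypothetical helper)."""
    for file_name in os.listdir(folder):
        file_path = os.path.join(folder, file_name)
        if not os.path.isfile(file_path):
            continue  # skip sub-directories
        print(count_short_words(file_path))
        print('closing ' + file_path)
```

Calling `count_folder('C:/Users/rram/Desktop/')` would mirror the loop in the question. Memory use then depends on the number of distinct short words rather than on the size of the file.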