I have a "not so" large file (~2.2GB) which I am trying to read and process...
import os
from collections import defaultdict

graph = defaultdict(dict)
error = open("error.txt", "w")
print "Reading file"
with open("final_edge_list.txt", "r") as f:
    for line in f:
        try:
            line = line.rstrip(os.linesep)
            tokens = line.split("\t")
            if len(tokens) == 3:
                src = long(tokens[0])
                destination = long(tokens[1])
                weight = float(tokens[2])
                #tup1 = (destination,weight)
                #tup2 = (src,weight)
                graph[src][destination] = weight
                graph[destination][src] = weight
            else:
                print "error ", line
                error.write(line + "\n")
        except Exception, e:
            string = str(Exception) + " " + str(e) + "==> " + line + "\n"
            error.write(string)
            continue
Am I doing something wrong?
It's been about an hour and the code is still reading the file.
Memory usage is already at 20GB. Why is it taking so much time and memory?
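For scale, here is a quick sys.getsizeof check (a rough illustration on 64-bit CPython 2, not a measurement of this exact run) of the per-object overhead behind each stored edge; note that every edge is stored twice, once in each direction:

import sys
from collections import defaultdict

graph = defaultdict(dict)
graph[1L][2L] = 0.5  # one edge, stored the same way as in the question

# sys.getsizeof reports the size of the object itself, not what it
# references; even so, the fixed per-object costs add up fast when
# millions of edges each need an inner dict entry, a boxed long key,
# and a boxed float value.
print "inner dict: ", sys.getsizeof(graph[1L])
print "long key:   ", sys.getsizeof(1L)
print "float value:", sys.getsizeof(0.5)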
Comment: readlines() will make the memory issue worse: it reads the entire file into memory before the loop starts, whereas for line in f: puts just a single line into memory at a time.
Reply: I thought for line in f: wouldn't work properly without using readlines() first. Anyway, thanks and never mind.
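A minimal sketch of the difference (same filename as the question; Python 2 syntax to match the code above):

# Eager: readlines() builds a list holding every line of the
# 2.2GB file before the first iteration of the loop runs.
with open("final_edge_list.txt", "r") as f:
    for line in f.readlines():
        pass  # process line

# Lazy: iterating the file object yields one line at a time,
# so only the current line is held in memory.
with open("final_edge_list.txt", "r") as f:
    for line in f:
        pass  # process line

Both loops see exactly the same lines; only the memory behavior differs, which is why readlines() is never needed before for line in f:.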