
I currently use the following Python code:

file = open(filePath, "r")
lines=file.readlines()
file.close()

Say my file has many lines (10,000 or more); my program then becomes slow if I do this for more than one file. Is there a way to speed this up in Python? From reading various links, I understand that readlines stores all the lines of the file in memory, which is why the code gets slow.

I have also tried the following code, and the time gain I got was 17%.

lines=[line for line in open(filePath,"r")]

Is there any other module in Python 2.4 that I might have missed? Thanks, Sandhya

  • Which links? I would be interested to see proof that this is the case. Commented Feb 4, 2011 at 6:46
  • @Mikel: from the docstring: "readlines([size]) -> list of strings, each a line from the file. Call readline() repeatedly and return a list of the lines so read. The optional size argument, if given, is an approximate bound on the total number of bytes in the lines returned." Commented Feb 4, 2011 at 7:54
  • @DSM: I mean the docs that say readlines is slower. ;-) Commented Feb 4, 2011 at 8:05
  • @Mikel: ah, that makes a lot more sense as a question. :^) Commented Feb 4, 2011 at 13:45

1 Answer

for line in file:

Iterating over the file object directly reads the file one line at a time; each previous line can be discarded from memory once nothing else references it.

From the Python 2 documentation for the file object's next() method:

A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit. In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer. New in version 2.3.
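The iterator behaviour described above can be seen in a short sketch. It assumes a Python version with the built-in next() (2.6 or later; on 2.4 you would call f.next() instead), and the temporary file and variable names are only for illustration.

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
fd, path = tempfile.mkstemp()
os.close(fd)
out = open(path, "w")
try:
    out.write("first\nsecond\n")
finally:
    out.close()

f = open(path, "r")
try:
    same_object = iter(f) is f   # a file object is its own iterator
    line_one = next(f)           # what a for loop calls behind the scenes
    line_two = next(f)
    try:
        next(f)
        hit_eof = False
    except StopIteration:        # raised once EOF is reached
        hit_eof = True
finally:
    f.close()
os.remove(path)
```

A for loop does exactly this for you and stops quietly when StopIteration is raised.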

Short answer: don't assign the lines to a variable, just perform whatever operations you need inside the loop.
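As a minimal sketch of that advice, here is a loop that does its work per line and never builds a list. The function name count_matching_lines, the search string, and the temporary demo file are hypothetical; substitute whatever processing you actually need. It uses try/finally rather than a with block so that it should also run on Python 2.4.

```python
import os
import tempfile

def count_matching_lines(file_path, needle):
    """Count lines containing `needle` without keeping the whole file in memory."""
    count = 0
    f = open(file_path, "r")
    try:
        for line in f:  # the file object yields one line at a time
            if needle in line:
                count += 1
    finally:
        f.close()  # try/finally closes the file even if the loop raises
    return count

# Small self-contained demonstration on a temporary file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
demo_file = open(demo_path, "w")
try:
    demo_file.write("alpha\nbeta\nalphabet\n")
finally:
    demo_file.close()
demo_count = count_matching_lines(demo_path, "alpha")
os.remove(demo_path)
```

Only one line is held at a time, so memory use stays roughly constant however large the file is. Whether it is also faster than readlines() for your files is something to measure rather than assume.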
