Is it better to read an entire file before performing operations or is it better to perform operations while reading from the file?

If I read the entire file first, I'd store its contents line by line in a list. If I read the file and operated on the data at the same time, I'd read one line at a time and run my operation after each line is read.

For the sake of discussion, let's say the file isn't obscenely large. It would be nice to hear thoughts on both small and large files, and on whether the approach would differ. I also presume the operations themselves play a role; I'm reading URLs and downloading files.

  • What are you doing with the files after you've read them? Commented Jul 11, 2014 at 19:01
  • As a stylistic choice, I like to read a file all at once, like with open(filename) as file: data=file.read().split("\n"). Just because I don't like putting large pieces of code in with blocks, nor do I want to remember to close files long after I've opened them. But that's just, like, my opinion, man. Commented Jul 11, 2014 at 19:06
  • Nothing. It's read-only for my purposes. Commented Jul 11, 2014 at 19:06
  • For the case of fetching URLs from a URL list, it simply doesn't matter. The time spent on HTTP requests will outweigh any file operations by several orders of magnitude. So use the more memory-efficient and readable approach: for line in f. Commented Jul 11, 2014 at 19:08
  • Putting the lines into a list using readlines() and then iterating over it seems just slightly slower than iterating over the file object directly, at least for a file that has 100 lines. Commented Jul 11, 2014 at 19:08
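
To make the "for line in f" suggestion from the comments concrete, here is a minimal sketch of processing a URL list one line at a time. The names iter_urls, process_urls, and handle are hypothetical and not from the original post, and the demo uses placeholder URLs and does not download anything; in real use, handle would be whatever download routine you already have.

```python
import os
import tempfile


def iter_urls(path):
    """Yield one stripped, non-empty line at a time without loading the whole file."""
    with open(path) as f:
        for line in f:
            url = line.strip()
            if url:
                yield url


def process_urls(path, handle):
    """Call handle(url) for each URL as soon as its line is read; return the count."""
    count = 0
    for url in iter_urls(path):
        handle(url)
        count += 1
    return count


if __name__ == "__main__":
    # Demo with a throwaway file; the URLs are placeholders and nothing is fetched.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("http://example.com/a\n\nhttp://example.com/b\n")
    try:
        seen = []
        print(process_urls(tmp.name, seen.append), seen)
    finally:
        os.remove(tmp.name)
```

Because the generator holds only the current line, memory use stays roughly constant regardless of how long the list is, and the file is closed as soon as the loop finishes.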

1 Answer

Why don't you find out for yourself, for example by using the timeit module?

import timeit

WORDS = "/usr/share/dict/words"

# Variant a: read every line into a list first, then process the list.
def a():
    num_lines = 0
    num_chars = 0
    with open(WORDS) as f:
        lines = f.readlines()
        num_lines = len(lines)
        for line in lines:
            num_chars += len(line)
    return num_lines, num_chars


# Variant b: process each line as it is read from the file object.
def b():
    num_lines = 0
    num_chars = 0
    with open(WORDS) as f:
        for line in f:
            num_chars += len(line)
            num_lines += 1
    return num_lines, num_chars

if __name__ == '__main__':
    # print() with a single argument works on both Python 2 and Python 3
    print(timeit.timeit("a()", setup="from __main__ import a", number=100))
    print(timeit.timeit("b()", setup="from __main__ import b", number=100))
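
The script above depends on /usr/share/dict/words being present, which is not the case on every system. As a rough sketch of the same comparison that generates its own input, so small and large files can be tried side by side, something like the following should work. The function names, the sample sizes (100 and 100000 lines), and the repeat count are arbitrary choices for illustration, not part of the original answer.

```python
import os
import tempfile
import timeit


def read_all(path):
    """Load every line into a list first, then process the list."""
    with open(path) as f:
        lines = f.readlines()
    return len(lines), sum(len(line) for line in lines)


def read_streaming(path):
    """Process each line as it is read, keeping only one line at a time."""
    num_lines = 0
    num_chars = 0
    with open(path) as f:
        for line in f:
            num_lines += 1
            num_chars += len(line)
    return num_lines, num_chars


def make_sample(num_lines):
    """Write a throwaway file of num_lines short lines and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        for i in range(num_lines):
            tmp.write("line %d\n" % i)
    return tmp.name


if __name__ == "__main__":
    for size in (100, 100000):  # arbitrary "small" and "large" sizes
        path = make_sample(size)
        try:
            for func in (read_all, read_streaming):
                seconds = timeit.timeit(lambda: func(path), number=20)
                print("%7d lines  %-14s %.4f s" % (size, func.__name__, seconds))
        finally:
            os.remove(path)
```

Timings will vary by machine and disk cache, so treat the numbers as a relative comparison only. Both functions return the same counts; the difference is that read_all holds the whole file in memory at once.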

1 Comment

I can certainly construct such a script and perhaps solve my single specific case, but my intention was to have a discussion with people who are more knowledgeable than me about various scenarios.
