import time
import logging
from functools import reduce

logging.basicConfig(filename='debug.log', level=logging.DEBUG)



def read_large_file(file_object):
    """Uses a generator to read a large file lazily"""

    while True:
        data = file_object.readline()
        if not data:
            break
        yield data


def process_file_1(file_path):
    """Opens a large file and reads it in"""

    try:
        with open(file_path) as fp:
            for line in read_large_file(fp):
                logging.debug(line)
                pass

    except (IOError, OSError):
        print('Error Opening or Processing file')


def process_file_2(file_path):
    """Opens a large file and reads it in"""

    try:
        with open(file_path) as file_handler:
            while True:
                logging.debug(next(file_handler))
    except (IOError, OSError):
        print("Error opening / processing file")
    except StopIteration:
        pass


if __name__ == "__main__":
    path = "TB_data_dictionary_2016-04-15.csv"

    l1 = []
    for i in range(1, 10):
        start = time.clock()
        process_file_1(path)
        end = time.clock()
        diff = (end - start)
        l1.append(diff)

    avg = reduce(lambda x, y: x + y, l1) / len(l1)
    print('processing time (with generators) {}'.format(avg))


    l2 = []
    for i in range(1, 10):
        start = time.clock()
        process_file_2(path)
        end = time.clock()
        diff = (end - start)
        l2.append(diff)

    avg = reduce(lambda x, y: x + y, l2) / len(l2)
    print('processing time (with iterators) {}'.format(avg))

Output of the program:

C:\Python34\python.exe C:/pypen/data_structures/generators/generators1.py
processing time (with generators) 0.028033358176432314
processing time (with iterators) 0.02699498330810426

In the above program I was attempting to compare the time taken to open and read a large file using iterators with the time taken using generators. The file is available here. The time for reading the file with iterators is lower than the time with generators.

I am assuming that if I were to measure the amount of memory used by the functions process_file_1 and process_file_2, the generator version would outperform the iterator version. Is there a way to measure memory usage per function in Python?

  • Do a read that you simply discard before the 2 tests to make sure any caching of the file by the operating system applies to both runs (see the sketch below). Commented Nov 27, 2016 at 22:05
  • @tdelaney - I have updated the program slightly Commented Nov 27, 2016 at 22:30
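
A warm-up read along the lines of the first comment might look like this (a minimal sketch; it assumes it is placed in the __main__ block before either timing loop):

# Read the file once and discard the data so the operating system's file
# cache is equally warm for both timed runs.
with open(path, 'rb') as warmup:
    warmup.read()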

1 Answer


Firstly, using a single iteration of the code to measure its performance is not a good idea. Your results might vary because of glitches in system performance (for example: a background process, the CPU doing garbage collection, etc.). You should check it over multiple iterations of the same code.

To measure the performance of the code, use the timeit module:

This module provides a simple way to time small bits of Python code. It has both a Command-Line Interface as well as a callable one. It avoids a number of common traps for measuring execution times.
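
For example, the manual timing loops in the __main__ block could be replaced with something like the following (a minimal sketch, assuming process_file_1, process_file_2 and path are defined in the running script, as in the question):

import timeit

setup = "from __main__ import process_file_1, process_file_2, path"

# Run each function 10 times and report the average time per call.
t_gen = timeit.timeit("process_file_1(path)", setup=setup, number=10)
t_it = timeit.timeit("process_file_2(path)", setup=setup, number=10)

print("processing time (with generators) {}".format(t_gen / 10))
print("processing time (with iterators) {}".format(t_it / 10))

The module also has a command-line interface, e.g. python -m timeit -s "import mymodule" "mymodule.process_file_1('file.csv')", where mymodule and the file name are placeholders for your own script and data file.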

For checking the memory consumption of your code, use Memory Profiler:

This is a python module for monitoring memory consumption of a process as well as line-by-line analysis of memory consumption for python programs.
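
A minimal sketch of how it might be used here (assuming the package is installed, e.g. with pip install memory_profiler): decorate the function you want to inspect and run the script under the profiler.

from memory_profiler import profile

@profile
def process_file_1(file_path):
    """Same body as in the question; @profile reports per-line memory usage."""
    try:
        with open(file_path) as fp:
            for line in read_large_file(fp):
                pass
    except (IOError, OSError):
        print('Error Opening or Processing file')

# Run with:  python -m memory_profiler generators1.py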
