Splitting larger functions into smaller, more readable functions is part of writing Pythonic code: it should be obvious what you're trying to accomplish, and smaller functions are easier to read, check for errors, maintain, and reuse.
As always, "which has better performance" questions should be answered by profiling the actual code; the answer often depends on the signatures of the functions involved and what your code is doing.
For example, passing a large dictionary to a separate function, instead of referencing it as a local variable in the current frame, will have different performance characteristics than calling a function that takes no arguments at all.
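A quick way to check that kind of difference yourself is the timeit module. This is only a sketch — the function names and the dictionary size here are made up for illustration:

```python
import timeit

# Hypothetical workload: a large dictionary we either pass in or ignore
big_dict = {i: str(i) for i in range(100_000)}

def takes_dict(d):
    # Receives the dict as an argument (only a reference is passed in CPython)
    return len(d)

def takes_nothing():
    # A "void"-style call with no arguments
    return 0

# Time each call pattern; the per-call difference is what we care about
t1 = timeit.timeit(lambda: takes_dict(big_dict), number=100_000)
t2 = timeit.timeit(lambda: takes_nothing(), number=100_000)
print(f"with dict arg: {t1:.4f}s, no args: {t2:.4f}s")
```

Because CPython passes object references rather than copying the dictionary, the gap is usually small — but that's exactly the kind of thing you verify by measuring rather than guessing.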
For example, here's some trivial behavior:
import profile
import dis

def callee():
    for x in range(10000):
        x += x
    print("let's have some tea now")

def caller():
    callee()

profile.run('caller()')
let's have some tea now
26 function calls in 0.002 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
2 0.000 0.000 0.000 0.000 :0(decode)
2 0.000 0.000 0.000 0.000 :0(getpid)
2 0.000 0.000 0.000 0.000 :0(isinstance)
1 0.000 0.000 0.000 0.000 :0(range)
1 0.000 0.000 0.000 0.000 :0(setprofile)
2 0.000 0.000 0.000 0.000 :0(time)
2 0.000 0.000 0.000 0.000 :0(utf_8_decode)
2 0.000 0.000 0.000 0.000 :0(write)
1 0.002 0.002 0.002 0.002 <ipython-input-3-98c87a49b247>:4(callee)
1 0.000 0.000 0.002 0.002 <ipython-input-3-98c87a49b247>:9(caller)
1 0.000 0.000 0.002 0.002 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 iostream.py:196(write)
2 0.000 0.000 0.000 0.000 iostream.py:86(_is_master_process)
2 0.000 0.000 0.000 0.000 iostream.py:95(_check_mp_mode)
1 0.000 0.000 0.002 0.002 profile:0(caller())
0 0.000 0.000 profile:0(profiler)
2 0.000 0.000 0.000 0.000 utf_8.py:15(decode)
vs.
import profile
import dis

def all_in_one():
    def passer():
        pass
    passer()

    for x in range(10000):
        x += x
    print("let's have some tea now")

profile.run('all_in_one()')
let's have some tea now
26 function calls in 0.002 seconds
The two versions make the same number of function calls and show no measurable performance difference, which backs up my claim that you really need to test in your specific circumstances.
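Incidentally, the standard library also ships cProfile, a C implementation of the same interface as profile with much lower overhead, which makes it the better choice for measuring real code. A sketch of the same measurement using it (capturing the report in a string buffer, though printing straight to stdout works too):

```python
import cProfile
import io
import pstats

def callee():
    for x in range(10000):
        x += x

def caller():
    callee()

# Collect stats with the lower-overhead C profiler
pr = cProfile.Profile()
pr.enable()
caller()
pr.disable()

# Render the report into a buffer, sorted by cumulative time
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats()
print(buf.getvalue())
```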
You can see that I have an unused import of dis, the disassembly module. This is another helpful standard-library module that lets you see what your code is actually doing (try dis.dis(my_function)). I'd post a disassembly of the test code I generated, but it would only show you more details that aren't relevant to the question or to understanding what's happening in your code.
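For instance, disassembling the loop from the first example looks like this (I pass file= here only so the output can be captured; plain dis.dis(callee) prints to stdout):

```python
import dis
import io

def callee():
    for x in range(10000):
        x += x

# dis.dis prints one bytecode instruction per line; the for loop
# shows up as a FOR_ITER instruction driving the iteration
buf = io.StringIO()
dis.dis(callee, file=buf)
print(buf.getvalue())
```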