Splitting larger functions into smaller, more readable functions is part of writing Pythonic code: it should be obvious what you're trying to accomplish, and smaller functions are easier to read, check for errors, maintain, and reuse.
As always, "which has better performance" questions should be answered by profiling the actual code; the answer often depends on the signatures of the functions involved and what your code is doing.
For example, passing a large dictionary to a separate function, instead of referencing it as a local variable in the current frame, will have different performance characteristics than calling a function that takes no arguments at all.
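A quick way to check that kind of difference yourself is the timeit module. This is only a sketch — the function names and the dictionary size here are made up for illustration:

```python
import timeit

# Hypothetical workload: a large dictionary we either pass in or ignore
big_dict = {i: str(i) for i in range(100_000)}

def takes_dict(d):
    # Receives the dict as an argument (only a reference is passed in CPython)
    return len(d)

def takes_nothing():
    # A "void"-style call with no arguments
    return 0

# Time each call pattern; the per-call difference is what we care about
t1 = timeit.timeit(lambda: takes_dict(big_dict), number=100_000)
t2 = timeit.timeit(lambda: takes_nothing(), number=100_000)
print(f"with dict arg: {t1:.4f}s, no args: {t2:.4f}s")
```

Because CPython passes object references rather than copying the dictionary, the gap is usually small — but that's exactly the kind of thing you verify by measuring rather than guessing.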
For example, here's some trivial behavior:
import profile
import dis

def callee():
    for x in range(10000):
        x += x
    print("let's have some tea now")

def caller():
    callee()

profile.run('caller()')
let's have some tea now
26 function calls in 0.002 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
2 0.000 0.000 0.000 0.000 :0(decode)
2 0.000 0.000 0.000 0.000 :0(getpid)
2 0.000 0.000 0.000 0.000 :0(isinstance)
1 0.000 0.000 0.000 0.000 :0(range)
1 0.000 0.000 0.000 0.000 :0(setprofile)
2 0.000 0.000 0.000 0.000 :0(time)
2 0.000 0.000 0.000 0.000 :0(utf_8_decode)
2 0.000 0.000 0.000 0.000 :0(write)
1 0.002 0.002 0.002 0.002 <ipython-input-3-98c87a49b247>:4(callee)
1 0.000 0.000 0.002 0.002 <ipython-input-3-98c87a49b247>:9(caller)
1 0.000 0.000 0.002 0.002 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 iostream.py:196(write)
2 0.000 0.000 0.000 0.000 iostream.py:86(_is_master_process)
2 0.000 0.000 0.000 0.000 iostream.py:95(_check_mp_mode)
1 0.000 0.000 0.002 0.002 profile:0(caller())
0 0.000 0.000 profile:0(profiler)
2 0.000 0.000 0.000 0.000 utf_8.py:15(decode)
vs.
import profile
import dis

def all_in_one():
    def passer():
        pass
    passer()

    for x in range(10000):
        x += x
    print("let's have some tea now")

profile.run('all_in_one()')
let's have some tea now
26 function calls in 0.002 seconds
The two versions make the same number of function calls and show no measurable performance difference, which backs up my claim that you really need to test in your specific circumstances.
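Incidentally, the standard library also ships cProfile, a C implementation of the same interface as profile with much lower overhead, which makes it the better choice for measuring real code. A sketch of the same measurement using it (capturing the report in a string buffer, though printing straight to stdout works too):

```python
import cProfile
import io
import pstats

def callee():
    for x in range(10000):
        x += x

def caller():
    callee()

# Collect stats with the lower-overhead C profiler
pr = cProfile.Profile()
pr.enable()
caller()
pr.disable()

# Render the report into a buffer, sorted by cumulative time
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats()
print(buf.getvalue())
```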
You can see that I have an unused import of dis, the disassembly module. This is another helpful standard-library module that lets you see what your code is actually doing (try dis.dis(my_function)). I'd post a disassembly of the test code I generated, but it would only show you more details that aren't relevant to the question or to understanding what's happening in your code.
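For instance, disassembling the loop from the first example looks like this (I pass file= here only so the output can be captured; plain dis.dis(callee) prints to stdout):

```python
import dis
import io

def callee():
    for x in range(10000):
        x += x

# dis.dis prints one bytecode instruction per line; the for loop
# shows up as a FOR_ITER instruction driving the iteration
buf = io.StringIO()
dis.dis(callee, file=buf)
print(buf.getvalue())
```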