76

I just realized that doing

x.real*x.real+x.imag*x.imag

is three times faster than doing

abs(x)**2

where x is a numpy array of complex numbers. For code readability, I could define a function like

def abs2(x):
    return x.real*x.real+x.imag*x.imag

which is still far faster than abs(x)**2, but it comes at the cost of a function call. Is it possible to inline such a function, as I would do in C using a macro or the inline keyword?

8
  • 14
    If you need this kind of optimisation, you probably need to use something like Cython. Commented Jun 22, 2011 at 15:05
  • 8
    PyPy to the rescue! Commented Jun 22, 2011 at 15:07
  • 12
    If you care about such small optimisations, you should be using C, not Python. Python is not about speed, really. Commented Jun 22, 2011 at 15:07
  • 2
    Have you tried timing the statement vs. function call to see if there's really a difference? Commented Jun 22, 2011 at 15:18
  • 3
    In addition to the very correct and important comments above (seriously, listen to them), note that due to the dynamic nature of Python, the only time inlining could possibly happen is at runtime. This is one of the many optimizations PyPy does (although it doesn't have a remotely complete NumPy yet; but at least it's being worked on), and PyPy works best on idiomatic Python code, not on code written to shave tiny bits of time off execution overhead. Commented Jun 22, 2011 at 15:26

7 Answers

46

Is it possible to inline such a function, as I would do in C using a macro or the inline keyword?

No. Before reaching this specific instruction, Python interpreters don't even know if there's such a function, much less what it does.

As noted in the comments, PyPy will inline automatically (the above still holds: it "simply" generates an optimized version at runtime, benefits from it, and breaks out of it when it's invalidated), although in this specific case that doesn't help, as implementing NumPy on PyPy started only recently and isn't even beta level to this day. But the bottom line is: don't worry about optimizations on this level in Python. Either the implementations optimize it themselves or they don't; it's not your responsibility.
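
To illustrate why CPython cannot safely inline ahead of time, here is a minimal sketch (not taken from any particular interpreter's internals). The names caller and replacement are made up for the illustration. It shows that the callee is looked up by name every time the call runs, so rebinding the name changes what the same caller does:

```python
import dis

def abs2(z):
    # Squared magnitude without the square root that abs() computes.
    return z.real * z.real + z.imag * z.imag

def caller(z):
    # `abs2` is looked up by name each time this line executes.
    return abs2(z)

first = caller(3 + 4j)    # uses the original abs2

def replacement(z):
    # Hypothetical stand-in, only here to show that rebinding takes effect.
    return -1.0

abs2 = replacement        # rebind the global name at runtime
second = caller(3 + 4j)   # same caller, different callee

print(first, second)

# The disassembly of `caller` contains a global-name load followed by a
# call instruction, not the body of abs2.
dis.dis(caller)
```

Because the binding can change between any two calls, an inlined copy of the body would have to be thrown away whenever that happens, which is the runtime invalidation described above.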


5 Comments

@phant0m Not sure why you guys like that quote so much... It basically says you cannot optimize without making code ugly. I just had to inline several calls to make my program twice as fast. At least it was worth it...
I also find it a bit hard to accept that last comment. It's nice and all that it's "not my responsibility", but at the end of the day, I can't tell my boss that it's somebody else's fault if my code misses performance targets.
If PyPy inlines automatically, does that allow it to do optimizations such as omitting to return values that will not be used by the caller, so that those variables can be destructed earlier to free up memory or perhaps not be computed at all if not needed? For some applications where the variables take up a lot of space such optimizations can be critical.
Does it really matter whose responsibility you think it is to optimize the code? What you're expressing is your opinion. The fact is that whether you choose to optimize or not comes with real consequences, both those that are beneficial and those that are not so beneficial.
I can't believe this is the most up-voted answer... 😭 There's plenty of scope for AST transformations in Python, including something very close to what the OP asked for. The next answer is a lot better, and I'm surprised that kind of inlining isn't used more often.
39

Not exactly what the OP has asked for, but close:

Inliner inlines Python function calls. Proof of concept for this blog post

from inliner import inline

@inline
def add_stuff(x, y):
    return x + y

def add_lots_of_numbers():
    results = []
    for i in xrange(10):
        results.append(add_stuff(i, i+1))

In the above code the add_lots_of_numbers function is converted into this:

def add_lots_of_numbers():
    results = []
    for i in xrange(10):
        results.append(i + i + 1)
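
For anyone curious how such a rewrite can work, here is a rough sketch using only the standard ast module. It is not the Inliner library's actual implementation. The class names InlineCalls and SubstituteNames are made up, range replaces xrange so it runs on Python 3, and it assumes the inlined function is a single return statement called with positional, side-effect-free arguments:

```python
import ast
import copy

SOURCE = """
def add_stuff(x, y):
    return x + y

def add_lots_of_numbers():
    results = []
    for i in range(10):
        results.append(add_stuff(i, i + 1))
    return results
"""

class SubstituteNames(ast.NodeTransformer):
    """Replace parameter names with the argument expressions of one call."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if node.id in self.mapping:
            return copy.deepcopy(self.mapping[node.id])
        return node

class InlineCalls(ast.NodeTransformer):
    """Replace calls to one single-expression function with its body."""

    def __init__(self, func_def):
        self.name = func_def.name
        self.params = [a.arg for a in func_def.args.args]
        self.expr = func_def.body[0].value  # assumes the body is `return <expr>`

    def visit_Call(self, node):
        self.generic_visit(node)  # handle calls nested in the arguments first
        if (isinstance(node.func, ast.Name) and node.func.id == self.name
                and len(node.args) == len(self.params) and not node.keywords):
            mapping = dict(zip(self.params, node.args))
            return SubstituteNames(mapping).visit(copy.deepcopy(self.expr))
        return node

tree = ast.parse(SOURCE)
target = next(n for n in tree.body
              if isinstance(n, ast.FunctionDef) and n.name == "add_stuff")
tree = InlineCalls(target).visit(tree)
ast.fix_missing_locations(tree)

if hasattr(ast, "unparse"):  # available from Python 3.9
    print(ast.unparse(tree))

namespace = {}
exec(compile(tree, "<inlined>", "exec"), namespace)
inlined_results = namespace["add_lots_of_numbers"]()
print(inlined_results)
```

Substituting on the syntax tree rather than on text keeps operator precedence intact, but the argument expressions get duplicated wherever the parameter appears, which is why the sketch assumes they have no side effects.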

Anyone interested in this question, and in the complications involved in implementing such an optimizer in CPython, might also want to have a look at:

2 Comments

Sorry what is the difference between your solution and the question?
@RogerS, the OP had asked about something similar to C macros (inline keyword) which are very flexible and efficient. This library has some limitations and has a startup time cost, but other than those, it does what the question asks.
10

I'll agree with everyone else that such optimizations will just cause you pain on CPython, and that if you care about performance you should consider PyPy (though our NumPy may be too incomplete to be useful). However, I'll disagree and say you can care about such optimizations on PyPy. Not this one specifically, since as has been said PyPy does that automatically; but if you know PyPy well, you really can tune your code to make PyPy emit the assembly you want, not that you almost ever need to.


9

No.

The closest you can get to C macros is a script (awk or other) that you may include in a makefile, and which substitutes a certain pattern like abs(x)**2 in your Python scripts with the long form.
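
As a rough sketch of what such a substitution step could look like (written in Python rather than awk, with a made-up helper name expand_abs2), assuming the argument of abs is always a plain identifier:

```python
import re

# Matches abs(<identifier>)**2, allowing spaces around the pieces.
# The lookarounds skip things like np.abs(x)**2 and abs(x)**2.5.
PATTERN = re.compile(
    r"(?<![\w.])abs\(\s*([A-Za-z_]\w*)\s*\)\s*\*\*\s*2(?![\w.])"
)

def expand_abs2(source):
    """Textually rewrite abs(name)**2 into the explicit real/imag form."""
    return PATTERN.sub(r"(\1.real*\1.real+\1.imag*\1.imag)", source)

original = "y = abs(x)**2 + abs(w) ** 2\n"
expanded = expand_abs2(original)
print(expanded)
```

Like any purely textual macro, this also rewrites matches inside strings and comments and cannot handle arguments more complex than a name, which is part of why a preprocessing step like this tends to be fragile.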

3 Comments

... which is a horrible idea, a lot of extra work and a decent chance of obscure breakage for nearly zero practical gain.
Python is not the fastest language there is anyway, which is OK because of its fast development cycles. Adding a "preprocessing" step to a new Python project is indeed strongly discouraged.
He did not claim that this was a good idea. Technically, he is correct.
7

Actually, it might be even faster to calculate it like this:

x.real** 2+ x.imag** 2

Thus, the extra cost of the function call is likely to diminish. Let's see:

In []: n= 10000
In []: x= randn(n, 1)+ 1j* rand(n, 1)
In []: %timeit x.real* x.real+ x.imag* x.imag
10000 loops, best of 3: 100 us per loop
In []: %timeit x.real** 2+ x.imag** 2
10000 loops, best of 3: 77.9 us per loop

And encapsulating the calculation in a function:

In []: def abs2(x):
   ..:     return x.real** 2+ x.imag** 2
   ..: 
In []: %timeit abs2(x)
10000 loops, best of 3: 80.1 us per loop

Anyway (as others have pointed out), this kind of micro-optimization (in order to avoid a function call) is not really a productive way to write Python code.
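
If you want to measure the call overhead on its own outside IPython, a rough sketch with the standard timeit module could look like the following. It uses a single Python complex number as a stand-in for the NumPy array (an assumption made so the script runs without NumPy), so the absolute numbers will not match the ones above:

```python
import timeit

def abs2(z):
    return z.real * z.real + z.imag * z.imag

z = 3 + 4j
number = 200_000  # evaluations per measurement

# Time the bare expression and the same expression behind a function call.
inline_time = timeit.timeit(
    "z.real * z.real + z.imag * z.imag", globals={"z": z}, number=number
)
call_time = timeit.timeit(
    "abs2(z)", globals={"z": z, "abs2": abs2}, number=number
)

print(f"inline expression: {inline_time / number * 1e9:.1f} ns per evaluation")
print(f"via abs2():        {call_time / number * 1e9:.1f} ns per evaluation")
```

The gap between the two figures is roughly the fixed cost of one call. It does not grow with the size of the data, so for an array operation that takes tens of microseconds it largely disappears in the noise.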

2 Comments

~3us might not be a lot if you do something 100 times, or 10000. Do something a million times and you'll want to shave that off.
@MrMesees there is C for that
2

You can try to use lambda:

abs2 = lambda x : x.real*x.real+x.imag*x.imag

then call it by:

y = abs2(x)
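
Note that a lambda is an ordinary function object, so calling it costs the same as calling a def. A quick sketch to check this on CPython (the names abs2_def and abs2_lambda are only for the comparison) is to compare the compiled bytecode of both forms:

```python
import dis

def abs2_def(x):
    return x.real * x.real + x.imag * x.imag

abs2_lambda = lambda x: x.real * x.real + x.imag * x.imag

# Both forms compile to a code object; compare the raw bytecode.
same_bytecode = abs2_def.__code__.co_code == abs2_lambda.__code__.co_code
print("identical bytecode:", same_bytecode)

# Show the instructions the lambda executes when called.
dis.dis(abs2_lambda)
```

So the lambda helps readability at best; it does not remove the function call.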

1 Comment

Good thought but I just tried it... That didn't improve performance at all: def foo(bar): return bar vs foo = lambda bar: bar both execute in 57.5 nanoseconds on my system. Measured with timeit. So lambdas are exactly like regular functions and their calls. At least on CPython 3.8.
0

Python is a dynamic programming language. Luckily, Python compiles to bytecode before execution, so you can inline code. For simple solutions that don't require fat external packages, you can use Python's built-in facilities:

from inspect import getsource

abs2 = lambda z : z.real * z.real + z.imag * z.imag

def loop (zz, zs):
  for z in zs:
    zz += abs2 (z)

print ( f"loop code:\n{getsource (loop)}" )

inlined = getsource (loop).replace ("abs2 (z)", getsource (abs2).split(":")[1] )

print ( f"inlined loop code:\n{inlined}" )

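# Execute the inlined source and take the bytecode of the function itself
# (the module-level code object only contains the def statement).
ns = {}
exec (compile (inlined, '<string>', 'exec'), ns)
compiled = ns["loop"].__code__.co_code

def loop2 (zz, zs):
  for z in zs:
    zz += z.real * z.real + z.imag * z.imag

compiled2 = loop2.__code__.co_code

print ( f"compiled loop  code: {compiled}" )
print ( f"compiled loop2 code: {compiled2}")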

Note: this only supports one-line lambdas whose parameters have the same names as the variables passed in. It is a simple and very hackish solution, but Python would not be much of an interpreted language if it did not support editing code at runtime.
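
To actually run the inlined version rather than only inspect it, a variation on the same idea is sketched below. It keeps the source in plain strings instead of calling getsource (an assumption made so it also works where the source file is not available), and the loop returns its total so the two versions can be compared:

```python
ABS2_EXPR = "z.real * z.real + z.imag * z.imag"

LOOP_SOURCE = """
def loop(zs):
    total = 0.0
    for z in zs:
        total += abs2(z)
    return total
"""

def abs2(z):
    return z.real * z.real + z.imag * z.imag

# Textual inlining; the parentheses keep operator precedence intact.
inlined_source = LOOP_SOURCE.replace("abs2(z)", f"({ABS2_EXPR})")

# Build both versions of the function from their source text.
plain_ns = {"abs2": abs2}
exec(compile(LOOP_SOURCE, "<loop>", "exec"), plain_ns)
inlined_ns = {}
exec(compile(inlined_source, "<inlined loop>", "exec"), inlined_ns)

loop = plain_ns["loop"]
loop_inlined = inlined_ns["loop"]

zs = [1 + 2j, 3 + 4j, -1j]
print(loop(zs), loop_inlined(zs))

# The inlined function no longer references the name abs2 at all.
print(loop.__code__.co_names, loop_inlined.__code__.co_names)
```

The inlined function does not even need abs2 in its namespace, which confirms that no call is left in the loop body.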
