
I'm writing code to fit a 2D function to (5, 5)-shaped numpy arrays. The fit maximizes a likelihood with scipy.optimize.minimize (by minimizing the negative log-likelihood), so a handful of functions get called a tremendous number of times during the iterative process.

These are the first cProfile results:

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1392579   60.849    0.000   60.849    0.000 maxima.py:288(derfs)
1392578   34.243    0.000   34.243    0.000 maxima.py:279(dexp)
696289   31.186    0.000  152.278    0.000 maxima.py:322(ll_jac)
1392578   26.670    0.000   26.670    0.000 maxima.py:283(derf)
3639673   23.772    0.000   23.772    0.000 {method 'reduce' of 'numpy.ufunc' objects}
696290   10.440    0.000   53.965    0.000 maxima.py:308(logll)
1392579    8.882    0.000   69.731    0.000 maxima.py:295(lambda_g)
13202    5.216    0.000  215.145    0.016 lbfgsb.py:198(_minimize_lbfgsb)
3494647    4.468    0.000   32.576    0.000 fromnumeric.py:1621(sum)

And these are the functions:

import numpy as np
from scipy.special import erf

def dexp(x0, sigma, x=np.arange(5)):
    # Gaussian evaluated at the two edges of each pixel; the default
    # x=np.arange(5) is built once at import time.
    xx1 = (x + 1 - x0)/sigma
    xx = (x - x0)/sigma
    return np.exp(-xx1*xx1) - np.exp(-xx*xx)

def derf(x0, sigma, x=np.arange(5)):
    # Integral of the Gaussian across each pixel (up to a constant factor).
    return erf((x + 1 - x0) / sigma) - erf((x - x0) / sigma)

def derfs(x0, y0, sigma, xy=np.arange(5)):
    # Outer product of the x and y pixel integrals: axis 0 runs along x,
    # axis 1 along y.
    i = erf((xy + 1 - x0) / sigma) - erf((xy - x0) / sigma)
    j = erf((xy + 1 - y0) / sigma) - erf((xy - y0) / sigma)
    return i[:, np.newaxis] * j

def lambda_g(x0, y0, fwhm, factor=0.09*np.pi, f2=0.6):
    # Pixel-integrated 2D Gaussian model, before amplitude and background.
    return factor * fwhm**2 * derfs(x0, y0, fwhm * f2)

The objective function and its gradient, which need some optimizing as well:

def logll(parameters, *args):
    """ Log-likelihood function
    """
    A, x0, y0, bkg = parameters
    fwhm, area = args

    lambda_p = A * lambda_g(x0, y0, fwhm) + bkg
    return -np.sum(area * np.log(lambda_p) - lambda_p)

def ll_jac(parameters, *args):
    """ This is the Jacobian of the log-likelihood function
    """
    A, x0, y0, bkg = parameters
    fwhm, area = args

    derfx = derf(x0, fwhm*0.6)
    derfy = derf(y0, fwhm*0.6)
    lambda1 = lambda_g(x0, y0, fwhm)
    factor = 1 - area/(A * lambda1 + bkg)

    jac = np.zeros(4)
    # dL/d(A)
    # The derivative of lambda_g is lambda_g(A=1)
    jac[0] = np.sum(factor*lambda1)
    # dL/d(x0) and dL/d(y0)
    jac12 = -0.3*A*fwhm*np.sqrt(np.pi)
    jac[1] = jac12*np.sum(dexp(x0, fwhm * 0.6)[:, np.newaxis] * derfy*factor)
    # Axis 0 of factor runs along x and axis 1 along y, so the y0
    # derivative broadcasts dexp(y0, ...) along axis 1:
    jac[2] = jac12*np.sum(derfx[:, np.newaxis] * dexp(y0, fwhm * 0.6)*factor)
    # dL/d(bkg)
    jac[3] = np.sum(factor)

    return jac
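One easy win, independent of the helper rewrites: logll and ll_jac each recompute lambda_g at the same point, and scipy.optimize.minimize accepts jac=True with a single callable returning (value, gradient), so the shared terms can be computed once per evaluation. The sketch below inlines the helpers to stay self-contained; the sample area array, starting point, and bounds are made up for illustration (the bounds just keep lambda_p positive so the log is defined).

```python
import numpy as np
from scipy.special import erf
from scipy.optimize import minimize

X = np.arange(5)

def dexp(x0, s):
    a = (X - x0) / s
    b = a + 1 / s
    return np.exp(-b * b) - np.exp(-a * a)

def derf(x0, s):
    a = (X - x0) / s
    return erf(a + 1 / s) - erf(a)

def logll_and_jac(p, fwhm, area):
    """Negative log-likelihood and its gradient, sharing lambda_g."""
    A, x0, y0, bkg = p
    s = fwhm * 0.6
    derfx, derfy = derf(x0, s), derf(y0, s)
    lambda1 = 0.09 * np.pi * fwhm**2 * derfx[:, np.newaxis] * derfy
    lambda_p = A * lambda1 + bkg
    f = -np.sum(area * np.log(lambda_p) - lambda_p)

    factor = 1 - area / lambda_p
    c = -0.3 * A * fwhm * np.sqrt(np.pi)
    jac = np.array([
        np.sum(factor * lambda1),                                 # dL/dA
        c * np.sum(dexp(x0, s)[:, np.newaxis] * derfy * factor),  # dL/dx0
        c * np.sum(derfx[:, np.newaxis] * dexp(y0, s) * factor),  # dL/dy0
        np.sum(factor),                                           # dL/dbkg
    ])
    return f, jac

# Made-up data, purely to show the jac=True call pattern.
area = np.arange(25.0).reshape(5, 5) + 5.0
res = minimize(logll_and_jac, x0=[10.0, 2.0, 2.0, 1.0], args=(3.0, area),
               jac=True, method='L-BFGS-B',
               bounds=[(1e-3, None), (0, 4), (0, 4), (1e-3, None)])
```

With jac=True the solver calls logll_and_jac once per point and unpacks both outputs, roughly halving the erf/exp work per iteration compared with separate fun and jac callbacks.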

On my PC,

%timeit dexp(2.3, 2.)
10000 loops, best of 3: 16 µs per loop

%timeit derf(2.3, 2.)
100000 loops, best of 3: 14 µs per loop

%timeit derfs(2.3, 2.2, 2.)
10000 loops, best of 3: 32.2 µs per loop

Is it possible to speed these up? What do you recommend? Cython? Numba? Is there anything else I can try before diving into those modules?

6 Comments
  • For starters, I would change your **2 to x*x (see stackoverflow.com/questions/1019740/…) Commented Aug 24, 2015 at 22:17
  • Can you post your objective function as well? Also, I wouldn't worry about replacing the squares with x * x, as it is handled by numpy and I'd guess it's pretty optimized. Commented Aug 24, 2015 at 22:25
  • Thank you both, I'm updating the question taking your suggestions into account Commented Aug 24, 2015 at 22:29
  • I don't think cython or numba will help because you aren't doing iterations in these functions. They are calling np.exp and erf which are already compiled code. Commented Aug 24, 2015 at 22:52
  • I agree, but the sums of the np.exp and erf results are not compiled... right? Would it make any difference? Commented Aug 24, 2015 at 22:54

1 Answer


You can try to perform fewer operations. For instance, you could do the following:

def dexp2(x0, sigma, x=np.arange(5)):
    a = (x - x0) / sigma
    return np.exp(-(a + 1/sigma)**2) - np.exp(-(a*a))

~35%-40% better.

def derf2(x0, sigma, x=np.arange(5)):
    a = (x - x0) / sigma
    return erf(a + 1 / sigma) - erf(a)

~50% better.

def derfs2(x0, y0, sigma, xy=np.arange(5)):
    a = (xy - x0) / sigma
    b = (xy - y0) / sigma
    i = erf(a + 1 / sigma) - erf(a)
    j = erf(b + 1 / sigma) - erf(b)
    return i[:, np.newaxis] * j

~50% better.
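These rewrites are algebraically identical to the originals (only the common subexpression (x - x0)/sigma is factored out, using (x + 1 - x0)/sigma = (x - x0)/sigma + 1/sigma), which is cheap to confirm numerically before swapping them in. A quick sanity-check sketch:

```python
import numpy as np
from scipy.special import erf

def dexp(x0, sigma, x=np.arange(5)):
    xx1 = (x + 1 - x0) / sigma
    xx = (x - x0) / sigma
    return np.exp(-xx1 * xx1) - np.exp(-xx * xx)

def dexp2(x0, sigma, x=np.arange(5)):
    a = (x - x0) / sigma
    return np.exp(-(a + 1 / sigma)**2) - np.exp(-(a * a))

def derf(x0, sigma, x=np.arange(5)):
    return erf((x + 1 - x0) / sigma) - erf((x - x0) / sigma)

def derf2(x0, sigma, x=np.arange(5)):
    a = (x - x0) / sigma
    return erf(a + 1 / sigma) - erf(a)

# Compare old and new versions over a grid of plausible parameters.
for x0 in np.linspace(-1.0, 5.0, 13):
    for sigma in (0.5, 1.2, 2.0):
        assert np.allclose(dexp(x0, sigma), dexp2(x0, sigma))
        assert np.allclose(derf(x0, sigma), derf2(x0, sigma))
```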

These are just micro-optimizations. In general, with inputs this small I don't think numba and/or cython can help much. It may pay off more to check which of the available optimization methods suits your particular case best, and to initialise it in a clever way.
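On the "try a different method" point: since the gradient is already available, it is cheap to benchmark the gradient-based solvers scipy.optimize.minimize offers and pick the one that needs the fewest function evaluations. A sketch on the classic Rosenbrock test function (standing in for the actual log-likelihood; Newton-CG builds Hessian-vector products from the gradient, so no Hessian is required):

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

start = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
results = {}
for method in ('L-BFGS-B', 'BFGS', 'CG', 'Newton-CG'):
    res = minimize(rosen, start, jac=rosen_der, method=method)
    results[method] = (res.nfev, res.fun)
    print(f'{method:10s}  nfev={res.nfev:4d}  fun={res.fun:.2e}')
```

The method with the lowest nfev on your own objective is usually the one to keep, since each evaluation here costs several erf/exp calls.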


2 Comments

With your suggestions and changing the optimization method to Newton-CG I reduced computation time 10-fold!
Great. Something similar happened to me some time ago with an optimization.
