Optimize operations with arrays in numpy

Question

I have to apply some mathematical formula that I've written in python as:

    for s in range(tdim):
        sum1 = 0.0
        for i in range(dim):
            for j in range(dim):
                sum1+=0.5*np.cos(theta[s]*(i-j))*
                eig1[i]*eig1[j]+eig2[i]+eig2[j])-0.5*np.sin(theta[s]*(i-j))*eig1[j]*eig2[i]-eig1[i]*eig2[j])

        PHi2.append(sum1)

Now, this is correct, but clearly inefficient, the other way around is to do:

for i in range(dim):
            for j in range(dim):
                PHi2 = 0.5*np.cos(theta*(i-j))*(eig1[i]*eig1[j]+eig2[i]+eig2[j])-0.5*np.sin(theta*(i-j))*(eig1[j]*eig2[i]-eig1[i]*eig2[j])

However, the second example gives me the same number in all elements of PHi2, so this is faster but answer is wrong. How can you do this correctly and more efficiently?

NOTE: eig1 and eig2 are of the same dimension d, theta and PHi2 are the same dimension D, BUT d!=D.

Jaime · Accepted Answer · 2014-01-17 00:30:59Z

You can use a brute force broadcasting approach, but you are creating an intermediate array of shape (D, d, d), which can get out of hand if your arrays are even moderately large. Furthermore, in using broadcasting with no refinements you are recomputing a lot of calculations from the innermost loop that you only need to do once. If you first compute the necessary parameters for all possible values of i - j and add them together, you can reuse those values on the outer loop, e.g.:

def fast_ops(eig1, eig2, theta):
    d = len(eig1)
    d_arr = np.arange(d)
    i_j = d_arr[:, None] - d_arr[None, :]
    reidx = i_j + d - 1
    mult1 = eig1[:, None] * eig1[ None, :] + eig2[:, None] + eig2[None, :]
    mult2 = eig1[None, :] * eig2[:, None] - eig1[:, None] * eig2[None, :]
    mult1_reidx = np.bincount(reidx.ravel(), weights=mult1.ravel())
    mult2_reidx = np.bincount(reidx.ravel(), weights=mult2.ravel())

    angles = theta[:, None] * np.arange(1 - d, d)

    return 0.5 * (np.einsum('ij,j->i', np.cos(angles), mult1_reidx) -
                  np.einsum('ij,j->i', np.sin(angles), mult2_reidx))

IF we rewrite M4rtini's code as a function for comparison:

def fast_ops1(eig1, eig2, theta):
    d = len(eig1)
    D = len(theta)
    s = np.array(range(D))[:, None, None]
    i = np.array(range(d))[:, None]
    j = np.array(range(d))
    ret = 0.5 * (np.cos(theta[s]*(i-j))*(eig1[i]*eig1[j]+eig2[i]+eig2[j]) -
                 np.sin(theta[s]*(i-j))*(eig1[j]*eig2[i]-eig1[i]*eig2[j]))
    return ret.sum(axis=(-1, -2))

And we make up some data:

d, D = 100, 200
eig1 = np.random.rand(d)
eig2 = np.random.rand(d)
theta = np.random.rand(D)

The speed improvement is very noticeable, 80x on top of the 115x over your original code, leading to a whooping 9000x speed-up:

In [22]: np.allclose(fast_ops1(eig1, eig2, theta), fast_ops(eig1, eig2, theta))
Out[22]: True

In [23]: %timeit fast_ops1(eig1, eig2, theta)
10 loops, best of 3: 145 ms per loop

In [24]: %timeit fast_ops(eig1, eig2, theta)
1000 loops, best of 3: 1.85 ms per loop

M4rtini · Accepted Answer · 2014-01-16 23:58:22Z

1

This works by broadcasting.
For tdim = 200 and dim = 100.
14 seconds with original.
120 ms with the version.

s = np.array(range(tdim))[:, None, None]
i = np.array(range(dim))[:, None]
j = np.array(range(dim))
PHi2 =(0.5*np.cos(theta[s]*(i-j))*(eig1[i]*eig1[j]+eig2[i]+eig2[j])-0.5*np.sin(theta[s]*(i-j))*(eig1[j]*eig2[i]-eig1[i]*eig2[j])).sum(axis=2).sum(axis=1)

answered Jan 16, 2014 at 23:58

M4rtini

13.6k4 gold badges37 silver badges42 bronze badges

Comments

Edward · Accepted Answer · 2014-01-16 23:11:30Z

0

In the first bit of code, you have 0.5*np.cos(theta[s]*(i-j))... but in the second it's 0.5*np.cos(theta*(i-j)).... Unless you've got theta defined differently for the second bit of code, this could well be the cause of the trouble.

answered Jan 16, 2014 at 23:11

Edward

7,1402 gold badges31 silver badges60 bronze badges

1 Comment

user2820579 Over a year ago

this is true, because in the second case I'm considering the whole list named theta being operated by the multiplication and the cosine. Since I want that each element of theta be considered while I'm running the sums over i and j.

Collectives™ on Stack Overflow

Optimize operations with arrays in numpy

3 Answers 3

Comments

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related