
This question is similar to this one.

I have a 2d boolean array "belong" and a 2d float array "angles". What I want is to sum, along the rows, the angles for which the corresponding index in belong is True, and to do that with numpy (i.e. avoiding Python loops). I don't need to store the resulting rows, which would have different lengths and, as explained in the linked question, would require a list.

So what I attempted is np.sum(angles[belong], axis=1), but angles[belong] returns a 1d result and I can't reduce it the way I want. I have also tried np.sum(angles*belong, axis=1), and that works. But I wonder if I could improve the timing by accessing only the indices where belong is True. belong is True about 30% of the time, and angles here is a simplification of a longer formula that involves angles.
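To make the shape issue concrete, here is a minimal sketch with made-up toy data (the arrays are assumptions, just for illustration):

import numpy as np

# Toy data: 3 rows, 4 columns
angles = np.arange(12, dtype=float).reshape(3, 4)
belong = angles % 3 == 0

print(angles[belong].shape)              # (4,) -- boolean indexing flattens to 1d
print(np.sum(angles * belong, axis=1))   # [3. 6. 9.] -- row sums over True positions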

UPDATE

I like the solution with einsum; however, in my actual computation the speed-up is tiny. I used angles in the question to simplify; in practice it is a formula that uses angles. I suspect that this formula is calculated for all the angles (regardless of belong) and then passed to einsum, which then performs the reduction.

This is what I've done:

THRES_THETA and max_line_length are floats. belong, angle and lines_lengths_vstacked have shape (1653, 58), and np.count_nonzero(belong)/belong.size -> 0.376473287856979.

import timeit

l2 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
      np.sum(belong*(0.3 * (1-(angle/THRES_THETA)) + 0.7 * (lines_lengths_vstacked/max_line_length)), axis=1))  # base method
t2 = timeit.Timer(l2)
print(t2.repeat(3, 100))

l1 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
      np.einsum('ij,ij->i', belong, 0.3 * (1-(angle/THRES_THETA)) + 0.7 * (lines_lengths_vstacked/max_line_length)))  # einsum method
t1 = timeit.Timer(l1)
print(t1.repeat(3, 100))

l3 = (lambda angle=angle, belong=belong:
      np.sum(angle*belong, axis=1))  # base method, simplified formula
t3 = timeit.Timer(l3)
print(t3.repeat(3, 100))

l4 = (lambda angle=angle, belong=belong:
      np.einsum('ij,ij->i', belong, angle))  # einsum method, simplified formula
t4 = timeit.Timer(l4)
print(t4.repeat(3, 100))

and the results were:

[0.2505458095931187, 0.22666162878242901, 0.23591678551324263]
[0.23295411847036418, 0.21908727226505043, 0.22407296178704272]
[0.03711204915708555, 0.03149960399994978, 0.033403337575027114]
[0.025264803208228992, 0.022590580646423053, 0.024585736455331464]

If we look at the last two result lists (the simplified formula), the einsum version is about 30% faster than the base method. But if we look at the first two (the full formula), the speed-up for the einsum method shrinks to only a few percent.

I'm not sure if this timing can be improved.

3 Answers


You can use np.einsum -

np.einsum('ij,ij->i',belong,angles)

You can also use np.bincount, like so -

idx,_ = np.where(belong)
out = np.bincount(idx,angles[belong])

Sample run -

In [32]: belong
Out[32]: 
array([[ True,  True,  True, False,  True],
       [False, False, False,  True,  True],
       [False, False,  True,  True,  True],
       [False, False,  True, False,  True]], dtype=bool)

In [33]: angles
Out[33]: 
array([[ 0.65429151,  0.36235607,  0.98316406,  0.08236384,  0.5576149 ],
       [ 0.37890797,  0.60705112,  0.79411002,  0.6450942 ,  0.57750073],
       [ 0.6731019 ,  0.18608778,  0.83387574,  0.80120389,  0.54971573],
       [ 0.18971255,  0.86765132,  0.82994543,  0.62344429,  0.05207639]])

In [34]: np.sum(angles*belong ,axis=1) # This worked for you, so using as baseline
Out[34]: array([ 2.55742654,  1.22259493,  2.18479536,  0.88202183])

In [35]: np.einsum('ij,ij->i',belong,angles)
Out[35]: array([ 2.55742654,  1.22259493,  2.18479536,  0.88202183])

In [36]: idx,_ = np.where(belong)
    ...: out = np.bincount(idx,angles[belong])
    ...: 

In [37]: out
Out[37]: array([ 2.55742654,  1.22259493,  2.18479536,  0.88202183])

Runtime test -

In [52]: def sum_based(belong,angles):
    ...:     return np.sum(angles*belong ,axis=1)
    ...: 
    ...: def einsum_based(belong,angles):
    ...:     return np.einsum('ij,ij->i',belong,angles)
    ...: 
    ...: def bincount_based(belong,angles):
    ...:     idx,_ = np.where(belong)
    ...:     return np.bincount(idx,angles[belong])
    ...: 

In [53]: # Inputs
    ...: belong = np.random.rand(4000,5000)>0.7
    ...: angles = np.random.rand(4000,5000)
    ...: 

In [54]: %timeit sum_based(belong,angles)
    ...: %timeit einsum_based(belong,angles)
    ...: %timeit bincount_based(belong,angles)
    ...: 
1 loops, best of 3: 308 ms per loop
10 loops, best of 3: 134 ms per loop
1 loops, best of 3: 554 ms per loop

I would go with the np.einsum one!


1 Comment

Thanks for your answer. The einsum approach looks very interesting. However, in the actual calculation I'm doing, the speed gain of using einsum over the base method is tiny. I think it may be because, instead of angles, I have a larger expression that has to be computed for the whole matrix anyway before going into einsum. I'll edit my question to show my test results with einsum and the actual formula involving angles.

You could use masked arrays for this, but in the tests I ran it is not faster than (angles * belong).sum(1).

A masked array approach would look like this:

sum_ang = np.ma.masked_where(~belong, angles, copy=False).sum(1).data

Here, we create a masked array of angles where the values at ~belong ("not belong") are masked (excluded). We take the negation because we want to exclude the positions where belong is False. Then we sum along rows with .sum(1). The sum returns another masked array, so we grab the values with its .data attribute.

I added the copy=False kwarg so that this code doesn't get slowed down by array creation, but it's still slower than your (angles * belong).sum(1) approach so you should probably just stick with that.
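For completeness, a quick consistency check of the masked-array result against the (angles * belong).sum(1) baseline, on small random data (the arrays here are assumptions, just for illustration):

import numpy as np

belong = np.random.rand(4, 5) > 0.7
angles = np.random.rand(4, 5)

masked = np.ma.masked_where(~belong, angles, copy=False).sum(1).data
baseline = (angles * belong).sum(1)
print(np.allclose(masked, baseline))  # True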



I have found a way that is about 3 times faster than the einsum solution, and I don't think it can get any faster, so I'm answering my own question with this other method.

What I was hoping to do is calculate the formula involving angles only for the positions where belong is True. That should give roughly a 3x speed-up, since belong is True only about 30% of the time.

My first attempt using angles[belong] would calculate the formula just for the positions where belong is True, but had the problem that the resulting array was 1d and I couldn't do the row reductions with np.sum. The solution is to use np.add.reduceat.

reduceat can reduce a ufunc (in this case add) over a list of specific slices. So I just need to create that list of slice boundaries so that I can reduce the 1d array resulting from angles[belong].
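As a quick illustration of how np.add.reduceat consumes slice starts (toy numbers, not from the question):

import numpy as np

flat = np.array([1., 2., 3., 4., 5., 6.])
# Sum flat[0:2], flat[2:5], flat[5:] -- the slice starts are 0, 2, 5
print(np.add.reduceat(flat, [0, 2, 5]))  # [ 3. 12.  6.]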

I'll show my code and timings, which should speak for themselves.

First I define a function with the reduceat solution:

def vote_op(angle, belong, THRES_THETA, lines_lengths_vstacked, max_line_length):
    # Evaluate the formula only at the positions where belong is True (1d result)
    intermediate = (0.3 * (1-(angle[belong]/THRES_THETA)) + 0.7 * (lines_lengths_vstacked[belong]/max_line_length))
    # Start index of each row's slice inside the flattened 1d array
    b_ind = np.hstack([0, np.cumsum(np.sum(belong, axis=1))])
    votes = np.add.reduceat(intermediate, b_ind[:-1])
    return votes

Then I compare it with the base method and the einsum method:

l1 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
      np.sum(belong*(0.3 * (1-(angle/THRES_THETA)) + 0.7 * (lines_lengths_vstacked/max_line_length)), axis=1))  # base method
t1 = timeit.Timer(l1)
print(t1.repeat(3, 100))

l2 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
      np.einsum('ij,ij->i', belong, 0.3 * (1-(angle/THRES_THETA)) + 0.7 * (lines_lengths_vstacked/max_line_length)))  # einsum method
t2 = timeit.Timer(l2)
print(t2.repeat(3, 100))

l3 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
      vote_op(angle, belong, THRES_THETA, lines_lengths_vstacked, max_line_length))  # reduceat method
t3 = timeit.Timer(l3)
print(t3.repeat(3, 100))

and the timings:

[2.866840408487671, 2.6822349628234874, 2.665520338478774]
[2.3444239421490725, 2.352450520946098, 2.4150879511222794]
[0.6846337313820605, 0.660780839464234, 0.6091473217964847]

So the reduceat solution is about 3 times faster and gives the same results as the other two. Note that these results are for a slightly larger example than before, where belong, angle and lines_lengths_vstacked have shape (3400, 170) and np.count_nonzero(belong)/belong.size -> 0.16765051903114186.

Update: Due to a corner case in np.add.reduceat (as of numpy version '1.11.0rc1') where it can't handle repeated indices correctly (see the linked issue), I had to add a hack to the vote_op() function for the case where there are whole rows in belong that are False. Those rows produce repeated indices in b_ind and wrong results in votes. My solution for the moment is to patch the wrong values, which works but is an extra step. See the new vote_op():

def vote_op(angle, belong, THRES_THETA, lines_lengths_vstacked, max_line_length):
    # Evaluate the formula only at the positions where belong is True (1d result)
    intermediate = (0.3 * (1-(angle[belong]/THRES_THETA)) + 0.7 * (lines_lengths_vstacked[belong]/max_line_length))
    b_rows = np.sum(belong, axis=1)                 # number of True entries per row
    b_ind = np.hstack([0, np.cumsum(b_rows)])[:-1]  # slice start of each row
    intermediate = np.hstack([intermediate, 0])     # pad so trailing all-False rows can't index past the end
    votes = np.add.reduceat(intermediate, b_ind)
    votes[b_rows == 0] = 0                          # patch rows with no True entries
    return votes
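To see why the patch is needed, here is the corner case on toy data (made up for illustration): when a slice start repeats, which happens for an all-False row, reduceat returns the single element at that index instead of an empty (zero) sum:

import numpy as np

flat = np.array([1., 2., 3.])
# Intended row sums: flat[0:2], an empty middle row, then flat[2:]
# Expected [3. 0. 3.], but the repeated index 2 yields flat[2] instead of 0
print(np.add.reduceat(flat, [0, 2, 2]))  # [3. 3. 3.]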

