More efficient nested sum in numpy

Question

I am trying to calculate a vectorised nested sum

$\sum_{i=1}^{N-1}V_{ki}\sum_{j=i+1}^{N}W_{kj}$

(so effectively doing a separate calculation for each row k)

The fastest way I have come up with is to define a lower triangular matrix of ones to account for the range of the inner sum:

O = np.tril(np.ones((N,N),np.uint8),-1)

and then evaluate the inner sum with range j=1..N, allowing the use of a dot product

np.einsum('ij,ij->i',V,np.dot(W,O))

This works well, but even with the uint8 data type, O has N² entries and so requires a lot of memory for N >> 10000. Is there a better way using standard numpy/scipy functions? My plan is to run this on a GPU using cupy.
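For reference, here is a minimal self-contained sketch of the triangular-matrix approach above, checked against a direct double loop over the formula on a small random input. The function names nested_sum_loop and nested_sum_tril are hypothetical, and the array sizes are arbitrary.

```python
import numpy as np

def nested_sum_loop(V, W):
    # Direct translation of the formula: for each row k,
    # sum over i of V[k, i] * (sum of W[k, j] for j > i).
    M, N = V.shape
    out = np.zeros(M)
    for k in range(M):
        for i in range(N - 1):
            out[k] += V[k, i] * W[k, i + 1:].sum()
    return out

def nested_sum_tril(V, W):
    N = W.shape[1]
    # O[j, i] = 1 when j > i, so (W @ O)[k, i] = sum over j > i of W[k, j].
    O = np.tril(np.ones((N, N), np.uint8), -1)
    return np.einsum('ij,ij->i', V, np.dot(W, O))

rng = np.random.default_rng(0)
V = rng.random((4, 6))
W = rng.random((4, 6))
print(np.allclose(nested_sum_loop(V, W), nested_sum_tril(V, W)))  # True
```

The N×N matrix O is the only part whose size grows quadratically; everything else is the same shape as V and W.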

That's a lot of unnecessarily allocated 1s. Pretty sure this would be easy to do with numba.cuda + cupy. – dsm, Commented Aug 29, 2022 at 6:22
I wonder if the inner sum could be pre-calculated as a cumsum. – hpaulj, Commented Aug 29, 2022 at 7:21

user10289025 · Accepted Answer · 2022-08-29 14:34:58Z


As suggested by @hpaulj, you can use cumsum to get rid of O completely. You can write the inner sum over W as the total row sum minus the cumulative sum:
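The identity behind this can be written out in the notation of the question. Let $C_{ki}=\sum_{j=1}^{i}W_{kj}$ be the row-wise cumulative sum (np.cumsum(W,axis=1)) and $S_k=C_{kN}$ the row total:

$$\sum_{j=i+1}^{N}W_{kj} = S_k - C_{ki}$$

$$\sum_{i=1}^{N-1}V_{ki}\sum_{j=i+1}^{N}W_{kj} = \sum_{i=1}^{N-1}V_{ki}\,(S_k - C_{ki}) = \sum_{i=1}^{N}V_{ki}\,(S_k - C_{ki})$$

The last step extends the outer sum to $i=N$, which is allowed because $S_k - C_{kN} = 0$. The result is then a plain row-wise dot product of $V$ with $S_k - C_{ki}$, which is exactly what the einsum call computes.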

O = np.sum(W,axis=1)[:,None]-np.cumsum(W,axis=1)
np.einsum('ij,ij->i',V,O)
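A minimal self-contained version of this answer, compared against a brute-force loop, might look like the sketch below. The names nested_sum_cumsum and nested_sum_loop and the input sizes are hypothetical. Every temporary here has the same M×N shape as W, so memory grows linearly in N instead of quadratically.

```python
import numpy as np

def nested_sum_cumsum(V, W):
    # Suffix sums: row total minus the inclusive cumulative sum gives
    # sum over j > i of W[k, j] for every column i (0 in the last column).
    O = np.sum(W, axis=1)[:, None] - np.cumsum(W, axis=1)
    # Row-wise dot product of V with the suffix sums.
    return np.einsum('ij,ij->i', V, O)

def nested_sum_loop(V, W):
    # Brute-force reference used only for checking.
    M, N = V.shape
    return np.array([sum(V[k, i] * W[k, i + 1:].sum() for i in range(N - 1))
                     for k in range(M)])

rng = np.random.default_rng(1)
V = rng.random((3, 500))
W = rng.random((3, 500))
print(np.allclose(nested_sum_cumsum(V, W), nested_sum_loop(V, W)))  # True
```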

2 Comments

Will Over a year ago

Very slightly more efficient: X = np.cumsum(W,axis=1); O = X[:,-1:]-X

user10289025 Over a year ago

Very nice. Yeah, the sum is the last element of cumsum, so you can avoid computing the sum separately.
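Since the question mentions running this on a GPU with cupy, one option is to write the cumsum variant from the comments against an array-module parameter. This is a sketch under assumptions: the function name nested_sum and the xp parameter are hypothetical, and passing xp=cp with cupy arrays (for example nested_sum(cp.asarray(V), cp.asarray(W), xp=cp)) assumes cupy's cumsum, einsum and slicing behave like NumPy's for these calls. Only the NumPy path is exercised here.

```python
import numpy as np

def nested_sum(V, W, xp=np):
    # xp is the array module: numpy by default; cupy is assumed to work
    # as a drop-in replacement for the calls used below (untested sketch).
    X = xp.cumsum(W, axis=1)
    # X[:, -1:] is an (M, 1) column of row totals, which broadcasts
    # against X to give the suffix sums (sum over j > i of W[k, j]).
    O = X[:, -1:] - X
    return xp.einsum('ij,ij->i', V, O)

V = np.array([[1.0, 2.0, 3.0]])
W = np.array([[4.0, 5.0, 6.0]])
print(nested_sum(V, W))  # [23.]
```

In the worked example, the suffix sums of W are [11, 6, 0], so the result is 1·11 + 2·6 + 3·0 = 23, matching the formula term by term: i=1 gives V₁(W₂+W₃) = 11 and i=2 gives V₂W₃ = 12.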
