
I am trying to calculate a vectorised nested sum

$$S_k = \sum_{j=1}^{N} V_{kj} \sum_{i=j+1}^{N} W_{ki}$$

(so effectively doing a separate calculation for each row k)

The fastest way I have come up with is to define a lower triangular matrix of ones to take into account the range on the inner sum

O = np.tril(np.ones((N, N), dtype=np.uint8), -1)

and then evaluate the inner sum over the full range i = 1..N, with the mask enforcing the limits, allowing the use of a dot product

np.einsum('ij,ij->i',V,np.dot(W,O))

This works well, but even with uint8 data type, O requires a lot of memory for N>>10000. Is there a better way using standard numpy/scipy functions? My plan is to run this on a GPU using cupy.
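For reference, here is a minimal runnable sketch of the approach above on a small example, checked against explicit nested loops (the small `N` and the random `V`, `W` are illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
V = rng.random((3, N))
W = rng.random((3, N))

# Lower-triangular mask: O[i, j] = 1 when i > j, so
# np.dot(W, O)[k, j] = sum of W[k, i] over i > j.
O = np.tril(np.ones((N, N), dtype=np.uint8), -1)
result = np.einsum('ij,ij->i', V, np.dot(W, O))

# Reference: explicit nested loops over j and the tail of each row.
expected = np.array([
    sum(V[k, j] * W[k, j + 1:].sum() for j in range(N))
    for k in range(V.shape[0])
])
assert np.allclose(result, expected)
```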

  • That's a lot of unnecessarily allocated 1s. Pretty sure this would be easy to do with numba.cuda + cupy. Commented Aug 29, 2022 at 6:22
  • I wonder if the inner sum could be pre-calculated as a cumsum Commented Aug 29, 2022 at 7:21

1 Answer


As suggested by @hpaulj, you can use cumsum to get rid of O entirely: the inner sum over W can be written as the total row sum minus the cumulative sum.

O = np.sum(W, axis=1)[:, None] - np.cumsum(W, axis=1)  # tail sums: row total minus running sum
np.einsum('ij,ij->i', V, O)
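A quick sketch checking that this cumsum form matches the original triangular-matrix version on random data (the shapes and values here are illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
V = rng.random((4, N))
W = rng.random((4, N))

# Original approach, materialising an N x N mask.
O_tri = np.tril(np.ones((N, N)), -1)
baseline = np.einsum('ij,ij->i', V, np.dot(W, O_tri))

# cumsum approach: row total minus running sum gives the tail sum
# sum_{i > j} W[k, i] without any N x N intermediate.
O = np.sum(W, axis=1)[:, None] - np.cumsum(W, axis=1)
result = np.einsum('ij,ij->i', V, O)
assert np.allclose(result, baseline)
```

This drops the memory cost from O(N²) for the mask to O(N) per row, which is what makes it viable on a GPU via cupy.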

2 Comments

Very slightly more efficient: X = np.cumsum(W,axis=1); O = X[:,-1:]-X
Very nice. Yeah, the sum is the last element of cumsum, so you can avoid computing the sum
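A sketch of the commenters' refinement: since the row total is just the last column of the cumsum, one pass over W suffices (the random data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
V = rng.random((4, 6))
W = rng.random((4, 6))

X = np.cumsum(W, axis=1)   # running sums along each row
O = X[:, -1:] - X          # tail sums, no separate np.sum needed
result = np.einsum('ij,ij->i', V, O)

# Equivalent to the answer's two-pass form.
O2 = np.sum(W, axis=1)[:, None] - np.cumsum(W, axis=1)
assert np.allclose(O, O2)
```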
