I want to create an array whose elements are a function of their position. Something like

N = 1000000 
newarray = np.zeros([N,N,N])
for i in range(N):
    for j in range(N):
        for k in range(N):
            newarray[i,j,k] = f(i,j,k)

Is there a way to speed this up by removing the for loops, i.e. vectorizing or parallelizing it with NumPy?

This is the function f:

def f(i, j, k):
    indices = (R[:, 0] == i) * (R[:, 1] == j) * (R[:, 2] == k)
    return M[indices]

where, for example,

R = np.random.randint(0, N, [N, 3])
M = np.random.randn(N) * 15

and in the actual application they are not random.

  • Only if you show f. Commented May 21, 2019 at 12:25
  • Alternatively, if the function allows broadcasting, you can unpack the open grids into its arguments: f(*np.ogrid[:N,:N,:N]). Commented May 21, 2019 at 12:25
  • Please add a minimal reproducible example. Commented May 21, 2019 at 12:29
  • f is something like indices = (R[:,0]==i) *( R[:,1]==j) * (R[:,2]==k) return np.mean( M[indices] ) Commented May 21, 2019 at 12:30
  • @JohnBrown Use the edit to add that instead of a comment Commented May 21, 2019 at 12:31
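
The broadcasting suggestion in the comments only works when the function is built from elementwise operations. Here is a minimal sketch of that idea, assuming a hypothetical function g as a stand-in; it is not the asker's f, which looks values up in R and M and would not broadcast this way.

```python
import numpy as np

N = 4  # small size so the full N x N x N array is cheap to build

def g(i, j, k):
    # Hypothetical stand-in for f: only elementwise arithmetic, so it broadcasts
    return i * 100 + j * 10 + k

# np.ogrid returns three open grids of shapes (N,1,1), (1,N,1) and (1,1,N)
i, j, k = np.ogrid[:N, :N, :N]
broadcast_result = g(i, j, k)  # broadcasts to shape (N, N, N)

# Reference result from the explicit triple loop
loop_result = np.zeros((N, N, N), dtype=broadcast_result.dtype)
for a in range(N):
    for b in range(N):
        for c in range(N):
            loop_result[a, b, c] = g(a, b, c)
```

Both arrays hold the same values, but the broadcast version makes a single call to g instead of N**3 calls.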

1 Answer

You can do that operation with the at method of np.add:

import numpy as np

np.random.seed(0)
N = 100
R = np.random.randint(0, N, [N, 3])
M = np.random.randn(N) * 15
newarray = np.zeros([N, N, N])
np.add.at(newarray, (R[:, 0], R[:, 1], R[:, 2]), M)

In this case, if R has any repeated rows, the corresponding value in newarray will be the sum of all the matching values in M. Positions that never appear in R stay 0.
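
To see why np.add.at is used rather than plain indexed addition, here is a small sketch on made-up data (the tiny R and M below are hypothetical, chosen so that two rows point at the same position). With ordinary fancy indexing, repeated indices are not accumulated, and which of the repeated values ends up stored is not something to rely on.

```python
import numpy as np

# Hypothetical data: rows 0 and 2 of R both point at position (0, 1, 1)
R = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 1]])
M = np.array([10.0, 20.0, 5.0])
idx = (R[:, 0], R[:, 1], R[:, 2])

# Buffered fancy-index addition: contributions to the repeated position
# are not accumulated, only one of them survives
buffered = np.zeros((2, 2, 2))
buffered[idx] += M

# Unbuffered np.add.at: every occurrence of an index is added in
accumulated = np.zeros((2, 2, 2))
np.add.at(accumulated, idx, M)
```

Here accumulated[0, 1, 1] holds 10 + 5, while buffered[0, 1, 1] holds only one of the two contributions.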

EDIT: To take the average instead of sum for repeated elements you could do something like this:

import numpy as np

np.random.seed(0)
N = 100
R = np.random.randint(0, N, [N, 3])
M = np.random.randn(N) * 15
newarray = np.zeros([N, N, N])
np.add.at(newarray, (R[:, 0], R[:, 1], R[:, 2]), M)
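# Count how many rows of R map to each position
newarray_count = np.zeros([N, N, N])
np.add.at(newarray_count, (R[:, 0], R[:, 1], R[:, 2]), 1)
# Divide only where more than one value was accumulated
m = newarray_count > 1
newarray[m] /= newarray_count[m]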

4 Comments

is it possible to take the average instead of the sum?
@JohnBrown I added another possible snippet for that.
thank you, this sounds reasonable, however it is a really slow solution.
@JohnBrown For the last part for the means on the second snippet, the advanced indexing may be more expensive than just operating on the whole array. Try replacing the last two lines with np.clip(newarray_count, 1, None, out=newarray_count); newarray /= newarray_count.
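
Since np.add.at was reported as slow, one possible alternative is to flatten the three indices into one and use np.bincount with weights. This is a different technique from the one in the answer, and its speed has not been measured here; the sketch below reuses the answer's R and M and the clipping idea from the comment above. The names flat, sums, counts and means are illustrative.

```python
import numpy as np

np.random.seed(0)
N = 100
R = np.random.randint(0, N, [N, 3])
M = np.random.randn(N) * 15

# Turn each (i, j, k) row into one index into the flattened N*N*N array
flat = np.ravel_multi_index((R[:, 0], R[:, 1], R[:, 2]), (N, N, N))

# Per-position sums and counts; minlength keeps unused positions at 0
sums = np.bincount(flat, weights=M, minlength=N**3)
counts = np.bincount(flat, minlength=N**3)

# Clip counts to at least 1 so empty positions stay 0 instead of dividing by zero
means = (sums / np.clip(counts, 1, None)).reshape(N, N, N)
```

The result should match the np.add.at version with averaging up to floating-point rounding, which is worth checking on your own data before relying on it.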
