3

I need to construct an N x M x N array A such that A[i, j, k] == (0 if i != k else x[j]). I could write:

A = np.zeros((N, M, N))
for i in range(N):
    for j in range(M):
        A[i,j,i] = x[j]

Or, alternatively:

A = np.zeros((N, M, N))
for i in range(N):
    A[i,:,i] = x

But both are most likely too slow for my purposes. Is there a faster way?

3
  • That depends a lot on the values of N and M and the dtype of A and x. Could you provide more details? Commented Jun 18, 2017 at 22:04
  • @MSeifert N and M are independent of each other. They range between 1 and 100 typically, but there's no real limit. Even though N and M are usually pretty small, this computation happens very frequently, and rarely they might be as large as 100K. A and x are both np.float32. Commented Jun 18, 2017 at 22:09
  • The problem is that array slice assignment is actually pretty fast (at least if you insert several values at once). So as long as M >> 1 you probably can't beat the second approach in the general case. If you had some limits on (or relationship between) N and M one could try to optimize it. Commented Jun 18, 2017 at 22:24

1 Answer 1

3

Approach #1

Using broadcasting to create all those linear indices where x's are to be assigned and then simply assign into its flattened view, like so -

# Initialize
Aout = np.zeros((N, M, N))

# Comput all linear indices
idx = (np.arange(N)*(N*M+1))[:,None] + N*np.arange(M)

# In a flattened view with `.ravel()` assign from x
Aout.ravel()[idx] = x

Approach #2

Using views based element access supported by np.lib.stride_tricks.as_strided -

Aout = np.zeros((N, M, N))
s0,s1,s2 = Aout.strides
Aoutview = np.lib.stride_tricks.as_strided(Aout,shape=(N,M),strides=(s0+s2,s1))
Aoutview[:] = x

Approach #3

Another approach would be to use integer array indexing along the first and third axes, closely simulating the second approach from the question, but in a vectorized manner -

Aout = np.zeros((N, M, N))
Aout[np.arange(N),:,np.arange(N)] = x

Runtime test

Approaches -

def app0(x,A):
    for i in range(N):
        for j in range(M):
            A[i,j,i] = x[j]
    return A

def app1(x,A):
    for i in range(N):
        A[i,:,i] = x
    return A

def app2(x,Aout):       
    idx = (np.arange(N)*(N*M+1))[:,None] + N*np.arange(M)
    Aout.ravel()[idx] = x
    return Aout

def app3(x,Aout):       
    s0,s1,s2 = Aout.strides
    Aoutview = np.lib.index_tricks.as_strided(Aout,shape=(N,M),strides=(s0+s2,s1))
    Aoutview[:] = x
    return Aout

def app4(x,Aout):
    r = np.arange(N)
    Aout[r,:,r] = x
    return Aout

Verification -

In [125]: # Params
     ...: N, M = 100,100
     ...: x = np.random.rand(M)
     ...: 
     ...: # Make copies of arrays to be assigned into
     ...: A0 = np.zeros((N, M, N))
     ...: A1 = np.zeros((N, M, N))
     ...: A2 = np.zeros((N, M, N))
     ...: A3 = np.zeros((N, M, N))
     ...: A4 = np.zeros((N, M, N))
     ...: 

In [126]: print np.allclose(app0(x,A0), app1(x,A1))
     ...: print np.allclose(app0(x,A0), app2(x,A2))
     ...: print np.allclose(app0(x,A0), app3(x,A3))
     ...: print np.allclose(app0(x,A0), app4(x,A4))
     ...: 
True
True
True
True

Timings -

In [127]: # Make copies of arrays to be assigned into
     ...: A0 = np.zeros((N, M, N))
     ...: A1 = np.zeros((N, M, N))
     ...: A2 = np.zeros((N, M, N))
     ...: A3 = np.zeros((N, M, N))
     ...: A4 = np.zeros((N, M, N))


In [128]: %timeit app0(x,A0)
     ...: %timeit app1(x,A1)
     ...: %timeit app2(x,A2)
     ...: %timeit app3(x,A3)
     ...: %timeit app4(x,A4)
     ...: 
1000 loops, best of 3: 1.49 ms per loop
10000 loops, best of 3: 53.6 µs per loop
10000 loops, best of 3: 150 µs per loop
10000 loops, best of 3: 28.6 µs per loop
10000 loops, best of 3: 25.2 µs per loop
Sign up to request clarification or add additional context in comments.

6 Comments

Is that really faster than the loop over range(N) and assigning to A[i, :, i]? On my computer it's just "slightly" faster (1-3%)
@MSeifert Depends on the loop iteration, I would think. What N did you use?
@MSeifert Not sure about your claims of 1-3%. I am getting much better timings on N,M = 100, 100 datasets as OP mentioned to be using.
Thanks for the runtime tests but I think you should add one using the second approach (that's the one I would use - if I would have that specific use-case).
I included the time it took to create the array (np.zeros). Without that it's app2 > app1 (second approach) > app3 (app2 being slowest). But the differences were quite small (~20%) even without np.zeros.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.