I'm trying to sort two large four-dimensional arrays in NumPy.

I want to sort based on the values along axis 2 of the first array, and reorder the second array using the same indices. All other axes should remain in the same order for both arrays.

The following code does what I want, but it relies on looping in Python, so it's slow. The arrays are quite large, so I'd really like to get this working using compiled NumPy operations for performance reasons, or find some other way to get this block of code compiled (Cython?).

import numpy as np

data = np.random.rand(10,6,4,1)
data2 = np.random.rand(10,6,4,3)

print(data[0,0,:,:])
print(data2[0,0,:,:])

for n in range(data.shape[0]):
  for m in range(data.shape[1]):

    sort_ids = np.argsort(data[n,m,:,0])

    data[n,m,:,:] = data[n,m,sort_ids,:]
    data2[n,m,:,:] = data2[n,m,sort_ids,:]


print(data[0,0,:,:])
print(data2[0,0,:,:])

2 Answers

Maybe there is a better solution but this should work:

sort_ids = np.argsort(data,axis=2)

s1 = data.shape
s2 = data2.shape
d1 = data[np.arange(s1[0])[:,None,None,None],np.arange(s1[1])[None,:,None,None],sort_ids,np.arange(s1[3])[None,None,None,:]]
d2 = data2[np.arange(s2[0])[:,None,None,None],np.arange(s2[1])[None,:,None,None],sort_ids,np.arange(s2[3])[None,None,None,:]]

At least the output is identical to your code.
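
On newer NumPy versions (1.15 and later, as far as I know), `np.take_along_axis` should express the same indexing more compactly. This is only a minimal sketch, assuming the array shapes from the question; the names `d1` and `d2` mirror the code above, and the fixed seed is just there to make the example repeatable:

```python
import numpy as np

np.random.seed(0)  # illustrative seed, not part of the original problem
data = np.random.rand(10, 6, 4, 1)
data2 = np.random.rand(10, 6, 4, 3)

# Sort order along axis 2; same shape as data, i.e. (10, 6, 4, 1)
sort_ids = np.argsort(data, axis=2)

# The length-1 last axis of sort_ids broadcasts against the last axis of
# data2, so every column of data2 is reordered by the same indices.
d1 = np.take_along_axis(data, sort_ids, axis=2)
d2 = np.take_along_axis(data2, sort_ids, axis=2)
```

This relies on the last axis of `data` having length 1, exactly as in the question. If `data` had several columns, `sort_ids` would hold a separate order per column and would no longer broadcast against `data2` in the same way.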

1 Comment

This looks like it does the same basic thing as the solution I came up with, but without the memory overhead. Thanks!!

Found a way to make this work. It requires storing an index array, which may cause some memory issues for me, but it's way faster. Example code with timing comparison:

import numpy as np
import time

loops = 1000

data = np.random.rand(100,6,4,1)
data2 = np.random.rand(100,6,4,3)

start = time.time()
for n in range(loops):


  idxs = np.indices(data.shape)
  idxs2 = np.indices(data2.shape)

  sort_ids = np.argsort(data, 2)

  sorted_data = data[idxs[0], idxs[1], sort_ids, idxs[3]]
  sorted_data2 = data2[idxs2[0], idxs2[1], np.repeat(sort_ids, data2.shape[3], 3), idxs2[3]]

print('Time Elapsed: %5.2f seconds' % (time.time() - start))



start = time.time()
for _ in range(loops):

  # Fresh output arrays, so the inputs stay unsorted between timing iterations
  sorted_data = np.zeros(data.shape)
  sorted_data2 = np.zeros(data2.shape)

  for n in range(data.shape[0]):
    for m in range(data.shape[1]):

      sort_ids = np.argsort(data[n,m,:,0])

      # Write into the output arrays rather than overwriting the inputs
      sorted_data[n,m,:,:] = data[n,m,sort_ids,:]
      sorted_data2[n,m,:,:] = data2[n,m,sort_ids,:]


print('Time Elapsed: %5.2f seconds' % (time.time() - start))
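
On the memory concern: `np.indices` builds one full-size integer array per axis, so the index storage is several times larger than the data itself. A rough sketch of one way around that, assuming the same shapes as above, is to use open grids from `np.ogrid` and let broadcasting do the expansion. The names `dense`, `i0`, `i1`, `i3` and the byte counters are only for illustration:

```python
import numpy as np

np.random.seed(0)  # illustrative seed
data = np.random.rand(100, 6, 4, 1)
data2 = np.random.rand(100, 6, 4, 3)

sort_ids = np.argsort(data, axis=2)

# Dense grids: shape (4, 100, 6, 4, 3), one full-size array per axis
dense = np.indices(data2.shape)

# Open grids: each array has length 1 on every axis except its own
s = data2.shape
i0, i1, _, i3 = np.ogrid[:s[0], :s[1], :s[2], :s[3]]

# Same result either way; the open-grid version broadcasts at index time
sorted_dense = data2[dense[0], dense[1], np.repeat(sort_ids, s[3], 3), dense[3]]
sorted_open = data2[i0, i1, sort_ids, i3]

# Bytes held by the index helpers in each approach
dense_bytes = dense.nbytes
open_bytes = i0.nbytes + i1.nbytes + i3.nbytes
```

The fancy-indexing step still creates the full-size result, so this only trims the index storage, not the output.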
