Suppose we have a NumPy array a that is sampled with replacement to create a second NumPy array b:

import numpy as np

a = np.arange(10, 200*1000)
b = np.random.choice(a, len(a), replace=True)

What is the most efficient way to find an array of indexes named mapping that will transform a to b? It is OK to change np.random.choice to a more suitable function.

The following code is too slow: it takes 7-8 seconds on a MacBook Pro to create the mapping array. With an array size of 1 million, it will take much longer.

mapping = np.array([], dtype=int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent
for n in b:
    m = np.searchsorted(a, n)
    mapping = np.append(mapping, m)

2 Comments
  • I just tried it with numba and was able to reduce the runtime to 3 to 4 seconds on average. Unfortunately, np.append is very inefficient, whereas appending to a list is much faster. Commented Oct 14, 2020 at 22:31
  • Both np.searchsorted() and np.append() are meant to replace explicit loops. Calling them once per iteration, instead of once on the whole array, is bound to hurt performance. Commented Oct 14, 2020 at 23:56
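
The point made in these comments can be sketched as follows. np.searchsorted accepts a whole array of values, so the Python loop and the repeated np.append collapse into a single call. This is only a sketch, and it assumes that a is sorted and holds unique values (true for the np.arange array in the question); otherwise the returned positions would not necessarily map a back to b.

```python
import numpy as np

# Same setup as in the question: a is sorted and has no duplicates.
a = np.arange(10, 200 * 1000)
b = np.random.choice(a, len(a), replace=True)

# One vectorized call replaces the per-element loop and np.append.
mapping = np.searchsorted(a, b)

# Sanity check: indexing a with mapping reproduces b.
assert np.array_equal(a[mapping], b)
```

This recovers the mapping after the fact; it does not change how b is sampled.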

1 Answer

Perhaps draw the random choice over the indices of a instead, and then index a with the resulting array. That array of random indices is the mapping:

mapping = np.random.choice(np.arange(len(a)), len(a), replace=True)
b = a[mapping]
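
A variant of the same idea, in case the newer Generator API is preferred: the indices can be drawn directly as random integers, without building np.arange(len(a)) first. This is a sketch rather than part of the original answer; the rng name and the fixed seed are assumptions added only to make the run reproducible.

```python
import numpy as np

a = np.arange(10, 200 * 1000)

# Hypothetical fixed seed, used here only for reproducibility.
rng = np.random.default_rng(0)

# Draw len(a) indices uniformly from [0, len(a)), i.e. with replacement.
mapping = rng.integers(0, len(a), size=len(a))

# Fancy indexing builds b, so a[mapping] equals b by construction.
b = a[mapping]
```

Because b is built from mapping, no search step is needed afterwards.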

1 Comment

Wow, your code took 0.004 secs to run on my laptop!
