I have two NumPy arrays, x and y, each about 2M elements long. x is sorted, but some of its values repeat.

The task is to remove the corresponding entries from both x and y wherever consecutive values in x are identical. My idea is to build a boolean mask. Here is what I have done so far:

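import numpy as np

def createMask(x):
  # Start with every element kept. np.empty would leave the entries
  # that are never set to False uninitialized (arbitrary True/False).
  idx = np.ones(x.shape, dtype=bool)
  for i in range(len(x) - 1):
    if x[i+1] == x[i]:
      idx[i] = False  # drop x[i]; the last element of each run is kept
  return idx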

idx = createMask(x)
x   = x[idx]
y   = y[idx]

This method works, but it is slow (705ms with %timeit), and it looks really clumsy. Is there a more elegant and efficient way? (I'm sure there is.)

Update (with the best answer):

The second method is

idx = [x[i+1] == x[i] for i in range(len(x)-1)]

And the third (and fastest) method is

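idx = x[:-1] == x[1:]

Note that the second and third methods produce the opposite of createMask's mask: they are True where x[i] == x[i+1], and they have one element fewer than x. To use the result to index x and y, it has to be negated and padded with a final True (the last element of x is always kept).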
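A minimal sketch of building the full keep-mask from the element-wise comparison, assuming (as createMask does) that the last element of each run of equal values should be kept. A keep-first variant is included for comparison. keep_last_mask and keep_first_mask are hypothetical helper names, not part of NumPy:

```python
import numpy as np

def keep_last_mask(x):
    """Hypothetical helper: keep the last element of each run of equal values.

    Matches the (fixed) createMask loop for a 1-D array x.
    """
    x = np.asarray(x)
    if x.size == 0:
        return np.zeros(0, dtype=bool)
    # True where the next value differs, i.e. where a run ends;
    # the final element always ends a run, so append True.
    return np.concatenate((x[:-1] != x[1:], [True]))

def keep_first_mask(x):
    """Hypothetical helper: keep the first element of each run of equal values."""
    x = np.asarray(x)
    if x.size == 0:
        return np.zeros(0, dtype=bool)
    # The first element always starts a run, so prepend True.
    return np.concatenate(([True], x[1:] != x[:-1]))

x = np.array([1, 2, 2, 3, 3, 3, 4])
y = np.array([10, 20, 21, 30, 31, 32, 40])

idx = keep_last_mask(x)
print(x[idx], y[idx])  # [1 2 3 4] [10 21 32 40]
```

The empty-array guard is there because concatenating an empty comparison with [True] would otherwise produce a mask of length 1.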

The results (using IPython's %timeit) are:

First method: 751ms

Second method: 618ms

Third method: 3.63ms

Credit to mtitan8 for both methods.


I believe the fastest method is to compare x with a shifted copy of itself using NumPy's element-wise == operator:

idx = x[:-1] == x[1:]
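A different technique, not the one used in this answer: np.unique with return_index=True returns the unique values of x together with the index of the first occurrence of each, and those indices can be used to select from y as well. Note that it keeps the first element of each run (createMask keeps the last), and because it sorts its input internally it does more work than a single element-wise comparison. A sketch:

```python
import numpy as np

x = np.array([1, 2, 2, 3, 3, 3, 4])
y = np.array([10, 20, 21, 30, 31, 32, 40])

# first_idx[k] is the index in x of the first occurrence of x_unique[k]
x_unique, first_idx = np.unique(x, return_index=True)
y_unique = y[first_idx]

print(x_unique, y_unique)  # [1 2 3 4] [10 20 30 40]
```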

On my machine, with x holding a million random integers in [0, 100]:

In[15]: timeit idx = x[:-1] == x[1:]
1000 loops, best of 3: 1 ms per loop
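A rough sketch for reproducing this comparison with the standard-library timeit module instead of IPython's %timeit, using a million random integers in [0, 100] as above (sorted here, since the question's x is sorted). loop_mask and vectorized_mask are hypothetical names for the fixed loop version and the padded vectorized mask; absolute timings depend on the machine, so none are claimed here:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.integers(0, 101, size=1_000_000))  # integers in [0, 100]

def loop_mask(x):
    # Fixed createMask: start all-True, clear x[i] when it equals x[i+1].
    idx = np.ones(x.shape, dtype=bool)
    for i in range(len(x) - 1):
        if x[i + 1] == x[i]:
            idx[i] = False
    return idx

def vectorized_mask(x):
    # Same mask in one pass: keep x[i] unless it equals x[i+1];
    # the last element is always kept.
    return np.concatenate((x[:-1] != x[1:], [True]))

# Both versions must select exactly the same elements.
assert np.array_equal(loop_mask(x), vectorized_mask(x))

for name, f in [("loop", loop_mask), ("vectorized", vectorized_mask)]:
    best = min(timeit.repeat(lambda: f(x), number=1, repeat=3))
    print(f"{name}: {best * 1e3:.2f} ms")
```

For a sorted x, the vectorized mask keeps exactly one element per distinct value, so x[vectorized_mask(x)] contains no repeats.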

Comments:

There is indeed some speedup: it went from 708ms to 559ms. Thank you. I'm still looking for more speedup.
I think this is more than sufficient. Thank you. I have added both methods, with their performance, to my original question.
