What I'm looking for is a function that, given "a", returns "b" as follows:

a = numpy.array([1, 1, 1, 1, 5, 5, 5, 5, 5, 6, 5, 2, 2, 2, 2])

In a, 1 first appears 4 times in a row, then 5 appears 5 times, 6 once, 5 once, and 2 four times.

The function should return this array:

b = numpy.array([4, 5, 1, 1, 4])

The function should treat 5 this way: even though 5 appears 6 times in total in a, it counts each consecutive run separately.

This is a very specific need. I wrote a function like this myself, but I want to know whether NumPy has a built-in function that does this, for faster performance.

Thanks in advance.
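
A plain-Python baseline for the behavior described above is handy for checking faster versions. This sketch uses itertools.groupby, which groups consecutive equal elements; it is not the asker's own function (which was not posted), and the name run_lengths_py is hypothetical. Expect it to be much slower than a vectorized NumPy approach on large arrays.

```python
from itertools import groupby

import numpy as np


def run_lengths_py(a):
    """Return the length of each run of consecutive equal values in a.

    Hypothetical helper: itertools.groupby starts a new group every time
    the value changes, so each group is exactly one run.
    """
    return np.array([sum(1 for _ in group) for _, group in groupby(a)], dtype=int)


a = np.array([1, 1, 1, 1, 5, 5, 5, 5, 5, 6, 5, 2, 2, 2, 2])
print(run_lengths_py(a))  # [4 5 1 1 4]
```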

  • No, there is no built-in function. However, doing the consecutive count is easy enough. If you want to see a more general solution, research "run-length encoding". (Commented Oct 2, 2020 at 18:24)
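
Following the comment's pointer, a general run-length encoding returns the value and starting index of each run, not just its length. A sketch using only NumPy comparisons; the helper name run_length_encode is hypothetical, and it assumes a 1-D input:

```python
import numpy as np


def run_length_encode(a):
    """Run-length encode a 1-D array (hypothetical helper, not a NumPy built-in).

    Returns (values, starts, lengths): the value of each run, the index
    where it starts, and how many times it repeats consecutively.
    """
    a = np.asarray(a)
    if a.size == 0:
        empty = np.array([], dtype=np.intp)
        return a[:0], empty, empty
    # A run starts at index 0 and wherever an element differs from its predecessor.
    starts = np.flatnonzero(np.concatenate(([True], a[1:] != a[:-1])))
    # Each run ends where the next one starts, or at the end of the array.
    lengths = np.diff(np.append(starts, a.size))
    return a[starts], starts, lengths


a = np.array([1, 1, 1, 1, 5, 5, 5, 5, 5, 6, 5, 2, 2, 2, 2])
values, starts, lengths = run_length_encode(a)
print(values)   # [1 5 6 5 2]
print(lengths)  # [4 5 1 1 4]
```

Because NaN != NaN, each NaN in a float array ends up as its own run of length 1.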

2 Answers

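This can be done with np.bincount on the cumulative sum of the nonzero np.diff. The cumulative sum of the change points assigns every element after the first the index of the run it belongs to, np.bincount counts how many elements share each index, and out[0] += 1 adds back the first element, which np.diff drops: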

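import numpy as np
# Label each element after the first with its run index, then count the labels
out = np.bincount((np.diff(a) != 0).cumsum())
out[0] += 1  # add back a[0], which np.diff drops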

Output:

array([4, 5, 1, 1, 4])
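
The two lines above fail on a single-element a: np.bincount of an empty array is empty, so out[0] raises IndexError. A sketch of a variant that prepends a run label of 0 for the first element instead of correcting out[0] afterwards, and returns an empty result for an empty input; the function name is hypothetical. It compares neighbours with a[1:] != a[:-1] rather than np.diff(a) != 0, which gives the same labels for typical numeric input but also works for dtypes np.diff cannot subtract, such as strings:

```python
import numpy as np


def run_lengths_bincount(a):
    """Run lengths via np.bincount, with empty and single-element input handled.

    Hypothetical helper: prepending label 0 for a[0] replaces out[0] += 1.
    """
    a = np.asarray(a)
    if a.size == 0:
        return np.array([], dtype=np.intp)
    # labels[i] is the run index of element i: it goes up by one
    # each time an element differs from the previous one.
    labels = np.concatenate(([0], np.cumsum(a[1:] != a[:-1])))
    # Counting how many elements carry each label gives the run lengths.
    return np.bincount(labels)


a = np.array([1, 1, 1, 1, 5, 5, 5, 5, 5, 6, 5, 2, 2, 2, 2])
print(run_lengths_bincount(a))    # [4 5 1 1 4]
print(run_lengths_bincount([7]))  # [1]
```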

You can also use the prepend and append parameters of np.diff to pad both ends with values that differ from the first and last elements (a[0]-1 and a[-1]+1), so the array of differences is nonzero at both boundaries:

>>> np.diff(a,prepend=a[0]-1,append=a[-1]+1)
array([ 1,  0,  0,  0,  4,  0,  0,  0,  0,  1, -1, -3,  0,  0,  0,  1])

The indices of the nonzero entries now mark where each run starts, plus one index past the end, so applying np.diff to those indices gives the run lengths:

x = np.diff(a, prepend=a[0]-1, append=a[-1]+1)
np.diff(np.nonzero(x)[0])  # np.nonzero returns a tuple of index arrays; [0] gives a 1-D result

Output:

array([4, 5, 1, 1, 4])

However, this is slower than the bincount approach: about 3x slower for the small array a above and about 25% slower for a large array such as a = np.random.randint(3, size=10000000).
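
A sketch for reproducing the timing comparison on your own machine with timeit. The function names via_bincount and via_nonzero are hypothetical wrappers around the two answers' expressions (the second indexed with [0] so both return 1-D arrays); the repeat counts are arbitrary, and timings depend on hardware and NumPy version, so no numbers are claimed here:

```python
import timeit

import numpy as np


def via_bincount(a):
    # First answer: label runs with a cumulative sum of change points, then count labels.
    out = np.bincount((np.diff(a) != 0).cumsum())
    out[0] += 1
    return out


def via_nonzero(a):
    # Second answer: pad both ends so every run boundary is a nonzero difference.
    x = np.diff(a, prepend=a[0] - 1, append=a[-1] + 1)
    return np.diff(np.nonzero(x)[0])


small = np.array([1, 1, 1, 1, 5, 5, 5, 5, 5, 6, 5, 2, 2, 2, 2])
large = np.random.randint(3, size=10_000_000)

for name, arr, number in [("small", small, 10_000), ("large", large, 3)]:
    # Both methods must agree before timing them.
    assert np.array_equal(via_bincount(arr), via_nonzero(arr))
    for func in (via_bincount, via_nonzero):
        seconds = timeit.timeit(lambda f=func, x=arr: f(x), number=number)
        print(f"{name:5s} {func.__name__:12s} {seconds / number * 1e6:12.1f} us/call")
```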
