Fast value swapping in numpy array

Question

This should be pretty easy, but it takes an enormous amount of time for me: I have a numpy array with only two values (for example 0 and 255) and I want to invert the matrix so that the values swap (0 becomes 255 and vice versa). The matrices have about 2000³ entries, so this is serious work! I first tried the numpy.invert method, which did not do exactly what I expected. So I tried to do it myself by "storing" the values temporarily and then overwriting them:

for i in range(len(array)):
    array[i][array[i] == 255] = 1    # park 255 in a temporary value
    array[i][array[i] == 0] = 255
    array[i][array[i] == 1] = 0

This behaves as expected but takes a long time (I guess because of the for loop?). Would it be faster to implement this as a multithreaded calculation, where every thread "inverts" a smaller sub-array? Or is there a more convenient way of doing this?

Side note: I advise against using array as the name of your array: NumPy users expect array to mean numpy.array. Furthermore, if you paste code into a shell after from numpy import * (or from pylab import *), your variable shadows NumPy's array.
– Eric O. Lebigot, Commented Apr 9, 2013 at 12:34

Joe Kington · Accepted Answer · 2013-04-09 20:50:21Z

8

In addition to @JanneKarila's and @EOL's excellent suggestions, it's worthwhile to show a more efficient approach to using a mask to do the swap.

Using a boolean mask is more generally useful if you have a more complex comparison than simply swapping two values, but your example uses it in a sub-optimal way.

Currently, you're making multiple temporary copies of the boolean "mask" array (e.g. array[i] == blah) in your example above and performing multiple assignments. You can avoid this by making the "mask" boolean array just once and then inverting it.

If you have enough RAM for a temporary copy (of bool dtype), try something like this:

mask = (data == 255)
data[mask] = 0
data[~mask] = 255

Alternately (and equivalently) you could use numpy.where:

data = numpy.where(data == 255, 0, 255)
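As a minimal sketch of the two variants above, here they are side by side on a tiny stand-in array. The uint8 dtype and the names swapped_in_place and swapped_copy are assumptions made for illustration. One thing to watch: numpy.where builds a new array, and when it is given plain Python integers the result can come back in the platform's default integer type rather than the input dtype, so the sketch casts it back.

```python
import numpy as np

# Small stand-in for the real volume; the uint8 dtype is an assumption.
data = np.array([[0, 255, 255], [255, 0, 0]], dtype=np.uint8)

# Evaluate the comparison once and reuse the resulting boolean mask.
mask = (data == 255)

# Variant 1: assign through the mask and its inverse (done on a copy here
# so that both variants start from the same input).
swapped_in_place = data.copy()
swapped_in_place[mask] = 0
swapped_in_place[~mask] = 255

# Variant 2: np.where allocates a new array; cast back to the input dtype.
swapped_copy = np.where(mask, 0, 255).astype(data.dtype)

print(swapped_in_place)
print(swapped_copy.dtype)
```

Both variants give the same values; the mask assignment modifies an existing array, while numpy.where always produces a fresh one.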

If you were using a loop to avoid making a full temporary copy, and need to conserve RAM, adjust your loop to be something more like this:

for i in range(len(array)):
    # Build the mask for one sub-array at a time to limit memory use
    mask = (array[i] == 255)
    array[i][mask] = 0
    array[i][~mask] = 255

All that having been said, either subtraction or XOR is the way to go in this case, especially if you perform the operation in-place!

edited Apr 9, 2013 at 20:50

answered Apr 9, 2013 at 12:11


1 Comment

askewchan Over a year ago

Both of these avoid making the third copy that OP had with the ==1 mask.

Janne Karila · Accepted Answer · 2013-04-09 12:17:27Z

4

To swap 0 and 255, you can use XOR if the data type is one of the integer types.

array ^= 255
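A small sketch of why this works and where it stops working, assuming a uint8 array (the question does not state the dtype). XOR with 255 flips the low eight bits, so 0 becomes 255 and 255 becomes 0. On a float array the same operator is not defined and NumPy raises a TypeError.

```python
import numpy as np

# Integer case: 0 ^ 255 == 255 and 255 ^ 255 == 0, applied in place.
arr = np.array([0, 255, 255, 0], dtype=np.uint8)
arr ^= 255
print(arr)

# Float case: bitwise XOR is not defined for floating-point arrays.
float_arr = np.array([0.0, 255.0])
try:
    float_arr ^= 255
    xor_worked_on_floats = True
except TypeError:
    xor_worked_on_floats = False
print(xor_worked_on_floats)
```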

edited Apr 9, 2013 at 12:17

answered Apr 9, 2013 at 12:02


2 Comments

Eric O. Lebigot Over a year ago

This only works if the array contains integers, which is not obvious from the question. No downvote, though. :)

Dschoni Over a year ago

Actually, the simplest ideas are the ones that work best! Thanks a lot!

Eric O. Lebigot · Accepted Answer · 2013-04-10 04:10:15Z

4

You can simply do:

arr_inverted = 255 - arr

This converts all the elements one by one (255 gives 0 and 0 gives 255). More generally, if you only have two values a and b, the "inversion" is simply done with (a+b)-arr. This also works if the two values are not integers (like floats or complex numbers).
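To make the (a+b)-arr generalization concrete, here is a short sketch with two float values. The values 0.25 and 1.5 are hypothetical and were picked because they are exactly representable in binary floating point; with arbitrary floats the result may differ from a or b by a rounding error.

```python
import numpy as np

a, b = 0.25, 1.5  # hypothetical pair of values, chosen for illustration
arr = np.array([a, b, b, a])

# Each element x becomes (a + b) - x, so a maps to b and b maps to a.
swapped = (a + b) - arr
print(swapped)

# The same operation written to reuse the existing buffer.
np.subtract(a + b, arr, out=arr)
print(arr)
```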

As Jaime pointed out, if memory is a concern, np.subtract(255, arr, out=arr) swaps the values of arr in-place.

If you more generally have integers in your array, Janne Karila's XOR in-place solution has the advantage of being more concise than the difference in-place solution suggested above. It can be generalized as arr ^= (a^b), for swapping two integers a and b.
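The generalized XOR form can be checked on a small example. The values 3 and 10 and the int32 dtype are assumptions chosen for illustration. Because x ^ (a ^ b) equals b when x is a and equals a when x is b, a single in-place XOR swaps the two values.

```python
import numpy as np

a, b = 3, 10  # hypothetical pair of integer values
arr = np.array([3, 10, 10, 3], dtype=np.int32)

# a ^ (a ^ b) == b and b ^ (a ^ b) == a, so this swaps the values in place.
arr ^= (a ^ b)
print(arr)
```

Note that any element that is neither a nor b is also changed by this operation, so it is only safe when the array really holds just those two values.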

The execution times are similar for both methods (with a 200×200×200 array of uint8 integers, through IPython):

>>> arr = np.random.choice((0, 255), (200, 200, 200)).astype('uint8')
>>> %timeit np.bitwise_xor(255, arr, out=arr)
100 loops, best of 3: 7.65 ms per loop
>>> %timeit np.subtract(255, arr, out=arr)
100 loops, best of 3: 7.69 ms per loop

If your array is of type uint8, arr_inverted = ~arr takes the same time for swapping 0 and 255 (the ~ operator inverts all the bits), but it is less general, so it's not worth it (tested with a 200×200×200 array).

edited Apr 10, 2013 at 4:10

answered Apr 9, 2013 at 12:13


1 Comment

Jaime Over a year ago

If memory is a concern, np.subtract(255, arr, out=arr) will do your method in-place.

that_guy · Accepted Answer · 2013-04-09 12:06:49Z

1

"I first tried the numpy.invert method, which is not exactly what I expected."

numpy.invert is exactly what you need. Can you describe what happened? Did you use an unsigned byte for storage rather than a signed datatype or an integer?

Unsigned byte + numpy.invert should do exactly what you want.

[You should also see faster performance in numpy with unsigned bytes rather than longer or signed datatypes]
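A quick sketch of how the result of numpy.invert depends on the dtype, which may explain the unexpected output mentioned in the question (the question does not say which dtype was used, so this is only a guess at the cause). The operation flips every bit of the element, so only with uint8 do 0 and 255 map onto each other.

```python
import numpy as np

values = [0, 255]

# uint8: all 8 bits are flipped, so 0 <-> 255.
as_uint8 = np.invert(np.array(values, dtype=np.uint8))

# uint16: all 16 bits are flipped, so the results are 65535 and 65280.
as_uint16 = np.invert(np.array(values, dtype=np.uint16))

# int64: for signed integers ~x equals -x - 1, giving -1 and -256.
as_int64 = np.invert(np.array(values, dtype=np.int64))

print(as_uint8, as_uint16, as_int64)
```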

answered Apr 9, 2013 at 12:06


3 Comments

Joe Kington Over a year ago

That's only true for uint8, not for unsigned integers in general. The OP's array probably just isn't uint8.

that_guy Over a year ago

Hello Joe. Thanks for your comment, but I actually didn't say 'unsigned integers in general', I said unsigned byte, -> i.e. uint8. If he is only storing 0 and 255, then the OP probably is using uint8. Packing them into 1s/0s makes even more sense though.

askewchan Over a year ago

@that_guy Welcome to Stack Overflow! You're right, but @Joe just wanted to clarify, since it likely wouldn't have been obvious to the OP (or to me, for example) what exactly you meant. Try to be as explicit as possible in your answers, e.g. by saying "Invert should work if you're using an unsigned byte (uint8), so perhaps you're using a signed datatype or integer." BTW, for Joe to be notified, you must precede his name with @, as in @Joe.
