Numpy unique 2D sub-array [duplicate]

Question

I have a 3D NumPy array and I want to keep only its unique 2D sub-arrays.

Input:

[[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]

 [[ 5  6]
  [ 7  8]]]

Output:

[[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]]

I tried converting each sub-array to a byte string (with the tostring() method) and then using np.unique, but converting the list to a NumPy array stripped the trailing \x00 bytes, so I can't convert the strings back with np.fromstring().

Example:

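import numpy as np
a = np.array([[[1,2],[3,4]],[[5,6],[7,8]],[[9,10],[11,12]],[[5,6],[7,8]]])
b = [x.tostring() for x in a]  # raw bytes of each 2x2 sub-array
print(b)
c = np.array(b)  # fixed-width 'S' dtype: trailing b'\x00' bytes are dropped on read
print(c)
print(np.array([np.fromstring(x) for x in c]))  # fromstring defaults to float64; the stripped strings no longer fit its item size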

Output:

[b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00', b'\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00', b'\t\x00\x00\x00\n\x00\x00\x00\x0b\x00\x00\x00\x0c\x00\x00\x00', b'\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00']
[b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04'
 b'\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08'
 b'\t\x00\x00\x00\n\x00\x00\x00\x0b\x00\x00\x00\x0c'
 b'\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08']

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-86-6772b096689f> in <module>()
      5 c = np.array(b)
      6 print(c)
----> 7 print(np.array([np.fromstring(x) for x in c]))

<ipython-input-86-6772b096689f> in <listcomp>(.0)
      5 c = np.array(b)
      6 print(c)
----> 7 print(np.array([np.fromstring(x) for x in c]))

ValueError: string size must be a multiple of element size
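The failure comes from np.array(b): it stores the byte strings in a fixed-width 'S' dtype, and NumPy treats trailing b'\x00' bytes in that dtype as padding. A second issue is that np.fromstring without a dtype argument decodes the bytes as float64. A minimal sketch of the same byte-string idea that avoids both, assuming the bytes are kept as ordinary Python bytes objects and decoded with np.frombuffer using the original dtype and sub-array shape (tobytes is the current name of tostring):

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]], [[5, 6], [7, 8]]])

# Keep the raw bytes as Python bytes objects; a dict keeps first-seen order
# (Python 3.7+) and never strips trailing zero bytes.
unique_bytes = dict.fromkeys(x.tobytes() for x in a)

# Decode each byte string with the original dtype and restore the 2x2 shape.
result = np.array([np.frombuffer(buf, dtype=a.dtype).reshape(a.shape[1:])
                   for buf in unique_bytes])
print(result)
```

Because this compares raw memory, two sub-arrays count as equal only if their bytes match exactly; for float data, 0.0 and -0.0 would be treated as different.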

I also tried view, but I really don't know how to use it. Can you help me, please?

This is a new feature in the upcoming NumPy 1.13, as np.unique(a, axis=0). You could simply copy the new implementation into your code, since 1.13 is not released yet.
– Eric, Commented Nov 18, 2016 at 10:37
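With NumPy 1.13 or later, the axis argument mentioned in that comment handles this directly. A short sketch: np.unique(a, axis=0) returns the unique sub-arrays in sorted order, and sorting the indices returned by return_index=True keeps them in order of first appearance instead.

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]], [[5, 6], [7, 8]]])

# Unique 2x2 sub-arrays along the first axis, lexicographically sorted.
u = np.unique(a, axis=0)

# Indices of the first occurrence of each unique sub-array; sorting them
# restores the original order of appearance.
_, idx = np.unique(a, axis=0, return_index=True)
u_ordered = a[np.sort(idx)]

print(u)
print(u_ordered)
```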

Divakar · Accepted Answer · 2016-11-18 14:11:52Z

4

Building on @Jaime's post, I came up with this solution for finding unique 2D subarrays; it essentially adds a reshaping step before the view:

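def unique2D_subarray(a):
    # An opaque "void" dtype whose single item spans one whole sub-array
    dtype1 = np.dtype((np.void, a.dtype.itemsize * int(np.prod(a.shape[1:]))))
    # Flatten each sub-array into a contiguous row and view each row as one void item
    b = np.ascontiguousarray(a.reshape(a.shape[0], -1)).view(dtype1)
    # return_index gives the index of the first occurrence of each unique item
    return a[np.unique(b, return_index=True)[1]]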

Sample run -

In [62]: a
Out[62]: 
array([[[ 1,  2],
        [ 3,  4]],

       [[ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12]],

       [[ 5,  6],
        [ 7,  8]]])

In [63]: unique2D_subarray(a)
Out[63]: 
array([[[ 1,  2],
        [ 3,  4]],

       [[ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12]]])
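np.unique returns the unique void items sorted by their raw bytes, so the output above matches the input order only because this input happens to be sorted already. A minimal sketch of an order-preserving variant, under the hypothetical name unique2D_subarray_ordered, keeps the same void view but sorts the first-occurrence indices before indexing:

```python
import numpy as np

def unique2D_subarray_ordered(a):
    # Same void-view idea as unique2D_subarray: one opaque item per sub-array.
    row_bytes = a.dtype.itemsize * int(np.prod(a.shape[1:]))
    void_dt = np.dtype((np.void, row_bytes))
    b = np.ascontiguousarray(a.reshape(a.shape[0], -1)).view(void_dt).ravel()
    # First-occurrence indices, re-sorted so the output follows the input order.
    _, idx = np.unique(b, return_index=True)
    return a[np.sort(idx)]

a = np.array([[[9, 10], [11, 12]], [[1, 2], [3, 4]], [[9, 10], [11, 12]], [[5, 6], [7, 8]]])
print(unique2D_subarray_ordered(a))
```

Here the result keeps [[9, 10], [11, 12]] first, because it appears first in the input.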

answered Nov 18, 2016 at 14:11

Divakar



2 Comments

Petr Over a year ago

Thank you for your answer! So if I understood correctly, dtype specifies a sequence of bytes (not really any type) of size a.dtype.itemsize * size of the subarray? And the contiguous array is needed because the dtype is specified as a sequence of bytes? I'm sorry for the duplicate question, but I didn't understand it from @Jaime's post.

Divakar Over a year ago

@Peťan You are right about the first part. As for the second part, the need for the array to be contiguous, I am not too clear on that either; it might be worth posting a comment on that post. If I had to guess, I would say your reasoning on the second part seems logical, and yes, these two parts are related.
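On the contiguity question: a view with a dtype of a different item size reinterprets the memory along the last axis, so NumPy rejects it when that axis is not contiguous. A minimal sketch, assuming a strided slice as the non-contiguous input, of why np.ascontiguousarray is called before view:

```python
import numpy as np

x = np.arange(16).reshape(2, 8)[:, ::2]  # shape (2, 4), every other column: not contiguous
void_dt = np.dtype((np.void, x.dtype.itemsize * x.shape[1]))

try:
    x.view(void_dt)
    raised = False
except ValueError:
    raised = True  # NumPy refuses to reinterpret non-contiguous memory
print(raised)

# After copying into contiguous memory, each row becomes one void item.
y = np.ascontiguousarray(x).view(void_dt)
print(y.shape)
```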

Eelco Hoogendoorn · Accepted Answer · 2016-11-18 10:53:33Z

2

The numpy_indexed package (disclaimer: I am its author) is designed to do operations such as these in an efficient and vectorized manner:

import numpy_indexed as npi
npi.unique(a)

answered Nov 18, 2016 at 10:53

Eelco Hoogendoorn



kezzos · Accepted Answer · 2016-11-18 11:33:38Z

1

One solution would be to use a set to keep track of which sub-arrays you have seen:

import numpy as np

seen = set()
new_a = []

for j in a:
    f = tuple(j.flatten())  # hashable key built from the sub-array's values
    if f not in seen:
        new_a.append(j)
        seen.add(f)

print(np.array(new_a))

Or using numpy only:

u = np.unique(a)  # flattened, sorted unique *values*, not sub-arrays
print(u.reshape((len(u) // 4, 2, 2)))

>>> [[[ 1  2]
      [ 3  4]]

     [[ 5  6]
      [ 7  8]]

     [[ 9 10]
      [11 12]]]
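np.unique(a) without an axis flattens the array and deduplicates individual values, so reshaping its result recovers the sub-arrays only when distinct sub-arrays share no values, as happens in the example input. A small counter-example (the input b is hypothetical):

```python
import numpy as np

b = np.array([[[1, 2], [3, 4]], [[1, 2], [3, 5]]])  # two distinct sub-arrays sharing values

flat = np.unique(b)          # unique scalar values: [1 2 3 4 5]
print(flat.size % 4 == 0)    # False: 5 values cannot form 2x2 blocks

# Deduplicating along axis 0 keeps both sub-arrays intact.
print(np.unique(b, axis=0).shape)
```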

edited Nov 18, 2016 at 11:33

answered Nov 18, 2016 at 10:45

kezzos


3 Comments

Eric Over a year ago

So this answer from the dupe commented above

kezzos Over a year ago

You lose the order of the sub-arrays with that answer.

Haroldo_OK Over a year ago

It's true that copying the array into a set and then back into an array would lose the order, but the way the code above does it, the order is preserved.
