
I have a 3D NumPy array and I want to keep only the unique 2D sub-arrays.

Input:

[[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]

 [[ 5  6]
  [ 7  8]]]

Output:

[[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]]

I tried converting each sub-array to a byte string (with the tostring() method) and then using np.unique, but converting the list to a NumPy array deleted the trailing \x00 bytes, so I can't convert it back with np.fromstring().

Example:

import numpy as np
a = np.array([[[1,2],[3,4]],[[5,6],[7,8]],[[9,10],[11,12]],[[5,6],[7,8]]])
b = [x.tostring() for x in a]
print(b)
c = np.array(b)
print(c)
print(np.array([np.fromstring(x) for x in c]))

Output:

[b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00', b'\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00', b'\t\x00\x00\x00\n\x00\x00\x00\x0b\x00\x00\x00\x0c\x00\x00\x00', b'\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00']
[b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04'
 b'\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08'
 b'\t\x00\x00\x00\n\x00\x00\x00\x0b\x00\x00\x00\x0c'
 b'\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08']

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-86-6772b096689f> in <module>()
      5 c = np.array(b)
      6 print(c)
----> 7 print(np.array([np.fromstring(x) for x in c]))

<ipython-input-86-6772b096689f> in <listcomp>(.0)
      5 c = np.array(b)
      6 print(c)
----> 7 print(np.array([np.fromstring(x) for x in c]))

ValueError: string size must be a multiple of element size
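For context, here is a minimal sketch of why the bytes disappear: a fixed-width bytes array (dtype 'S') treats trailing b'\x00' as padding, so each element comes back shorter than what was stored. The sketch assumes int32 elements (4 bytes each, matching the output above) and uses tobytes(), the current name of the deprecated tostring(). One possible workaround, shown here as an illustration rather than a recommendation, is to deduplicate the Python bytes objects directly and rebuild the sub-arrays with np.frombuffer, passing the original dtype explicitly.

```python
import numpy as np

# Assumption: int32 elements, so each 2x2 sub-array is 16 bytes
a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]],
              [[9, 10], [11, 12]], [[5, 6], [7, 8]]], dtype=np.int32)

# tobytes() is the non-deprecated name for tostring()
b = [x.tobytes() for x in a]

# An 'S16' array strips trailing b'\x00' when an element is read back
c = np.array(b)
print(len(b[0]), len(c[0]))  # 16 13

# Workaround sketch: dict.fromkeys keeps the first occurrence of each key, in order
keys = list(dict.fromkeys(b))
restored = np.array([np.frombuffer(k, dtype=a.dtype).reshape(a.shape[1:]) for k in keys])
print(restored)
```

Passing dtype to np.frombuffer matters: the call in the question relies on the default float64, which would misread the data even if no bytes had been lost.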

I also tried view(), but I really don't know how to use it. Can you help me, please?

1 Comment

This is a new feature in the upcoming NumPy 1.13, as np.unique(a, axis=0). Since 1.13 is not released yet, you could simply copy the new implementation and use it in your code. Commented Nov 18, 2016 at 10:37
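A minimal sketch of the np.unique(a, axis=0) route mentioned in the comment, assuming NumPy 1.13 or later. The result of np.unique is sorted, so the second half (an addition, not part of the comment) recovers the original order by sorting the first-occurrence indices returned with return_index=True.

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]],
              [[9, 10], [11, 12]],
              [[5, 6], [7, 8]]])

# Unique sub-arrays along the first axis; the rows come back sorted
u = np.unique(a, axis=0)

# To keep the input order, sort the indices of each first occurrence
_, first = np.unique(a, axis=0, return_index=True)
u_ordered = a[np.sort(first)]
print(u_ordered)
```

For this input both results coincide; they differ only when the first unique sub-arrays do not already appear in sorted order.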

3 Answers


Building on @Jaime's post, I came up with this solution for finding unique 2D subarrays. It basically adds a reshape before the view step:

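import numpy as np
def unique2D_subarray(a):
    # One opaque void item per sub-array, as many bytes as a whole a[i]
    dtype1 = np.dtype((np.void, a.dtype.itemsize * int(np.prod(a.shape[1:]))))
    # Flatten each sub-array to a row, make it contiguous, view each row as one void item
    b = np.ascontiguousarray(a.reshape(a.shape[0], -1)).view(dtype1)
    # return_index gives the index of the first occurrence of each unique row
    return a[np.unique(b, return_index=True)[1]]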

Sample run:

In [62]: a
Out[62]: 
array([[[ 1,  2],
        [ 3,  4]],

       [[ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12]],

       [[ 5,  6],
        [ 7,  8]]])

In [63]: unique2D_subarray(a)
Out[63]: 
array([[[ 1,  2],
        [ 3,  4]],

       [[ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12]]])

2 Comments

Thank you for your answer! So, if I understood correctly, the dtype specifies a raw sequence of bytes (not really any particular type) of size a.dtype.itemsize * (size of a sub-array)? And the contiguous array is needed because the dtype is specified as a sequence of bytes? I'm sorry for the duplicate question, but I didn't understand it from @Jaime's post.
@Peťan You are right about the first part. As for the second part, the need for contiguity, I am not too clear on that either; it might be worth posting a comment on that post. If I had to guess, I would say your reasoning seems logical, and yes, the two parts are related.
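A small experiment makes the contiguity question concrete. Viewing an array as a dtype with a larger itemsize requires the bytes being merged to sit next to each other in memory (recent NumPy versions require the last axis to be contiguous, older ones the whole array). The array below is a hypothetical example chosen so that its rows are strided; the exact error message varies between NumPy versions, but it is a ValueError in each case.

```python
import numpy as np

x = np.arange(12, dtype=np.int32).reshape(3, 4)
rows = x[:, ::2]  # every other column: elements within a row are 8 bytes apart, not 4
row_void = np.dtype((np.void, rows.dtype.itemsize * rows.shape[1]))

try:
    rows.view(row_void)
    view_failed = False
except ValueError as err:
    view_failed = True
    print("view failed:", err)

# After copying into a C-contiguous buffer, each row's bytes are adjacent
packed = np.ascontiguousarray(rows).view(row_void)
print(packed.shape)  # (3, 1)
```

This suggests why the answer calls np.ascontiguousarray: for an already contiguous input it returns the array unchanged, and otherwise it makes the copy that the void view needs.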

The numpy_indexed package (disclaimer: I am its author) is designed to do operations such as these in an efficient and vectorized manner:

import numpy_indexed as npi
npi.unique(a)



One solution would be to use a set to keep track of which sub-arrays you have already seen:

import numpy as np

seen = set()
new_a = []

for j in a:
    f = tuple(j.flatten())  # hashable key for this sub-array
    if f not in seen:
        new_a.append(j)
        seen.add(f)

print(np.array(new_a))

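Or, using NumPy only (this only happens to work for this input: np.unique(a) deduplicates and sorts individual elements, not whole sub-arrays, so it breaks when sub-arrays share values or their elements are not already in ascending order):

u = np.unique(a)
print(u.reshape((len(u) // 4, 2, 2)))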

>>> [[[ 1  2]
      [ 3  4]]

     [[ 5  6]
      [ 7  8]]

     [[ 9 10]
      [11 12]]]
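A quick counterexample sketch for the NumPy-only variant: the array below is a hypothetical input in which two different sub-arrays share element values. np.unique(a) without axis returns unique scalars, so the element count no longer divides into whole 2x2 blocks.

```python
import numpy as np

# Two different sub-arrays that share element values
a = np.array([[[1, 2], [3, 4]],
              [[1, 2], [3, 5]]])

flat_unique = np.unique(a)  # unique scalars, flattened and sorted
print(flat_unique)          # [1 2 3 4 5]

try:
    flat_unique.reshape((len(flat_unique) // 4, 2, 2))
    reshape_failed = False
except ValueError:
    reshape_failed = True   # 5 elements cannot fill a (1, 2, 2) array
print(reshape_failed)
```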

3 Comments

So this answer from the dupe commented above
You lose the order of the sub-arrays with that answer.
It's true that copying the array into a set and back into an array would lose the order, but the code above only uses the set to track which sub-arrays have been seen, so the order is preserved.
