4

I want to get the unique elements of a NumPy 2D array, but the array looks like this:

a = np.array([[1,2,3], [2,3], [1]], dtype=object)  # dtype=object is required for ragged input on recent NumPy
np.unique(a)

The inner lists have different numbers of elements, and I want a flattened array of the unique elements, like this:

[1,2,3]

But np.unique does not work as expected.

1
  • It's not a 2d array. It's a 1d array of lists (look at shape and dtype). Commented Jul 10, 2018 at 19:11

2 Answers

5

You have an object-dtype array because the inner lists have different lengths, so np.unique compares the objects (the inner lists) against each other rather than their elements. You need to flatten the array into a 1D array with np.concatenate first, and then call np.unique:

np.unique(np.concatenate(a))
# array([1, 2, 3])
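
To see why this is needed, here is a minimal sketch that inspects the array before flattening it. It assumes a recent NumPy version, where a ragged array has to be created with an explicit dtype=object; the names flat and unique_values are only illustrative.

```python
import numpy as np

# Assumption: recent NumPy, so the ragged input needs dtype=object.
a = np.array([[1, 2, 3], [2, 3], [1]], dtype=object)

# The array is 1D and holds three Python lists, not a 2D grid of ints.
print(a.shape)  # (3,)
print(a.dtype)  # object

# Flatten the inner lists into one 1D integer array, then deduplicate.
flat = np.concatenate(a)
unique_values = np.unique(flat)
print(unique_values)  # [1 2 3]
```

The shape and dtype show that np.unique(a) has only three objects to compare, which is why it cannot return the individual numbers.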

1

Another way is to flatten the list using itertools.chain and then use np.unique(). This can be faster than np.concatenate() if you have a very large list.

For example, consider the following:

First generate random data:

from itertools import chain
import numpy as np
import pandas as pd

N = 100000
a = np.array(
    [[np.random.randint(0,1000) for _ in range(np.random.randint(0,10))] for _ in range(N)],
    dtype=object,  # needed on recent NumPy versions for ragged input
)

Timing results:

%%timeit
np.unique(list(chain.from_iterable(a)))
#10 loops, best of 3: 66.7 ms per loop

%%timeit
np.unique(np.concatenate(a))
#10 loops, best of 3: 123 ms per loop

You could also use pandas.unique (which, unlike np.unique, does not sort its result). According to the docs, it is:

Significantly faster than numpy.unique. Includes NA values.

%%timeit
pd.unique(np.concatenate(a))
#10 loops, best of 3: 107 ms per loop

%%timeit
pd.unique(list(chain.from_iterable(a)))
#10 loops, best of 3: 57.2 ms per loop
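
The %%timeit cells above only run in IPython/Jupyter. If you want to repeat the comparison in a plain Python script, something like the following sketch with the standard timeit module should work. The helper names unique_chain and unique_concat are made up for this example, N is smaller than above to keep the run short, and the inner lists have at least one element (an assumption made here to keep both approaches on integer data). Timings will differ by machine and library version, so no numbers are shown.

```python
import timeit
from itertools import chain

import numpy as np

rng = np.random.default_rng(0)
N = 10_000  # smaller than the 100000 used above, to keep the run short

# Ragged data: N lists of 1 to 9 random integers each (no empty lists).
rows = [rng.integers(0, 1000, size=rng.integers(1, 10)).tolist() for _ in range(N)]

# Fill a 1D object array element by element so each cell holds one list.
a = np.empty(N, dtype=object)
for i, row in enumerate(rows):
    a[i] = row


def unique_chain(arr):
    # Flatten with itertools.chain, then deduplicate.
    return np.unique(list(chain.from_iterable(arr)))


def unique_concat(arr):
    # Flatten with np.concatenate, then deduplicate.
    return np.unique(np.concatenate(arr))


# Both approaches should agree before we compare their speed.
assert np.array_equal(unique_chain(a), unique_concat(a))

for fn in (unique_chain, unique_concat):
    # Best of 3 repeats, 5 calls per repeat, reported per call.
    best = min(timeit.repeat(lambda: fn(a), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per call")
```

Checking that both functions return the same values first is a cheap way to make sure the benchmark compares equivalent work.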
