I have many large 1D arrays and I'd like to grab the unique values. Typically, one could do:
import numpy as np

x = np.random.randint(10000, size=100000000)
np.unique(x)
However, this performs an unnecessary sort of the array. The docs for np.unique do not mention any way to retrieve the indices without sorting. Other answers with np.unique include using return_index but, as I understand it, the array is still being sorted. So, I tried using set:
set(x)
But this is way slower than sorting the array with np.unique. Is there a faster way to retrieve the unique values for this array that avoids sorting and is faster than np.unique?
pd.Series(x).unique() seems a bit faster.
unique works by sorting, and then looking for adjacent matching values. Whether you ask for the index or not, it doesn't change the basic mechanism.
set uses Python's hashing (which is also used for dict). Is there some other, more efficient, approach?
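To make the sort-then-compare mechanism described in the comments concrete, here is a minimal sketch next to one possible sort-free alternative. The function names unique_via_sort and unique_via_counts are made up for illustration. The first is not NumPy's actual source, only the general idea. The second assumes the values are non-negative integers with a modest maximum (as in the randint(10000) example), because np.bincount allocates one counter per possible value.

```python
import numpy as np


def unique_via_sort(a):
    """Sketch of the sort-based idea: sort, then keep each value
    that differs from the one before it."""
    s = np.sort(a, axis=None)
    if s.size == 0:
        return s
    keep = np.empty(s.shape, dtype=bool)
    keep[0] = True                 # the first element is always new
    keep[1:] = s[1:] != s[:-1]     # new whenever it differs from its neighbour
    return s[keep]


def unique_via_counts(a):
    """Sort-free sketch. ASSUMPTION: `a` is 1D and holds non-negative
    integers whose maximum is small enough for a count array."""
    counts = np.bincount(a)        # counts[v] = number of times v occurs
    return np.flatnonzero(counts)  # values that occur at least once


# Smaller than the array in the question so this runs quickly.
x = np.random.randint(10000, size=1_000_000)
assert np.array_equal(unique_via_sort(x), np.unique(x))
assert np.array_equal(unique_via_counts(x), np.unique(x))
```

The counting version makes a single pass over the data and returns the values in ascending order, but whether it actually beats np.unique or pd.Series(x).unique() on a given array is something to time rather than assume. It does not apply to floats, negative values, or integers with a very large range.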