is there a builtin function of Python that does on python.array what argsort() does on a numpy.array?
4 Answers
There is no built-in function, but it's easy to assemble one out of the terrific tools Python makes available:
def argsort(seq):
# http://stackoverflow.com/questions/3071415/efficient-method-to-calculate-the-rank-vector-of-a-list-in-python
return sorted(range(len(seq)), key=seq.__getitem__)
x = [5,2,1,10]
print(argsort(x))
# [2, 1, 0, 3]
It works on Python array.arrays the same way:
import array
x = array.array('d', [5, 2, 1, 10])
print(argsort(x))
# [2, 1, 0, 3]
5 Comments
Ender
Instead of using the (theoretically private) getitem, you can also use
operator.itemgetter / operator.attrgetter docs.python.org/library/operator.htmlunutbu
If
operator.itemgetter could be used as a drop-in replacement for __getitem__, I think I'd agreed with you Ender, but as far as I can see, operator.itemgetter would also require wrapping it in a lambda expression. I'd rather avoid the extra lambda if I could.Ferdinand Beyer
@Ender:
itemgetter is no use here: x.__getitem__(i) returns x[i], whereas itemgetter(x)(i) will return i[x].johannesack
In my opinion,
key=lambda i: seq[i] might be easier to understand.neonwatty
agreed with comment above (
key=lambda i: seq[i]) might be easier to read- but still great!I timed the suggestions above and here are my results.
import timeit
import random
import numpy as np
def f(seq):
# http://stackoverflow.com/questions/3382352/equivalent-of-numpy-argsort-in-basic-python/3383106#3383106
#non-lambda version by Tony Veijalainen
return [i for (v, i) in sorted((v, i) for (i, v) in enumerate(seq))]
def g(seq):
# http://stackoverflow.com/questions/3382352/equivalent-of-numpy-argsort-in-basic-python/3383106#3383106
#lambda version by Tony Veijalainen
return [x for x,y in sorted(enumerate(seq), key = lambda x: x[1])]
def h(seq):
#http://stackoverflow.com/questions/3382352/equivalent-of-numpy-argsort-in-basic-python/3382369#3382369
#by unutbu
return sorted(range(len(seq)), key=seq.__getitem__)
seq = list(range(10000))
random.shuffle(seq)
n_trials = 100
for cmd in [
'f(seq)', 'g(seq)', 'h(seq)', 'np.argsort(seq)',
'np.argsort(seq).tolist()'
]:
t = timeit.Timer(cmd, globals={**globals(), **locals()})
print('time for {:d}x {:}: {:.6f}'.format(n_trials, cmd, t.timeit(n_trials)))
output
time for 100x f(seq): 0.323915
time for 100x g(seq): 0.235183
time for 100x h(seq): 0.132787
time for 100x np.argsort(seq): 0.091086
time for 100x np.argsort(seq).tolist(): 0.104226
A problem size dependent analysis is given here.
3 Comments
JPH
Interesting - probably the average is more important than the 'best' of 3(?)
Ricardo Cruz
The average is affected by outliers. You do not want the results be polluted by other programs running or hardware cache misses happenstances.
reve_etrange
For future readers,
%timeit is reporting the best average from 3 averages of 100 loops each.My alternative with enumerate:
def argsort(seq):
return [x for x,y in sorted(enumerate(seq), key = lambda x: x[1])]
seq=[5,2,1,10]
print(argsort(seq))
# Output:
# [2, 1, 0, 3]
Better though to use answer from https://stackoverflow.com/users/9990/marcelo-cantos answer to thread python sort without lambda expressions
[i for (v, i) in sorted((v, i) for (i, v) in enumerate(seq))]