Remove duplicates in a list while keeping its order (Python)

Question

This is actually an extension of this question: How to remove these duplicates in a list (python). The answers to that question did not keep the "order" of the list after removing duplicates.

biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'Live Concert by U2','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'}
]

In this case, the 2nd element should be removed because a previous "u2.com" element already exists. However, the order should be kept.

egafni · Accepted Answer · 2014-08-29 00:59:31Z

39

Use set(), then re-sort the result using each element's index in the original list.

>>> mylist = ['c','a','a','b','a','b','c']
>>> sorted(set(mylist), key=lambda x: mylist.index(x))
['c', 'a', 'b']
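The one-liner can be unpacked a little. In this sketch the key function maps each distinct element to the position of its first occurrence (that is what list.index returns), so sorting by that key puts the elements back in their original order. Since the lambda only forwards its argument, mylist.index can also be passed directly as the key.

```python
mylist = ['c', 'a', 'a', 'b', 'a', 'b', 'c']

# set() drops the duplicates but does not remember the order
distinct = set(mylist)

# mylist.index(x) is the position of the FIRST occurrence of x,
# so sorting by it restores the order in which elements first appeared
result = sorted(distinct, key=mylist.index)

# Same thing as the lambda form used in the answer
assert result == sorted(distinct, key=lambda x: mylist.index(x))
print(result)  # ['c', 'a', 'b']
```

Each mylist.index call scans the list from the start, which is the cost the comments below point out.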

answered Aug 29, 2014 at 0:59

egafni

1,988 reputation · 1 gold badge · 16 silver badges · 11 bronze badges


3 Comments

user4587874 Over a year ago

This is fantastic! Exactly what I was looking for. Would you mind explaining how it works? (with the use of lambda etc) Thanks

yprez Over a year ago

Neat Python code. The downside is that it causes an extra sort, thus an unneeded O(n * log(n)) (where otherwise O(n) would be sufficient).

kaya3 Over a year ago

Worse, it calls .index for each distinct element, which does a linear search, so it's an unneeded O(n^2).
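For an O(n) alternative (not part of this answer), one option relies on dicts preserving insertion order, which is guaranteed from Python 3.7 on: dict.fromkeys keeps only the first occurrence of each hashable element. The question's dicts are not hashable themselves, so this sketch keys them on their 'link' field, assuming every dict has one.

```python
mylist = ['c', 'a', 'a', 'b', 'a', 'b', 'c']

# dict keys are unique and keep insertion order (Python 3.7+),
# so this is an O(n) order-preserving dedup for hashable items
print(list(dict.fromkeys(mylist)))  # ['c', 'a', 'b']

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
]

# setdefault only stores a dict the first time its link is seen
first_by_link = {}
for d in biglist:
    first_by_link.setdefault(d['link'], d)

deduped = list(first_by_link.values())
print([d['title'] for d in deduped])  # ['U2 Band', 'ABC Station']
```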

Alex Martelli · Accepted Answer · 2009-10-11 01:11:15Z

26

My answer to your other question (which you completely ignored!) shows you're wrong in claiming that

The answers of that question did not keep the "order"

My answer did keep order, and it clearly said so. Here it is again, with added emphasis, to see if you can just keep ignoring it...:

Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:

biglist = [ 
    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'ABC Station','link':'abc.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'} 
]

known_links = set()
newlist = []

for d in biglist:
  link = d['link']
  if link in known_links: continue
  newlist.append(d)
  known_links.add(link)

biglist[:] = newlist

answered Oct 11, 2009 at 1:11

Alex Martelli

887k reputation · 175 gold badges · 1.3k silver badges · 1.4k bronze badges

2 Comments

xitrium Over a year ago

Hey Alex, out of curiosity why do you put the [:] on the left hand side of the assignment? I've usually seen it on the RHS. Is it just personal preference? Looking at it at first I wasn't even sure what it would do, haha.

Mark Over a year ago

@xitrium Using [:] on the left replaces all the items in the existing list instead of rebinding the name to a new list. That can matter, e.g. if you do this inside a function on a list that was passed in: if you change the list's contents, the change is visible outside the function; if you rebind the name to a new list, the outside list is unaffected. In this particular case, there is no observable effect that I can see.
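A small sketch of the difference described in that comment. The two function names are hypothetical, chosen only for illustration: slice assignment changes the list object the caller (or any alias) also holds, while plain assignment only rebinds a local name.

```python
def dedupe_in_place(lst):
    # Slice assignment: the caller's list object receives the new contents
    lst[:] = list(dict.fromkeys(lst))

def dedupe_rebind(lst):
    # Plain assignment: only the local name now refers to a new list,
    # so the caller's list is left untouched
    lst = list(dict.fromkeys(lst))

a = ['x', 'y', 'x']
dedupe_in_place(a)
print(a)  # ['x', 'y']

b = ['x', 'y', 'x']
dedupe_rebind(b)
print(b)  # ['x', 'y', 'x']

# Same effect at top level: an alias sees slice-assignment changes
c = ['x', 'y', 'x']
alias = c
c[:] = list(dict.fromkeys(c))
print(alias)  # ['x', 'y']
```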

Jochen Ritzel · Accepted Answer · 2009-10-11 01:19:01Z

13

Generators are great.

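def unique(seq, key=lambda item: item):
    seen = set()
    for item in seq:
        k = key(item)  # dicts are unhashable, so track a hashable key instead
        if k not in seen:
            seen.add(k)
            yield item

biglist[:] = unique(biglist, key=lambda d: d['link'])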

edited Oct 11, 2009 at 1:19

answered Oct 11, 2009 at 1:11

Jochen Ritzel

108k reputation · 33 gold badges · 205 silver badges · 195 bronze badges

1 Comment

Harvey Over a year ago

This is what I needed for my problem. I would suggest making it more generic adding key=lambda item: item to the method signature. Then, use key(item) for the set.

Tarnay Kálmán · Accepted Answer · 2009-10-22 21:31:38Z

3

This page discusses different methods and their speeds: http://www.peterbe.com/plog/uniqifiers-benchmark

The recommended* method:

def f5(seq, idfun=None):  
    # order preserving 
    if idfun is None: 
        def idfun(x): return x 
    seen = {} 
    result = [] 
    for item in seq: 
        marker = idfun(item) 
        # in old Python versions: 
        # if seen.has_key(marker) 
        # but in new ones: 
        if marker in seen: continue 
        seen[marker] = 1 
        result.append(item) 
    return result

biglist = f5(biglist, lambda x: x['link'])

*by that page

edited Oct 22, 2009 at 21:31

answered Oct 11, 2009 at 0:56

Tarnay Kálmán

7,076 reputation · 6 gold badges · 48 silver badges · 59 bronze badges


rools · Accepted Answer · 2014-03-19 23:20:46Z

3

This is an elegant and compact way, using a list comprehension (but not as efficient as using a dictionary):

mylist = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]

[ v for (i,v) in enumerate(mylist) if v not in mylist[0:i] ]

And in the context of the question:

[ v for (i,v) in enumerate(biglist) if v['link'] not in map(lambda d: d['link'], biglist[0:i]) ]
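To keep the comprehension form but drop the quadratic rescanning of biglist[0:i], a common idiom (not part of this answer) records seen links in a set. It relies on set.add returning None, which is falsy, so the or-expression adds the link as a side effect. Some readers find this too clever; a plain loop does the same thing more readably.

```python
biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
]

seen = set()
# New link: "in seen" is False, so seen.add() runs and returns None;
# "not (False or None)" is True and the dict is kept.
# Repeated link: "in seen" is True, so the dict is skipped.
result = [d for d in biglist
          if not (d['link'] in seen or seen.add(d['link']))]
print([d['title'] for d in result])  # ['U2 Band', 'ABC Station']
```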

answered Mar 19, 2014 at 23:20

rools

1,675 reputation · 15 silver badges · 24 bronze badges


Peter · Accepted Answer · 2009-10-11 01:08:24Z

1

dups = {}
newlist = []
for x in biglist:
    if x['link'] not in dups:
        newlist.append(x)
        dups[x['link']] = None

print(newlist)

produces

[{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]

Note that here I used a dictionary. This makes the membership test (not in dups) much more efficient than it would be with a list.

edited Oct 11, 2009 at 1:08

answered Oct 11, 2009 at 0:59

Peter

133k reputation · 53 gold badges · 184 silver badges · 214 bronze badges

2 Comments

Alex Martelli Over a year ago

You're wrong about checking in a dict being faster than in a set (lists are a completely different matter).

Peter Over a year ago

ok, fixed, thanks. I guess set is probably implemented with a hash.

Arkistarvh Kltzuonstev · Accepted Answer · 2019-09-09 08:35:29Z

1

Try this:

mylist = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]
uniq = []
for i in mylist:
    if i not in uniq:
        uniq.append(i)

print(mylist)
print(uniq)

Output:

['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
['aaa', 'aba', 'aea', 'baa', 'aac']

edited Sep 9, 2019 at 8:35

Arkistarvh Kltzuonstev

6,983 reputation · 7 gold badges · 32 silver badges · 62 bronze badges

answered Mar 5, 2012 at 18:50

falco

11 reputation · 1 bronze badge


Greg Hewgill · Accepted Answer · 2009-10-11 00:55:38Z

0

A super easy way to do this is:

def uniq(a):
    if len(a) == 0:
        return []
    else:
        return [a[0]] + uniq([x for x in a if x != a[0]])

This is not the most efficient way, because:

it searches through the whole list for every element in the list, so it's O(n^2)
it's recursive so uses a stack depth equal to the length of the list

However, for simple uses (no more than a few hundred items, not performance critical) it is sufficient.

answered Oct 11, 2009 at 0:55

Greg Hewgill

1.0m reputation · 192 gold badges · 1.2k silver badges · 1.3k bronze badges

1 Comment

TIMEX Over a year ago

Can anyone come up with a way that is scalable?
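One scalable variant replaces the repeated list scans with a set of already-seen keys, which brings the cost down to O(n) on average and removes the recursion. Because uniq above compares whole elements and dicts are unhashable, this sketch (an assumption, not from the thread) uses frozenset(d.items()) as the key, which only works when every dict value is hashable; to deduplicate by link as in the question, pass key=lambda d: d['link'] instead. The name uniq_scalable is hypothetical.

```python
def uniq_scalable(items, key=lambda x: x):
    """Order-preserving dedup: O(n) average time and no recursion."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:  # average O(1) set lookup instead of a list scan
            seen.add(k)
            result.append(item)
    return result

# Whole-dict equality, like uniq, assuming every dict value is hashable
records = [{'a': 1}, {'a': 2}, {'a': 1}]
unique_records = uniq_scalable(records, key=lambda d: frozenset(d.items()))
print(unique_records)  # [{'a': 1}, {'a': 2}]

# Far deeper than the default recursion limit, which the recursive uniq would exceed
big = list(range(100000)) * 2
print(len(uniq_scalable(big)))  # 100000
```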

ABentSpoon · Accepted Answer · 2009-10-11 00:59:49Z

0

I think using a set should be pretty efficient.

seen_links = set()
index = 0
while index < len(biglist):
    link = biglist[index]['link']
    if link in seen_links:
        # deleting shifts the next item into this slot, so don't advance
        del biglist[index]
    else:
        seen_links.add(link)
        index += 1

I think this should come in at O(nlog(n))

answered Oct 11, 2009 at 0:59

ABentSpoon

5,189 reputation · 1 gold badge · 30 silver badges · 24 bronze badges

1 Comment

Xavier Combelle Over a year ago

in fact it is O(n^2) because del on a list is O(n)
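If the list really must be modified in place, one way to avoid paying O(n) for every del is to compact the kept items toward the front and truncate once at the end. A minimal sketch, assuming each dict has a 'link' key as in the question; dedupe_links_in_place is a hypothetical name.

```python
def dedupe_links_in_place(items):
    """Keep the first dict for each 'link', in order, mutating items."""
    seen = set()
    write = 0  # next slot to fill with a kept item
    for read in range(len(items)):
        item = items[read]
        link = item['link']
        if link not in seen:
            seen.add(link)
            items[write] = item  # write <= read, so unread items are never overwritten
            write += 1
    del items[write:]  # one truncation instead of one del per duplicate

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
]
dedupe_links_in_place(biglist)
print([d['title'] for d in biglist])  # ['U2 Band', 'ABC Station']
```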
