Converting a list to a set changes element order

Question

Recently I noticed that when I convert a list to a set, the order of the elements changes and appears to be sorted by character.

Consider this example:

x=[1,2,20,6,210]
print(x)
# [1, 2, 20, 6, 210] # the order is same as initial order

set(x)
# set([1, 2, 20, 210, 6]) # in the set(x) output order is sorted

My questions are -

Why is this happening?
How can I do set operations (especially set difference) without losing the initial order?

@KarlKnechtel - Yes "order is a meaningless concept for sets...in mathematics" but I have real world problems :)
– d.putto, Commented Mar 21, 2012 at 11:32
On CPython 3.6+ unique = list(dict.fromkeys([1, 2, 1]).keys()). This works because dicts preserve insertion order now.
– user3064538, Commented May 6, 2020 at 22:27

Brian McCutchon · Accepted Answer · 2020-05-19 17:09:27Z

214

A set is an unordered data structure, so it does not preserve the insertion order.
How to do set operations without losing the order depends on your requirements. If you have a normal list and want to remove some set of elements while preserving the order of the list, you can do this with a list comprehension:
```
>>> a = [1, 2, 20, 6, 210]
>>> b = set([6, 20, 1])
>>> [x for x in a if x not in b]
[2, 210]
```
If you need a data structure that supports both fast membership tests and preservation of insertion order, you can use the keys of a Python dictionary, which starting from Python 3.7 is guaranteed to preserve the insertion order:
```
>>> a = dict.fromkeys([1, 2, 20, 6, 210])
>>> b = dict.fromkeys([6, 20, 1])
>>> dict.fromkeys(x for x in a if x not in b)
{2: None, 210: None}
```
b doesn't really need to be ordered here – you could use a set as well. Note that a.keys() - b.keys() returns the set difference as a set, so it won't preserve the insertion order.
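To illustrate the note above, here is a minimal sketch (the variable names are illustrative, and Python 3.7+ is assumed). It shows that subtracting key views yields a plain set, while a comprehension over the first dict keeps the insertion order.

```python
a = dict.fromkeys([1, 2, 20, 6, 210])
b = {6, 20, 1}  # a plain set is enough for membership tests

# Key views support set operators, but the result is an ordinary set,
# so its iteration order is not tied to the insertion order of `a`.
unordered = a.keys() - b
print(type(unordered).__name__)  # set

# Iterating over the dict itself keeps the original insertion order.
ordered = [k for k in a if k not in b]
print(ordered)  # [2, 210]
```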

In older versions of Python, you can use collections.OrderedDict instead:
```
>>> import collections
>>> a = collections.OrderedDict.fromkeys([1, 2, 20, 6, 210])
>>> b = collections.OrderedDict.fromkeys([6, 20, 1])
>>> collections.OrderedDict.fromkeys(x for x in a if x not in b)
OrderedDict([(2, None), (210, None)])
```

edited May 19, 2020 at 17:09

Brian McCutchon


answered Mar 20, 2012 at 18:21

Sven Marnach


6 Comments

Sean Over a year ago

A None object costs 16 bytes. If only there were a default OrderedSet(). :(

juanpa.arrivillaga Over a year ago

@Sean no, they do not. None is a language-guaranteed singleton. In CPython, the actual cost is just the pointer (although that cost is always there, but for a dict, you can almost consider None and other singletons or shared references "free"), so a machine word, likely 8 bytes on modern computers. But yeah, it is not as space efficient as a set could be.

user3064538 Over a year ago

On CPython 3.6+ you can just do dict.fromkeys([1, 2, 1]).keys() because regular dicts preserve order too.

user3064538 Over a year ago

@Sven I said CPython. I post this everywhere, I'm just getting tired of writing "CPython 3.6 or any other implementation starting with Python 3.7". It doesn't even matter, everyone is using CPython

Bart Hofland Over a year ago

@user3064538 (or @Boris) . . . Sven has a point. Even if CPython behaves "correctly", that's just an implementation detail. It is not guaranteed to keep behaving that way in the future. IMO you should not depend on it, unless you like unexpected surprises.


Tiger-222 · Accepted Answer · 2018-07-08 08:51:34Z

81

~~In Python 3.6, set() now should keep the order, but~~ there is another solution for Python 2 and 3:

>>> x = [1, 2, 20, 6, 210]
>>> sorted(set(x), key=x.index)
[1, 2, 20, 6, 210]
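If the quadratic cost of calling x.index for every element matters, one possible variation is to record each element's first position once and sort by that. This is only a sketch and not part of the original answer; ordered_unique and first_pos are hypothetical names.

```python
def ordered_unique(items):
    """Return the distinct elements of items in order of first appearance."""
    first_pos = {}
    for i, item in enumerate(items):
        # setdefault keeps the index of the first occurrence only
        first_pos.setdefault(item, i)
    return sorted(set(items), key=first_pos.__getitem__)


print(ordered_unique([1, 2, 20, 6, 210, 2, 1]))  # [1, 2, 20, 6, 210]
```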

edited Jul 8, 2018 at 8:51

answered Dec 29, 2016 at 11:41

Tiger-222


14 Comments

Thijs van Dien Over a year ago

Two notes regarding order preservation: only as of Python 3.6, and even there, it's considered an implementation detail, so don't rely on it. Apart from that, your code is very inefficient because every time x.index is called, a linear search is performed. If you're fine with quadratic complexity, there is no reason to use a set in the first place.

Chris_Rands Over a year ago

@ThijsvanDien This is wrong, set() is not ordered in Python 3.6, not even as an implementation detail, you're thinking of dicts

Chris_Rands Over a year ago

@ThijsvanDien No they're not sorted, although sometimes appear so because ints often hash to themselves stackoverflow.com/questions/45581901/…

Igor Rodriguez Over a year ago

I cannot understand why this answer has so many upvotes, it does not keep insertion order, neither returns a set.

Tomerikoo Over a year ago

So you execute one line of code to end up with an output same as the input you started with... Why this has 70+ upvotes?


Sabito · Accepted Answer · 2021-01-10 08:45:08Z

56

Remove duplicates and preserve order with the function below:

def unique(sequence):
    seen = set()
    return [x for x in sequence if not (x in seen or seen.add(x))]
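The comprehension relies on seen.add(x) returning None, which is falsy: when x is new, x in seen is False, so seen.add(x) runs as a side effect and the whole condition evaluates to True. An equivalent explicit loop might look like the sketch below. The key parameter is a hypothetical addition for elements that are not hashable, such as lists.

```python
def unique(sequence, key=None):
    """Return elements of sequence in first-seen order, without duplicates."""
    seen = set()
    result = []
    for x in sequence:
        marker = x if key is None else key(x)  # hashable stand-in for x
        if marker not in seen:
            seen.add(marker)
            result.append(x)
    return result


print(unique([1, 2, 20, 6, 210, 2, 1]))
# [1, 2, 20, 6, 210]
print(unique([[1, 2, 3], [1, 3, 2], [1, 2, 3]], key=tuple))
# [[1, 2, 3], [1, 3, 2]]
```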

See also: How to remove duplicates from a list while preserving order in Python

edited Jan 10, 2021 at 8:45

Sabito


answered Nov 1, 2019 at 21:19

SKB


4 Comments

Charles Naccio Over a year ago

Exactly what I was using set for, and this solves a main issue with using set for removing duplicates from a list; losing the original list order.

Martin Bucher Over a year ago

fantastic solution

dnk8n Over a year ago

I made an edit to return [x for x in sequence if not (tuple(x) in seen or seen.add(tuple(x)))] in my case I needed a list of lists to be unique. For example, this supports, unique([[1, 2, 3], [1, 3, 2], [1, 2, 3]]) where the original doesn't

thethiny Over a year ago

This is a fantastic answer and a new use of list comprehension that I never thought of before. Anyway, it's important to keep in mind that this returns a list and not a set, so you will lose the fast membership lookup. dict.keys() is still my preferred go-to.

Michael Mior · Accepted Answer · 2018-08-22 18:03:48Z

31

Answering your first question, a set is a data structure optimized for set operations. Like a mathematical set, it does not enforce or maintain any particular order of the elements. The abstract concept of a set does not enforce order, so the implementation is not required to. When you create a set from a list, Python has the liberty to change the order of the elements for the needs of the internal implementation it uses for a set, which is able to perform set operations efficiently.
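A small sketch of what this means in practice. In CPython, small integers hash to themselves and a set places elements by hash, which is why the iteration order can look partly sorted. The exact order is an implementation detail, so the code below deliberately does not depend on it.

```python
x = [1, 2, 20, 6, 210]

# Small ints hash to themselves in CPython (implementation detail).
print([hash(n) for n in x])

# Two sets built from the same elements in different orders are equal,
# because set equality ignores order entirely.
print(set(x) == set(reversed(x)))  # True

# Iteration order is whatever the hash table layout produces.
print(list(set(x)))
```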

edited Aug 22, 2018 at 18:03

Michael Mior


answered Mar 20, 2012 at 18:49

lvella


pylang · Accepted Answer · 2020-03-27 17:04:02Z

26

In mathematics, there are sets and ordered sets (osets).

set: an unordered container of unique elements (Implemented)
oset: an ordered container of unique elements (NotImplemented)

In Python, only sets are directly implemented. We can emulate osets with regular dict keys (3.7+).

Given

a = [1, 2, 20, 6, 210, 2, 1]
b = {2, 5, 6}

Code

oset = dict.fromkeys(a).keys()
# dict_keys([1, 2, 20, 6, 210])

Demo

Replicates are removed, insertion-order is preserved.

list(oset)
# [1, 2, 20, 6, 210]

Set-like operations on dict keys.

oset - b
# {1, 20, 210}

oset | b
# {1, 2, 5, 6, 20, 210}

oset & b
# {2, 6}

oset ^ b
# {1, 5, 20, 210}
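Because the operators above return plain sets, the results do not keep the insertion order. If the order matters, one option is to rebuild the result from the ordered keys. The helper names below are hypothetical, and the sketch assumes the Python 3.7+ dict ordering guarantee.

```python
def ordered_difference(a, b):
    """Elements of a not in b, in a's order, without duplicates."""
    exclude = set(b)
    return [x for x in dict.fromkeys(a) if x not in exclude]


def ordered_intersection(a, b):
    """Elements of a that are also in b, in a's order, without duplicates."""
    keep = set(b)
    return [x for x in dict.fromkeys(a) if x in keep]


def ordered_union(a, b):
    """Elements of a followed by new elements of b, without duplicates."""
    return list(dict.fromkeys([*a, *b]))


a = [1, 2, 20, 6, 210, 2, 1]
b = [6, 5, 2]

print(ordered_difference(a, b))    # [1, 20, 210]
print(ordered_intersection(a, b))  # [2, 6]
print(ordered_union(a, b))         # [1, 2, 20, 6, 210, 5]
```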

Details

Note: an unordered structure does not preclude ordered elements. Rather, maintained order is not guaranteed. Example:

assert {1, 2, 3} == {2, 3, 1}                    # sets (order is ignored)

assert [1, 2, 3] != [2, 3, 1]                    # lists (order is guaranteed)

One may be pleased to discover that a list and multiset (mset) are two more fascinating, mathematical data structures:

list: an ordered container of elements that permits replicates (Implemented)
mset: an unordered container of elements that permits replicates (NotImplemented)*

Summary

Container | Ordered | Unique | Implemented
----------|---------|--------|------------
set       |    n    |    y   |     y
oset      |    y    |    y   |     n
list      |    y    |    n   |     y
mset      |    n    |    n   |     n*

*A multiset can be indirectly emulated with collections.Counter(), a dict-like mapping of multiplicities (counts).
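As a brief illustration of the footnote, collections.Counter can stand in for a multiset: it ignores order when comparing and tracks how many times each element occurs. The names m1 and m2 are illustrative.

```python
from collections import Counter

m1 = Counter([1, 2, 2, 3])
m2 = Counter([2, 3, 1, 2])

print(m1 == m2)  # True: same elements with the same multiplicities
print(m1[2])     # 2

# Multiset-style operations work on the counts.
more = m1 + Counter([2])     # the count of 2 becomes 3
less = m1 - Counter([2, 3])  # leaves one 1 and one 2
print(more[2], dict(less))
```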

edited Mar 27, 2020 at 17:04

answered Nov 5, 2019 at 10:56

pylang


6 Comments

luisfelipe18 Over a year ago

also, there are partial ordered sets (posets)

pylang Over a year ago

And cosets, but I did not think they were germane to topic of common data structures found in the Python standard library :)

Rexovas Over a year ago

Very concisely explained. I was looking to obtain the difference of two sets and maintain the order of the remaining elements. This didn't quite work using dict.fromkeys, but it did work using OrderedDict from collections. This was using python 3.11.2.

Rexovas Over a year ago

Nevermind... it's still not maintaining the original order after subtracting elements from the other set. Maybe I'm misunderstanding how it's supposed to work.

pylang Over a year ago

@Rexovas This technique emulates properties of an oset thru the feature of the modern dict, i.e. unique, (insertion-)order elements, but in the end dict-keys are still set-like. Thus, the set operations revert to behaving like sets (unordered).


Alex Ricciardi · Accepted Answer · 2020-12-11 21:27:28Z

17

You can remove the duplicated values and keep the insertion order of the list with one line of code (Python 3.8.2):

mylist = ['b', 'b', 'a', 'd', 'd', 'c']


results = list({value:"" for value in mylist})

print(results)

# ['b', 'a', 'd', 'c']

results = list(dict.fromkeys(mylist))

print(results)

# ['b', 'a', 'd', 'c']

edited Dec 11, 2020 at 21:27

answered Dec 11, 2020 at 19:43

Alex Ricciardi


3 Comments

SavindraSingh Over a year ago

This is the best one liner solution

ingyhere Over a year ago

For larger lists, this would be better off using None than an empty str. ... >>> None.__sizeof__() 16 >>> "".__sizeof__() 49 .

ingyhere Over a year ago

How? It converts the list to a dict and back to a list again in one step. Also, this works in Python 3.7+ since insertion order is now guaranteed. For large data sets using a dict exclusively would be beneficial to prevent large and multiple data structures in memory.

jsbueno · Accepted Answer · 2012-03-20 19:23:08Z

8

As noted in other answers, sets are data structures (and mathematical concepts) that do not preserve element order.

However, by using a combination of sets and dictionaries, you can achieve whatever you want. Try using these snippets:

# save the element order in a dict (for duplicates, the last index wins):
x_dict = {x: i for i, x in enumerate(my_list)}
x_set = set(my_list)
# perform the desired set operations, producing new_set
...
# retrieve an ordered list from the resulting set;
# sorting by saved position also works when new_set is smaller than my_list
new_list = sorted(new_set, key=x_dict.__getitem__)
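A concrete run of the idea above, as a sketch: the set difference is only an example operation, and position, my_list, and other are illustrative names. Sorting by the recorded position restores the original order for any result that is a subset of the original list.

```python
my_list = [1, 2, 20, 6, 210]
other = {6, 20, 1}

# Remember where each element sits in the original list.
position = {value: index for index, value in enumerate(my_list)}

# Perform an ordinary (unordered) set operation.
new_set = set(my_list) - other

# Put the surviving elements back into their original order.
new_list = sorted(new_set, key=position.__getitem__)
print(new_list)  # [2, 210]
```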

answered Mar 20, 2012 at 19:23

jsbueno


jimh · Accepted Answer · 2023-01-12 18:14:52Z

4

Building on Sven's answer, I found using collections.OrderedDict like so helped me accomplish what you want plus allow me to add more items to the dict:

import collections

x=[1,2,20,6,210]
z=collections.OrderedDict.fromkeys(x)
z
OrderedDict([(1, None), (2, None), (20, None), (6, None), (210, None)])

If you want to add items but still treat it like a set you can just do:

z['nextitem']=None

And you can perform an operation like z.keys() on the dict and get the set:

list(z.keys())
[1, 2, 20, 6, 210]
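The same idea can be wrapped in a small class. The OrderedSet below is a hypothetical sketch, not a standard library type; it assumes Python 3.7+ and stores its elements as dict keys, while collections.abc.MutableSet supplies the set operators.

```python
from collections.abc import MutableSet


class OrderedSet(MutableSet):
    """A minimal insertion-ordered set backed by a dict (Python 3.7+)."""

    def __init__(self, iterable=()):
        self._items = dict.fromkeys(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        self._items[item] = None

    def discard(self, item):
        self._items.pop(item, None)

    def __repr__(self):
        return f"OrderedSet({list(self._items)})"


z = OrderedSet([1, 2, 20, 6, 210])
z.add("nextitem")
z.discard(20)
print(list(z))           # [1, 2, 6, 210, 'nextitem']
print(list(z - {6, 1}))  # [2, 210, 'nextitem']
```

The operators inherited from MutableSet build their result by iterating over the left operand, which is why the difference keeps the insertion order here.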

edited Jan 12, 2023 at 18:14

answered Jan 30, 2015 at 19:43

jimh


2 Comments

jxn Over a year ago

you need to do list(z.keys()) to get the list output.

jimh Over a year ago

in Python 3, yes. not in Python 2, though I should have specified.

Trees · Accepted Answer · 2022-04-01 07:50:45Z

2

Late to answer, but you can use pandas: wrap the list in a pd.Series and call pd.unique, which removes duplicates while preserving the order:

import pandas as pd
x = pd.Series([1, 2, 20, 6, 210, 2, 1])
print(pd.unique(x))

Output: [  1   2  20   6 210]

Works for a list of strings

x = pd.Series(['c', 'k', 'q', 'n', 'p','c', 'n'])
print(pd.unique(x))

Output ['c' 'k' 'q' 'n' 'p']

answered Apr 1, 2022 at 7:50

Trees


Deepak Soni · Accepted Answer · 2022-08-08 12:39:15Z

2

A simpler way is to create an empty list, say "unique_list", and add the unique elements from the original list to it, for example:

unique_list=[]

for i in original_list:
    if i not in unique_list:
        unique_list.append(i)
    else:
        pass

This will give you all the unique elements as well as maintain the order.

edited Aug 8, 2022 at 12:39

answered Aug 8, 2022 at 12:33

Deepak Soni


AXO · Accepted Answer · 2024-02-13 05:12:27Z

I like the following solution for its conciseness. It uses PEP 448 – Additional Unpacking Generalizations which is valid in Python 3.5+, and relies on dict insertion order being preserved (CPython 3.6+ or Python 3.7+).

def unique(sequence):
    return [*dict.fromkeys(sequence)]

Also, for larger sequences, it seems to be faster than checking set membership:

from timeit import timeit


def unique(sequence):
    return [*dict.fromkeys(sequence)]


def set_unique(sequence):
    seen = set()
    return [x for x in sequence if not (x in seen or seen.add(x))]


a_large_list = [*range(100)] + [*range(10)]

assert set_unique(a_large_list) == unique(a_large_list)
print(
    timeit('set_unique(a_large_list)', globals=globals(), number=100_000)
    / timeit('unique(a_large_list)', globals=globals(), number=100_000)
)  # 2.379634026726176 times faster!

Mike Stucka · Accepted Answer · 2018-05-01 12:57:04Z

0

An implementation of the highest score concept above that brings it back to a list:

def SetOfListInOrder(incominglist):
    from collections import OrderedDict
    outtemp = OrderedDict()
    for item in incominglist:
        outtemp[item] = None
    return(list(outtemp))

Tested (briefly) on Python 3.6 and Python 2.7.

answered May 1, 2018 at 12:57

Mike Stucka


Ultrablendz · Accepted Answer · 2019-05-22 09:42:15Z

0

In case you have a small number of elements in your two initial lists on which you want to do set difference operation, instead of using collections.OrderedDict which complicates the implementation and makes it less readable, you can use:

# initial lists on which you want to do set difference
>>> nums = [1,2,2,3,3,4,4,5]
>>> evens = [2,4,4,6]
>>> evens_set = set(evens)
>>> result = []
>>> for n in nums:
...   if n not in evens_set and n not in result:
...     result.append(n)
... 
>>> result
[1, 3, 5]

Its time complexity is not great, because the n not in result check scans a list each time, but it is neat and easy to read.

answered May 22, 2019 at 9:42

Ultrablendz


Po-Yao Niu · Accepted Answer · 2020-01-24 22:51:12Z

It's interesting that people always use 'real world problems' to joke about definitions from theoretical science.

If a set had an order, you would first need to answer the following questions. If your list has duplicate elements, what should the order be when you turn it into a set? What is the order if we take the union of two sets? What is the order if we intersect two sets that order the same elements differently?

Plus, a set is much faster at looking up a particular key, which is very useful for set operations (and that's why you need a set rather than a list).

If you really care about the index, just keep it as a list. If you still want to do set operations on the elements of many lists, the simplest way is to create a dictionary for each list whose keys are the elements of the set and whose values are lists containing all the indices of that key in the original list.

def indx_dic(l):
    dic = {}
    for i in range(len(l)):
        if l[i] in dic:
            dic.get(l[i]).append(i)
        else:
            dic[l[i]] = [i]
    return(dic)

a = [1,2,3,4,5,1,3,2]
set_a  = set(a)
dic_a = indx_dic(a)

print(dic_a)
# {1: [0, 5], 2: [1, 7], 3: [2, 6], 4: [3], 5: [4]}
print(set_a)
# {1, 2, 3, 4, 5}
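A more compact variant of the same index dictionary, followed by one way it might be used to put the result of a set operation back in order. This is a sketch; index_map, idx_a, and the second list b are illustrative names that do not appear in the answer.

```python
from collections import defaultdict


def index_map(items):
    """Map each element to the list of positions where it occurs."""
    positions = defaultdict(list)
    for i, item in enumerate(items):
        positions[item].append(i)
    return dict(positions)


a = [1, 2, 3, 4, 5, 1, 3, 2]
b = [3, 1, 7]

idx_a = index_map(a)
print(idx_a)  # {1: [0, 5], 2: [1, 7], 3: [2, 6], 4: [3], 5: [4]}

# Set operation on the keys, then recover the order via the first index.
common = set(idx_a) & set(b)
print(sorted(common, key=lambda k: idx_a[k][0]))  # [1, 3]
```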

silver · Accepted Answer · 2021-08-31 17:22:12Z

0

We can use collections.Counter for this:

# tested on python 3.7
>>> from collections import Counter
>>> lst = ["1", "2", "20", "6", "210"]

>>> for i in Counter(lst):
...     print(i, end=" ")
1 2 20 6 210 

>>> for i in set(lst):
...     print(i, end=" ")
20 6 2 1 210

answered Aug 31, 2021 at 17:22

silver


1 Comment

Tomerikoo Over a year ago

Why bother with a Counter if you can just do dict.fromkeys()? The only difference is that the values will be 1 instead of None, but the values are not interesting anyway as the point is to emulate a set...

abhay patil · Accepted Answer · 2022-03-18 18:28:39Z

0

You can remove the duplicated values and keep the list order of insertion, if you want

lst = [1,2,1,3]
new_lst = []

for num in lst :
    if num not in new_lst :
        new_lst.append(num)

# new_lst = [1,2,3]

Don't use sets for removing duplicates if order is something you want.

Use sets for searching, i.e.
x in list
takes O(n) time, whereas
x in set
takes O(1) time in most cases (on average).
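The difference can be observed with a rough timing sketch such as the one below. The sizes and repetition counts are arbitrary choices, and the measured numbers will vary from machine to machine.

```python
from timeit import timeit

items = list(range(100_000))
as_list = items
as_set = set(items)
target = 99_999  # worst case for the list: the last element

# Each membership test on the list scans up to every element;
# the set does a hash lookup instead.
list_time = timeit(lambda: target in as_list, number=100)
set_time = timeit(lambda: target in as_set, number=100)

print(f"list: {list_time:.5f}s  set: {set_time:.5f}s")
```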

answered Mar 18, 2022 at 18:28

abhay patil


user8397947 · Accepted Answer · 2016-07-07 17:15:46Z

-5

Here's an easy way to do it:

x=[1,2,20,6,210]
print(sorted(set(x)))

edited Jul 7, 2016 at 17:15

user8397947


answered Jul 7, 2016 at 16:04

Aappu Shankar


1 Comment

David Boshton Over a year ago

This doesn't preserve the ordering necessarily.
