I noticed the following behaviour exhibited by numpy arrays:

>>> import numpy as np
>>> s = {1,2,3}
>>> l = [1,2,3]
>>> np.array(l)
array([1, 2, 3])
>>> np.array(s)
array({1, 2, 3}, dtype=object)
>>> np.array(l, dtype='int')
array([1, 2, 3])
>>> np.array(l, dtype='int').dtype
dtype('int64')
>>> np.array(s, dtype='int')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: int() argument must be a string, a bytes-like object or a number, not 'set'

There are 2 things to notice:

  1. Creating an array from a set results in the array dtype being object
  2. Trying to specify dtype results in an error which suggests that the set is being treated as a single element rather than an iterable.

What am I missing? I don't fully understand which bit of Python I'm overlooking. A set is a mutable object, much like a list.

EDIT: tuples work fine:

>>> t = (1,2,3)
>>> np.array(t)
array([1, 2, 3])
>>> np.array(t).dtype
dtype('int64')

  • Can you just convert the set to a list? np.array(list(s), dtype='int') (commented Oct 28, 2019 at 11:30)
  • Unlike lists, sets have no order, so np.array cannot infer which element comes before or after the others. This is why the set ends up being treated as a single element. (commented Oct 28, 2019 at 11:31)
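
If the element order matters, one option (an addition here, not something suggested in the thread) is to sort the set first. sorted() always returns a list, which np.array then handles as an ordinary sequence. This is a minimal sketch, assuming the elements are mutually comparable:

```python
import numpy as np

s = {3, 1, 2}

# A set has no defined order, so sort it first to get a reproducible
# element order, then build the array with the desired dtype.
arr = np.array(sorted(s), dtype=int)
print(arr)  # [1 2 3]
```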

4 Answers

The array factory works best with sequence objects, which a set is not. If you do not care about the order of the elements and know they are all ints (or convertible to int), you can use np.fromiter:

np.fromiter({1,2,3},int,3)
# array([1, 2, 3])

The second (dtype) argument is mandatory; the last (count) argument is optional, but providing it can improve performance because it lets NumPy preallocate the output array instead of resizing it while iterating.
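
The same call can be wrapped in a small reusable function. This is a sketch only: set_to_array is a hypothetical helper name, and the choice of np.int64 as the default dtype is an assumption. It passes dtype explicitly and uses count=len(s) so NumPy can allocate the result once:

```python
import numpy as np

def set_to_array(s, dtype=np.int64):
    """Hypothetical helper: build a 1-D array from a set via np.fromiter.

    dtype must be given explicitly; count=len(s) lets NumPy allocate the
    output once instead of growing it while consuming the iterator.
    """
    return np.fromiter(s, dtype=dtype, count=len(s))

ints = set_to_array({1, 2, 3})
floats = set_to_array({0.5, 1.5}, dtype=np.float64)

# Element order follows the set's iteration order, which is not guaranteed,
# so sort before comparing or displaying.
print(sorted(ints.tolist()))  # [1, 2, 3]
```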

As the curly-brace syntax suggests, a set is more closely related to a dict than to a list. You can solve this very simply by converting the set into a list or tuple before creating the array:

>>> import numpy as np
>>> s = {1,2,3}
>>> np.array(s)
array({1, 2, 3}, dtype=object)
>>> np.array(list(s))
array([1, 2, 3])
>>> np.array(tuple(s))
array([1, 2, 3])

However, this might be inefficient for large sets, because list() or tuple() has to run through the whole set and build an intermediate container before the array creation even starts. A better method would be to pass the set to np.fromiter, which consumes it directly as an iterable:

>>> np.fromiter(s, int)
array([1, 2, 3])
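
To check the efficiency claim on your own machine, a rough timing sketch follows. The set size and repeat count are arbitrary assumptions, and the relative timings will depend on your hardware and NumPy version, so no particular result is claimed here. Both approaches iterate the same unmodified set, so they produce identical arrays:

```python
import timeit

import numpy as np

big = set(range(100_000))

# Approach 1: copy the set into a list, then build the array.
via_list = np.array(list(big))

# Approach 2: let np.fromiter consume the set directly; count allows preallocation.
via_fromiter = np.fromiter(big, dtype=via_list.dtype, count=len(big))

t_list = timeit.timeit(lambda: np.array(list(big)), number=20)
t_iter = timeit.timeit(
    lambda: np.fromiter(big, dtype=via_list.dtype, count=len(big)), number=20
)
print(f"list + np.array: {t_list:.3f}s   np.fromiter: {t_iter:.3f}s")
```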

The np.array documentation says that the object argument must be "an array, any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence". The key word here is "sequence".

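A set is not a sequence: sets are unordered and do not support indexing (they have no __getitem__ method). Hence np.array cannot unpack a set element by element the way it does with a list. Instead it stores the whole set as a single object, which is why you get an object array, and why dtype='int' fails when NumPy tries to call int() on the set itself.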
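
Both observations from the question can be checked directly. This sketch uses collections.abc.Sequence as a stand-in for "sequence" (an illustrative choice; NumPy's internal check is not literally this isinstance test) and shows that np.array(s) produces a 0-dimensional object array whose single element is the set:

```python
from collections.abc import Sequence

import numpy as np

s = {1, 2, 3}
l = [1, 2, 3]

# A list is a Sequence (ordered and indexable); a set is neither.
print(isinstance(l, Sequence), isinstance(s, Sequence))  # True False
print(hasattr(l, "__getitem__"), hasattr(s, "__getitem__"))  # True False

# np.array therefore stores the whole set as the one element of a 0-d object array.
a = np.array(s)
print(a.shape, a.dtype)  # () object
print(a.item() is s)  # True

# With dtype=int, NumPy tries to convert that single element (the set) to int, which fails.
try:
    np.array(s, dtype=int)
except TypeError as exc:
    print("TypeError:", exc)
```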

NumPy expects the argument to be a sequence such as a list. It doesn't know how to unpack a set, so it creates an object array holding the set itself (the same would happen if you passed any other non-sequence object). You can create a NumPy array from a set by first converting the set to a list: numpy.array(list(my_set)). Hope this helps.
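
The "any other non-sequence object" point can be illustrated with a generator, which is iterable but, like a set, not a sequence. The generator is an illustrative choice and squares is a hypothetical helper; this sketch shows the same 0-d object array appearing, and the same two fixes (convert to a list, or use np.fromiter) working:

```python
import numpy as np

def squares(n):
    # Hypothetical helper: a generator is iterable but, like a set, not a sequence.
    return (i * i for i in range(n))

# Passing a non-sequence iterable to np.array gives a 0-d object array...
wrapped = np.array(squares(4))
print(wrapped.shape, wrapped.dtype)  # () object

# ...while converting it to a list first, or using np.fromiter, gives a real 1-D array.
as_list = np.array(list(squares(4)))
as_iter = np.fromiter(squares(4), dtype=int)
print(as_list, as_iter)  # [0 1 4 9] [0 1 4 9]
```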
