I have a numpy.ndarray as shown below:

import numpy as np
import pandas as pd

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
labels = [1,0]
df = pd.DataFrame({"a":x,"labels":labels})
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-458-79198b72cdcb> in <module>()
      1 x = np.array([[1, 2, 3], [4, 5, 6]], np.int32).reshape(-1,1)
      2 labels = [1,0,1,0]
----> 3 df = pd.DataFrame({"a":x,"labels":labels})

4 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    480     elif subarr.ndim > 1:
    481         if isinstance(data, np.ndarray):
--> 482             raise Exception("Data must be 1-dimensional")
    483         else:
    484             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional

I tried reshaping the array with x.reshape(-1,1), but I get the same error. Each row of the ndarray x should become one row in the dataframe. I'm expecting to get:

           a  labels
0  [1, 2, 3]       1
1  [4, 5, 6]       0
  • Although you create x from a list of lists, x itself does not contain lists. It is a 2D array of numbers. Passing a plain list of lists, x=[[1, 2, 3], [4, 5, 6]], works. Commented Apr 17, 2020 at 16:04

1 Answer

The problem is that x is a multidimensional, homogeneous array, so pandas doesn't know how to split it into the rows of a single column. In general, pandas does not support nested structures as column data unless they are wrapped as individual objects. Think about a higher-dimensional array, say of shape (3,4,2): how should that be dealt with?

Note that the dataframe columns are created one at a time, much as through separate calls to the pd.Series constructor. Constructing a Series directly from the ndarray raises the same error:

pd.Series(x)
    ...
    480     elif subarr.ndim > 1:
    481         if isinstance(data, np.ndarray):
--> 482             raise Exception("Data must be 1-dimensional")
    483         else:
    484             subarr = com.asarray_tuplesafe(data, dtype=dtype)
Exception: Data must be 1-dimensional
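
The check above can be reproduced in a few lines. This is a minimal sketch rather than a description of pandas internals: it assumes only that pd.Series rejects any ndarray with more than one dimension, and it catches the generic Exception because the concrete exception class differs between pandas versions. It also shows why x.reshape(-1,1) did not help: the result is still two-dimensional.

```python
import numpy as np
import pandas as pd

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)

# A 2-D ndarray is rejected by the Series constructor.
try:
    pd.Series(x)
    raised = False
except Exception as exc:  # the concrete class depends on the pandas version
    raised = True
    print(type(exc).__name__, exc)

# reshape(-1, 1) yields shape (6, 1), which is still 2-D, so the error persists.
reshaped = x.reshape(-1, 1)
print(reshaped.ndim)  # 2

# A truly 1-D array is accepted, but it has six scalar rows, not two list rows.
flat = pd.Series(x.ravel())
print(len(flat))  # 6
```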

So you have to turn the array into a one-dimensional iterable in which each element becomes one row of the dataframe. For that, you can unpack the numpy array into a list of its rows (each row is then a 1-D array):

df = pd.DataFrame({"a":[*x], "labels":labels}) # or .."a":list(x)..

print(df)
           a  labels
0  [1, 2, 3]       1
1  [4, 5, 6]       0
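
One detail worth checking is what type each cell holds afterwards. The sketch below compares list(x) with x.tolist(); the latter is not part of the answer above and is shown only as a possible alternative for when plain Python lists are wanted in the cells. The names df_arrays and df_lists are made up for illustration.

```python
import numpy as np
import pandas as pd

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
labels = [1, 0]

# list(x) (equivalent to [*x]) gives a list of two 1-D arrays,
# so each cell of column "a" holds a numpy array.
df_arrays = pd.DataFrame({"a": list(x), "labels": labels})

# x.tolist() gives nested Python lists,
# so each cell of column "a" holds a plain list of Python ints.
df_lists = pd.DataFrame({"a": x.tolist(), "labels": labels})

print(type(df_arrays["a"].iloc[0]))
print(type(df_lists["a"].iloc[0]))
```

Either way, column "a" ends up with dtype object, so vectorised numeric operations on it are not available; keeping x as a separate array may be preferable if further numerical work is planned.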