TensorFlow Dataset: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray)

Question

I am aware that similar questions have been asked previously, but none of the proposed solutions seem to work for me. I have the following pandas DataFrame:

	Title	Author	Target	Tag0	Tag1	Tag2	Tag3	Tag4	Tag5	Tag6	Tag7	Tag8	Tag9
0	Says Ron Johnson referred to "The Lego Movie" as an "insidious anti-business conspiracy."	0	0	30	0	36	35	nan	nan	nan	nan	nan	nan
1	"Forty percent of the Fortune 500 were started either by immigrants or children of immigrants."	1	0	9	21	5	28	nan	nan	nan	nan	nan	nan

I have vectorised the Title attribute with the Keras TextVectorization layer, obtaining the following DataFrame:

	Title	Author	Target	Tag0	Tag1	Tag2	Tag3	Tag4	Tag5	Tag6	Tag7	Tag8	Tag9
0	[9415, 19483, 9066, 16820, 20256, 6959, 6931,...,0 ]	0	0	3213	3829	223	3140	nan	nan	nan	nan	nan	nan

I want to transform this pandas DataFrame into a TensorFlow dataset. I tried to do so with the following code:

dataset = tf.data.Dataset.from_tensor_slices((data.values, target.values))

Here is the error I am getting:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

Removing the Title column makes the error go away, so Title is the column causing it. Title looks like this:

print(data["Title"].values)

array([array([ 9415., 19483.,  9066., 16820., 20256.,  6959.,  6931.,  8539.,
       10705.,  1342.,  1896.,  4353., 14143.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.],
       ...,
       array([17497., 20189.,  4280.,  3460., 20256., 15754.,  9178.,  1114.,
       19441., 18731., 13875., 14018.,  5789.,  6959.,  8740., 13042.,
         929.,  9541.,   773., 19384.,  5659., 13042., 14578.,  2813.,
       17452.,   888.,  6206.,  6959., 14540.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.],
      dtype=float32)], dtype=object)

My question is: What is wrong with `Title`? What should I change?

I assume it is related to the data type of the outer numpy.ndarray that contains each title's numpy.ndarray: as can be seen above, it has dtype=object. But I am not really sure.

Thank you in advance!

Each cell of the Title column is an array, so values is an array of arrays. Try np.stack(data["Title"].values). If it raises an error, the nested arrays differ in shape and cannot be made into a 2-D numeric array (which TensorFlow can use).
– hpaulj, Commented Jan 13, 2021 at 20:53
Great, that solved my problem, but only partially. As you can see in the code above, I pass the whole DataFrame, not only the titles. If I do what you suggested, tf.data.Dataset.from_tensor_slices((np.stack(data["Title"].values), target.values)), the TensorFlow dataset is created. But how can I include the remaining columns?
– GGS, Commented Jan 13, 2021 at 21:05
Other answers here: stackoverflow.com/questions/58636087/…
– Skippy le Grand Gourou, Commented Jan 16, 2023 at 20:24
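
One possible way to combine the np.stack suggestion with the remaining columns is sketched below. It is not taken from any of the answers: the helper names build_features and make_dataset, the feature keys "title" and "other", and the toy values are all assumptions made for illustration. The idea is to stack Title into a 2-D float32 array, cast the other columns to float32 separately, and pass both to from_tensor_slices as a dict.

```python
import numpy as np
import pandas as pd


def build_features(df, label_col="Target", seq_col="Title"):
    """Split a DataFrame into a dict of dense feature arrays and a label array."""
    labels = df[label_col].to_numpy()
    # Each Title cell holds a 1-D array; np.stack turns the column into one
    # 2-D array and raises ValueError if the rows differ in length.
    titles = np.stack(df[seq_col].to_numpy()).astype(np.float32)
    # The remaining columns hold plain numbers, so one cast is enough.
    # NaN values are kept as they are.
    others = df.drop(columns=[label_col, seq_col]).to_numpy(dtype=np.float32)
    return {"title": titles, "other": others}, labels


def make_dataset(df):
    # Imported here so that build_features can be used without TensorFlow.
    import tensorflow as tf

    features, labels = build_features(df)
    return tf.data.Dataset.from_tensor_slices((features, labels))


# Toy frame shaped like the one in the question (values are made up).
toy = pd.DataFrame({
    "Title": [np.array([5.0, 7.0, 0.0], dtype=np.float32),
              np.array([3.0, 0.0, 0.0], dtype=np.float32)],
    "Author": [0, 1],
    "Target": [0, 1],
    "Tag0": [30.0, np.nan],
})

features, labels = build_features(toy)
print(toy["Title"].to_numpy().dtype)  # object: an array of arrays
print(features["title"].shape)        # (2, 3)
```

Calling make_dataset(toy) would then yield elements of the form ({"title": ..., "other": ...}, label). A dict keeps the padded token sequence apart from the tabular columns, which may be convenient if the model treats the two inputs differently.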

GGS · Accepted Answer · 2021-01-16 12:08:05Z

1

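I found a workaround for this issue by simply converting the DataFrame to a NumPy ndarray.

import tensorflow as tf

# Get the target first, so that it is not included in the features
target = data.pop("Target")

# To NumPy (dtype="<U43" stores every cell as a string of at most 43 characters)
numpy_dataset = data.to_numpy(dtype="<U43")

# TF dataset
dataset = tf.data.Dataset.from_tensor_slices((numpy_dataset, target.values))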

sybil wu · Accepted Answer · 2021-03-19 09:51:04Z

1

I ran into the same error when trying the TensorFlow feature_columns.ipynb demo. I found that the data contained null values; after dropping them, the code worked.

    # Drop rows that contain null values
    dataframe = dataframe.dropna(axis=0, how='any')
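
Before dropping rows, it may be worth checking how many would be lost. In the rows shown in the question, Tag4 to Tag9 are all nan, so how='any' would discard every one of them. The sketch below uses a made-up toy frame, and the -1 fill value is only an assumed sentinel, not something from the original post.

```python
import numpy as np
import pandas as pd

# Toy frame; column names follow the question, values are made up.
df = pd.DataFrame({
    "Author": [0, 1, 2],
    "Target": [0, 1, 0],
    "Tag0": [30.0, 9.0, 4.0],
    "Tag4": [np.nan, 2.0, np.nan],
})

# Count missing values per column to see what dropna would discard.
print(df.isna().sum())

# how="any" drops every row that has at least one NaN.
rows_kept = df.dropna(axis=0, how="any")

# Alternative: keep all rows and replace NaN with a sentinel value.
# The value -1 is an assumption; pick one that cannot clash with real tags.
filled = df.fillna(-1)

print(len(rows_kept), len(filled))
```

Whether dropping or filling is the better choice depends on how the model is meant to treat a missing tag.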
