I am aware that similar questions have been asked previously, but none of the proposed solutions seem to work for me. I have the following Pandas DataFrame:

Title Author Target Tag0 Tag1 Tag2 Tag3 Tag4 Tag5 Tag6 Tag7 Tag8 Tag9
0 Says Ron Johnson referred to "The Lego Movie" as an "insidious anti-business conspiracy." 0 0 30 0 36 35 nan nan nan nan nan nan
1 "Forty percent of the Fortune 500 were started either by immigrants or children of immigrants." 1 0 9 21 5 28 nan nan nan nan nan nan

I have vectorised the Title column using the Keras TextVectorization layer, which gives the following DataFrame:

Title Author Target Tag0 Tag1 Tag2 Tag3 Tag4 Tag5 Tag6 Tag7 Tag8 Tag9
0 [9415, 19483, 9066, 16820, 20256, 6959, 6931,...,0 ] 0 0 3213 3829 223 3140 nan nan nan nan nan nan

I want to transform this Pandas dataframe to a TensorFlow dataset. I have tried to achieve this using the following code:

dataset = tf.data.Dataset.from_tensor_slices((data.values, target.values))

Here is the error I am getting:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

If I remove the Title column the error goes away, so Title is the column causing the error. Title looks like this:

print(data["Title"].values)
array([array([ 9415., 19483.,  9066., 16820., 20256.,  6959.,  6931.,  8539.,
       10705.,  1342.,  1896.,  4353., 14143.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.],
       ...,
       array([17497., 20189.,  4280.,  3460., 20256., 15754.,  9178.,  1114.,
       19441., 18731., 13875., 14018.,  5789.,  6959.,  8740., 13042.,
         929.,  9541.,   773., 19384.,  5659., 13042., 14578.,  2813.,
       17452.,   888.,  6206.,  6959., 14540.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.],
      dtype=float32)], dtype=object)

My question is: What is wrong with Title? What should I change?

I assume it is related to the data type of the outer numpy.ndarray that contains each title's numpy.ndarray. As can be seen above, it has dtype=object. But I am not really sure.
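One way to check this suspicion is sketched below on a hypothetical two-row stand-in for data["Title"].values (the vectors are shortened versions of the output above, not the real data). It only shows how an object array of equal-length vectors differs from the 2-D numeric array that np.stack produces.

```python
import numpy as np

# Hypothetical stand-in for data["Title"].values: an object array whose
# elements are equal-length float32 vectors.
titles = np.empty(2, dtype=object)
titles[0] = np.array([9415.0, 19483.0, 0.0, 0.0], dtype=np.float32)
titles[1] = np.array([17497.0, 20189.0, 4280.0, 0.0], dtype=np.float32)

# The outer array is one-dimensional and holds Python object references.
print(titles.dtype, titles.shape)

# np.stack joins the inner vectors along a new first axis, giving a
# regular 2-D float32 array.
stacked = np.stack(titles)
print(stacked.dtype, stacked.shape)
```

If the inner vectors had different lengths, np.stack would raise a ValueError instead of returning an array.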

Thank you in advance!

Edit:

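I found a workaround for this issue: converting the DataFrame to a NumPy ndarray.

import tensorflow as tf

# Separate the target first so it is not included in the features
target = data.pop("Target")

# Convert the remaining columns to fixed-width Unicode strings
# (at most 43 characters per cell)
numpy_dataset = data.to_numpy(dtype="<U43")

# Build the TensorFlow dataset
dataset = tf.data.Dataset.from_tensor_slices((numpy_dataset, target.values))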

  • Each cell of the Title column is an array, so values is an array of arrays. Try np.stack(data["Title"].values). If it raises an error, the nested arrays differ in shape and cannot be made into a 2-D numeric array (which TensorFlow can use). Commented Jan 13, 2021 at 20:53
  • Great, that partially solved my problem. As you can see in the code above, I pass the whole DataFrame, not only the titles. If I do what you suggested, tf.data.Dataset.from_tensor_slices((np.stack(data["Title"].values), target.values)), the TensorFlow dataset is created. But how can I include the remaining columns? Commented Jan 13, 2021 at 21:05
  • Other answers here : stackoverflow.com/questions/58636087/… Commented Jan 16, 2023 at 20:24
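
One possible way to include the remaining columns, sketched under assumptions: the frame below is a hypothetical miniature of the one in the question, and split_features and to_dataset are illustrative helper names, not part of pandas or TensorFlow. The idea is to stack the Title arrays separately and pass the other columns as a second numeric matrix inside a feature dictionary, which tf.data.Dataset.from_tensor_slices accepts.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the DataFrame in the question.
data = pd.DataFrame({
    "Title": [
        np.array([9415.0, 19483.0, 0.0], dtype=np.float32),
        np.array([17497.0, 20189.0, 4280.0], dtype=np.float32),
    ],
    "Author": [0, 1],
    "Target": [0, 0],
    "Tag0": [3213, 9],
    "Tag1": [3829, 21],
})


def split_features(df, label_col="Target", array_col="Title"):
    """Return (features dict, labels) with the array column stacked to 2-D."""
    df = df.copy()  # leave the caller's frame untouched
    labels = df.pop(label_col).to_numpy()
    features = {array_col: np.stack(df.pop(array_col).to_numpy())}
    # The remaining scalar columns become one float32 matrix.
    features["other"] = df.to_numpy(dtype=np.float32)
    return features, labels


def to_dataset(features, labels):
    """Wrap the arrays in a tf.data.Dataset of (feature dict, label) pairs."""
    import tensorflow as tf  # imported lazily so the NumPy part runs on its own
    return tf.data.Dataset.from_tensor_slices((features, labels))


features, labels = split_features(data)
print(features["Title"].shape, features["other"].shape, labels.shape)
```

Calling to_dataset(features, labels) would then yield one element per row, each a dictionary with a "Title" vector and an "other" vector alongside its label. Keeping the two inputs under separate keys is a design choice that lets a model route the token IDs to an embedding layer and the rest elsewhere; concatenating them into a single matrix would also work if that distinction is not needed.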

2 Answers

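I found a workaround for this issue: converting the DataFrame to a NumPy ndarray.

import tensorflow as tf

# Separate the target first so it is not included in the features
target = data.pop("Target")

# Convert the remaining columns to fixed-width Unicode strings
# (at most 43 characters per cell)
numpy_dataset = data.to_numpy(dtype="<U43")

# Build the TensorFlow dataset
dataset = tf.data.Dataset.from_tensor_slices((numpy_dataset, target.values))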

I ran into the same error when trying the TensorFlow feature_columns.ipynb demo. I found that the data contained null values; after dropping them, the code worked.

    # Drop rows that contain any null value
    dataframe = dataframe.dropna(axis=0, how='any')
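
A caveat worth checking before applying this to a frame like the one in the question: if some columns are NaN in every row (as Tag4 to Tag9 are in the sample rows shown), dropping rows with any null removes everything. The sketch below uses a hypothetical three-row frame to compare that call with one that first drops the all-NaN columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: Tag9 is empty in every row, Author is missing once.
frame = pd.DataFrame({
    "Author": [0.0, 1.0, np.nan],
    "Tag0": [30.0, 9.0, 5.0],
    "Tag9": [np.nan, np.nan, np.nan],
})

# Every row has a NaN in Tag9, so dropping rows with any NaN leaves nothing.
all_gone = frame.dropna(axis=0, how="any")

# Dropping the all-NaN columns first keeps the two complete rows.
cleaned = frame.dropna(axis=1, how="all").dropna(axis=0, how="any")

print(len(all_gone), len(cleaned))
```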
