1

Converting the output of tensorflow.keras.preprocessing.text.Tokenizer.texts_to_sequences to a NumPy array gives an unexpected result for the training labels, as shown below:

(training_label_list[0:10]) = [list([1]) list([1]) list([1]) list([1]) list([1]) list([1]) list([1]) list([1]) list([1]) list([1])]

but it prints a normal array for the validation labels:

(validation_label_list[0:10]) = [[16]
 [16]
 [16]
 [16]
 [16]
 [16]
 [16]
 [16]
 [16]
 [16]]

In other words, type(training_label_list[0]) = <class 'list'> but

type(validation_label_list[0]) =  <class 'numpy.ndarray'>

Consequently, training the model with Keras Model.fit fails with the following error:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list).

This is the link to the Google Colab notebook that reproduces the error.

The complete code to reproduce the error is given below:

!pip install tensorflow==2.1

# For Preprocessing the Text => To Tokenize the Text
from tensorflow.keras.preprocessing.text import Tokenizer
# If the Two Articles are of different length, pad_sequences will make the length equal
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Package for performing Numerical Operations
import numpy as np

Unique_Labels_List = ['India', 'USA', 'Australia', 'Germany', 'Bhutan', 'Nepal', 'New Zealand', 'Israel', 'Canada', 'France', 'Ireland', 'Poland', 'Egypt', 'Greece', 'China', 'Spain', 'Mexico']


Train_Labels = Unique_Labels_List[0:14]
#print('Train Labels = {}'.format(Train_Labels))

Val_Labels =  Unique_Labels_List[14:]
#print('Val_Labels = {}'.format(Val_Labels))

No_Of_Train_Items = [248, 200, 200, 218, 248, 248, 249, 247, 220, 200, 200, 211, 224, 209]
No_Val_Items = [212, 200, 219]

T_L = []
for Each_Label, Item in zip(Train_Labels, No_Of_Train_Items):
    T_L.append([Each_Label] * Item)

T_L = [item for sublist in T_L for item in sublist]

V_L = []
for Each_Label, Item in zip(Val_Labels, No_Val_Items):
    V_L.append([Each_Label] * Item)

V_L = [item for sublist in V_L for item in sublist]


len(T_L)

len(V_L)

label_tokenizer = Tokenizer()

label_tokenizer.fit_on_texts(Unique_Labels_List)

# Since it should be a Numpy Array, we should Convert the Sequences to Numpy Array, for both Training and 
# Test Labels

training_label_list = np.array(label_tokenizer.texts_to_sequences(T_L))

validation_label_list = np.array(label_tokenizer.texts_to_sequences(V_L))

print('(training_label_list[0:10]) = {}'.format((training_label_list[0:10])))
print('(validation_label_list[0:10]) = {}'.format((validation_label_list[0:10])))

print('type(training_label_list[0]) = ', type(training_label_list[0]))
print('type(validation_label_list[0]) = ', type(validation_label_list[0]))

I would be grateful if someone could suggest how to get both the training labels and the validation labels in the same format, as I have spent a lot of time on this.

2 Answers

1

Replacing np.array with np.hstack as mentioned in this Stack Overflow Answer has fixed that problem for me.

Now the correct output is:

(training_label_list[0:10]) = [1 1 1 1 1 1 1 1 1 1]
(validation_label_list[0:10]) = [16 16 16 16 16 16 16 16 16 16]
type(training_label_list[0]) =  <class 'numpy.int64'>
type(validation_label_list[0]) =  <class 'numpy.int64'>
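
To see why np.hstack changes the result, here is a minimal sketch on hypothetical token sequences (the values are made up for illustration and are not the output of the code in this answer). It assumes one label was split into two tokens, which is what the default whitespace-splitting Tokenizer should do with a label such as 'New Zealand'. dtype=object is passed explicitly because recent NumPy versions refuse to build an array from ragged lists without it.

```python
import numpy as np

# Hypothetical token sequences: the third label produced two tokens,
# so the nested list is ragged (sublists of unequal length).
sequences = [[1], [1], [7, 8], [2]]

# Ragged input can only become a 1-D object array whose elements are lists.
# dtype=object is an explicit choice here so the sketch runs on any NumPy version.
ragged = np.array(sequences, dtype=object)
print(ragged.dtype, ragged.shape, type(ragged[0]))

# np.hstack concatenates the sublists into one flat integer array.
flat = np.hstack(sequences)
print(flat.dtype.kind, flat.shape)

# The flat array has one entry per token, not one per label.
print(len(sequences), len(flat))
```

Note that the flattened array is longer than the list of labels whenever a label yields more than one token, so it is worth comparing the two lengths before passing the labels to Model.fit.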

Link of the working code is in this Google Colab.

The working code is given below, in case the link above doesn't work:

!pip install tensorflow==2.1

# For Preprocessing the Text => To Tokenize the Text
from tensorflow.keras.preprocessing.text import Tokenizer
# If the Two Articles are of different length, pad_sequences will make the length equal
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Package for performing Numerical Operations
import numpy as np

Unique_Labels_List = ['India', 'USA', 'Australia', 'Germany', 'Bhutan', 'Nepal', 'New Zealand', 'Israel', 'Canada', 'France', 'Ireland', 'Poland', 'Egypt', 'Greece', 'China', 'Spain', 'Mexico']


Train_Labels = Unique_Labels_List[0:14]
#print('Train Labels = {}'.format(Train_Labels))

Val_Labels =  Unique_Labels_List[14:]
#print('Val_Labels = {}'.format(Val_Labels))

No_Of_Train_Items = [248, 200, 200, 218, 248, 248, 249, 247, 220, 200, 200, 211, 224, 209]
No_Val_Items = [212, 200, 219]

T_L = []
for Each_Label, Item in zip(Train_Labels, No_Of_Train_Items):
    T_L.append([Each_Label] * Item)

T_L = [item for sublist in T_L for item in sublist]

V_L = []
for Each_Label, Item in zip(Val_Labels, No_Val_Items):
    V_L.append([Each_Label] * Item)

V_L = [item for sublist in V_L for item in sublist]


len(T_L)

len(V_L)

label_tokenizer = Tokenizer()

label_tokenizer.fit_on_texts(Unique_Labels_List)

# Since it should be a Numpy Array, we should Convert the Sequences to Numpy Array, for both Training and 
# Test Labels

training_label_list = np.hstack(label_tokenizer.texts_to_sequences(T_L))

validation_label_list = np.hstack(label_tokenizer.texts_to_sequences(V_L))

print('(training_label_list[0:10]) = {}'.format((training_label_list[0:10])))
print('(validation_label_list[0:10]) = {}'.format((validation_label_list[0:10])))

print('type(training_label_list[0]) = ', type(training_label_list[0]))
print('type(validation_label_list[0]) = ', type(validation_label_list[0]))

0

Your problem is that, although you are converting your training data to a NumPy array, that array consists of list elements, hence the error

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list).

The error is subtler than it appears; some have reported that they had to switch back from 2.1.0 to 2.0.0. See also: "What is the difference between Numpy's array() and asarray() functions?"

I would personally try this:

  1. Use training_label_list = np.asarray(label_tokenizer.texts_to_sequences(T_L)) instead of np.array (see "Tensorflow - ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float)").
  2. According to "List of lists into numpy array", you will have to force the cast (odd as it looks, this should work):

import numpy
x=[[1,2],[1,2,3],[1]]
# dtype=object is needed on recent NumPy versions for ragged input
y=numpy.array([numpy.array(xi) for xi in x], dtype=object)
type(y)
>>><class 'numpy.ndarray'>
type(y[0])
>>><class 'numpy.ndarray'>
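
On the first suggestion above: as far as I know, np.asarray differs from np.array mainly in that it avoids copying an input that is already an ndarray, while nested lists go through the same conversion rules either way. A minimal sketch, using a hypothetical ragged list rather than the real token sequences:

```python
import numpy as np

# For an existing ndarray, asarray returns the same object; array copies by default.
existing = np.arange(3)
same_object = np.asarray(existing) is existing
copied = np.array(existing) is not existing
print(same_object, copied)

# Hypothetical ragged input: both functions produce an object array of lists.
# dtype=object is passed explicitly so the sketch runs on any NumPy version.
ragged = [[1], [7, 8], [2]]
via_array = np.array(ragged, dtype=object)
via_asarray = np.asarray(ragged, dtype=object)
print(type(via_array[0]), type(via_asarray[0]))
```

If that holds, switching between the two functions would not by itself change the element type of the training labels.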

While trying to help you on this issue, I discovered an interesting fact about numpy casting:

CASE 1:

   my_list = [[1,2],[2],[3]]
   my_numpy_array = np.array(my_list)
   print(type(my_numpy_array))
   print(type(my_numpy_array[0]))
   <class 'numpy.ndarray'>
   <class 'list'>

CASE 2:

    my_list = [[1],[2],[3]]
    my_numpy_array = np.array(my_list)
    print(type(my_numpy_array))
    print(type(my_numpy_array[0]))
    <class 'numpy.ndarray'>
    <class 'numpy.ndarray'>

Short conclusion: if the sublist lengths differ, the sublists are left as Python lists and are not converted to NumPy arrays.
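
Following that conclusion, a quick way to check whether the label sequences are ragged is to count their lengths and list the texts whose sequence is not exactly one token long. The sketch below uses hypothetical stand-ins for T_L and for the result of label_tokenizer.texts_to_sequences(T_L); the token values are illustrative only.

```python
from collections import Counter

# Hypothetical stand-ins for the label texts and their token sequences.
texts = ['India', 'USA', 'New Zealand', 'New Zealand', 'Israel']
sequences = [[1], [2], [7, 8], [7, 8], [9]]

# How many sequences exist for each length; more than one key means ragged input.
length_counts = Counter(len(seq) for seq in sequences)
print(length_counts)

# The distinct label texts that did not map to exactly one token.
offenders = sorted({text for text, seq in zip(texts, sequences) if len(seq) != 1})
print(offenders)
```

Running the same two checks on the real training and validation sequences should show whether only the training labels contain a multi-token entry.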

I tested on your code, now it works:

training_label_seq = np.asarray(label_tokenizer.texts_to_sequences(T_L))

training_label_seq = np.array([np.array(training_element) for training_element in training_label_seq])

validation_label_seq = np.asarray(label_tokenizer.texts_to_sequences(V_L))



print('(training_label_seq[0:10]) = {}'.format((training_label_seq[0:10])))
print('(validation_label_seq[0:10]) = {}'.format((validation_label_seq[0:10])))

print('type(training_label_list[0]) = ', type(training_label_seq[0]))
print('type(validation_label_seq[0]) = ', type(validation_label_seq[0]))



(training_label_seq[0:10]) = [array([1]) array([1]) array([1]) array([1]) array([1]) array([1])
 array([1]) array([1]) array([1]) array([1])]
(validation_label_seq[0:10]) = [[16]
 [16]
 [16]
 [16]
 [16]
 [16]
 [16]
 [16]
 [16]
 [16]]
type(training_label_list[0]) =  <class 'numpy.ndarray'>
type(validation_label_seq[0]) =  <class 'numpy.ndarray'>
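
Since the approaches above depend on every label tokenizing to exactly one token, another option is to skip the Tokenizer for labels and map each whole label to one integer with a plain dictionary. This is a swapped-in technique, not something taken from the original code; the shortened label lists and the names label_to_index, train_ids, and val_ids are hypothetical. Indices start at 1 here only to mirror the Tokenizer's numbering.

```python
import numpy as np

# Hypothetical, shortened label vocabulary; one integer per whole label.
unique_labels = ['India', 'USA', 'New Zealand', 'China']
label_to_index = {label: i for i, label in enumerate(unique_labels, start=1)}

# Hypothetical stand-ins for the training and validation label lists.
train_labels = ['India', 'New Zealand', 'New Zealand', 'USA']
val_labels = ['China', 'China']

# Each label maps to a single integer, so both arrays are regular 1-D integer arrays.
train_ids = np.array([label_to_index[label] for label in train_labels])
val_ids = np.array([label_to_index[label] for label in val_labels])
print(train_ids.shape, val_ids.shape)
```

If a column shape like the validation output above is needed, train_ids.reshape(-1, 1) gives one row per label.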

3 Comments

Thank you for the quick response. I've tried both, np.asarray and Downgrading it to TF 2.0. No luck. Surprisingly, both Training and Testing Data are the Array of Lists but only Training Data is behaving weirdly.
Yes, I have also tried these two on the colab you provided. I am updating my answer with another possible response(check the number 3)
It still doesn't work: instead of list, we now get array for the training labels but normal data for the testing labels. Now the error is ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
