Background
I am using TensorFlow for the first time, following a tutorial on featurization with the new TensorFlow Recommenders package: https://www.tensorflow.org/recommenders/examples/featurization
I ran into trouble swapping out their dataset (MovieLens) for one based on the Kaggle wine data. The following code works as expected:
import numpy as np
import tensorflow as tf

wine_title_lookup = tf.keras.layers.experimental.preprocessing.StringLookup()
wine_title_lookup.adapt(np.unique(wine_train['title']))
print(f"Vocabulary: {wine_title_lookup.get_vocabulary()[:3]}")
Vocabulary: ['', '[UNK]', 'Žitavské Vinice Rhine Riesling']
wine_title_embedding = tf.keras.layers.Embedding(
    # Let's use the explicit vocabulary lookup.
    input_dim=wine_title_lookup.vocab_size(),
    output_dim=32
)
x = wine_title_lookup(["Susana Balbo Signature Malbec"])
x = wine_title_embedding(x)
x
<tf.Tensor: shape=(1, 32), dtype=float32, numpy=
array([[-0.03861505, -0.02146437,  0.04332292, -0.02598745,  0.03842534,
        -0.01066433,  0.0292404 ,  0.02783312,  0.03364438,  0.00054752,
        -0.0295071 ,  0.03200008,  0.01224083, -0.00100452, -0.04346857,
         0.00105418, -0.01640136, -0.01778026,  0.00171928,  0.03215903,
         0.00020416, -0.02083766, -0.00323264,  0.02582215,  0.04805436,
         0.0325211 ,  0.0100181 , -0.04965406,  0.02548517,  0.01569786,
         0.03761304,  0.01659941]], dtype=float32)>
However, the following produces an error:
wine_title_model = tf.keras.Sequential([wine_title_lookup, wine_title_embedding])
wine_title_model(["Susana Balbo Signature Malbec"])
AttributeError                            Traceback (most recent call last)
 in ()
----> 1 wine_title_model(["Susana Balbo Signature Malbec"])

3 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
    983
    984         with ops.enable_auto_cast_variables(self._compute_dtype_object):
--> 985           outputs = call_fn(inputs, *args, **kwargs)
    986
    987         if self._activity_regularizer:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/sequential.py in call(self, inputs, training, mask)
    370       if not self.built:
    371         self._init_graph_network(self.inputs, self.outputs)
--> 372       return super(Sequential, self).call(inputs, training=training, mask=mask)
    373
    374     outputs = inputs  # handle the corner case where self.layers is empty

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py in call(self, inputs, training, mask)
    384     """
    385     return self._run_internal_graph(
--> 386         inputs, training=training, mask=mask)
    387
    388   def compute_output_shape(self, input_shape):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py in _run_internal_graph(self, inputs, training, mask)
    482       masks = self._flatten_to_reference_inputs(mask)
    483     for input_t, mask in zip(inputs, masks):
--> 484       input_t._keras_mask = mask
    485
    486     # Dictionary mapping reference tensors to computed tensors.

AttributeError: 'str' object has no attribute '_keras_mask'
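Reading the last frame of the traceback, the failure seems to happen where the model loops over its inputs and sets a `_keras_mask` attribute on each one. The sketch below reproduces only that step in plain Python, without TensorFlow. It assumes that the elements being looped over are the individual strings of the list I passed in, which is my reading of the traceback and not something I have confirmed in the Keras source.

```python
# Pure-Python sketch of the failing step; no TensorFlow required.
# Assumption: the model iterates over the elements of the list it was given,
# as the loop on line 483 of functional.py in the traceback suggests.
inputs = ["Susana Balbo Signature Malbec"]  # a plain Python list of str

error_message = None
for input_t in inputs:
    try:
        # Mirrors `input_t._keras_mask = mask` from the traceback.
        input_t._keras_mask = None
    except AttributeError as exc:
        # Built-in str objects do not accept new attributes.
        error_message = str(exc)

print(error_message)
```

If this reading is right, the problem would be the type of the input (a Python list of `str`) and not the layers themselves, which would fit with the layers working when called one at a time. Whether passing a tensor or NumPy array instead avoids the error is an untested guess on my part.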
Notable differences with the source material
The Google code I based my script on uses a data format I am unfamiliar with, which allows them to call map on their data. I tried converting my data into some TensorFlow formats but could not replicate that functionality. However, this is the only step that differs, and I cannot understand why the pieces of the Sequential model work individually but not as a whole.
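For context, the format the tutorial maps over appears to be a `tf.data.Dataset`. The following is a minimal sketch of the kind of conversion I was attempting, assuming a DataFrame with a `title` column. The `toy_wine` frame and the names `wine_ds` and `titles_ds` are hypothetical stand-ins made up for illustration, not my real data or the tutorial's code.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for wine_train; only the "title" column is included.
toy_wine = pd.DataFrame({
    "title": [
        "Susana Balbo Signature Malbec",
        "Žitavské Vinice Rhine Riesling",
    ]
})

# Build a tf.data.Dataset whose elements are dicts of tensors, one per row.
wine_ds = tf.data.Dataset.from_tensor_slices(
    {"title": toy_wine["title"].tolist()}
)

# map applies a function to every element, here selecting the title tensor.
titles_ds = wine_ds.map(lambda row: row["title"])

# Collect one batch and decode the byte strings back to Python str.
first_batch = next(iter(titles_ds.batch(2)))
decoded_titles = [t.decode("utf-8") for t in first_batch.numpy()]
print(decoded_titles)
```

Each batch drawn from `titles_ds` is a string tensor, not a Python list, which may be the relevant difference from how I call the model above.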
I looked at other questions on SO where this error came up but could not find a solution to my problem. This is what the raw data looks like:
wine_train.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 108655 entries, 0 to 120727
Data columns (total 16 columns):
 #   Column               Non-Null Count   Dtype
---  ------               --------------   -----
 0   country              108600 non-null  object
 1   description          108652 non-null  object
 2   designation          77150 non-null   object
 3   points               108336 non-null  float64
 4   price                100871 non-null  float64
 5   province             108600 non-null  object
 6   region_1             108655 non-null  object
 7   region_2             42442 non-null   object
 8   title                108655 non-null  object
 9   variety              108655 non-null  object
 10  winery               108655 non-null  object
 11  designation_replace  108655 non-null  object
 12  user_id              108655 non-null  int64
 13  price_isna           108655 non-null  bool
 14  price_imputed        108650 non-null  float64
 15  wine_id              108655 non-null  int64
dtypes: bool(1), float64(3), int64(2), object(10)
memory usage: 13.4+ MB