I am trying to feed a dataset with several multivalent feature columns through a TensorFlow Extended (TFX) pipeline. Here is one row from my sample data:
user_id 29601
product_id 28
touched_product_id [2435, 28]
liked_product_id [2435, 28]
disliked_product_id []
target 1
As you can see, the columns (features) touched_product_id, liked_product_id, disliked_product_id are multivalent.
Now, in order to feed this data through TFX's validation layer, I'm following the guide below:
https://www.tensorflow.org/tfx/tutorials/tfx/components_keras
Following the guide, I produce some TFRecord files using an instance of CsvExampleGen, then generate statistics and a schema as shown below:
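from tfx.components import CsvExampleGen, SchemaGen, StatisticsGen
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

# interactive context, created as in the guide
context = InteractiveContext()

# create train and eval records
c = CsvExampleGen(input_base='sample_train')
context.run(c)

# generate statistics
statistics_gen = StatisticsGen(
    examples=c.outputs['examples']
)
context.run(statistics_gen)

# generate schema
schema_gen = SchemaGen(
    statistics=statistics_gen.outputs['statistics'],
    infer_feature_shape=False)
context.run(schema_gen)
context.show(schema_gen.outputs['schema'])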
The final schema displayed by the above code is:
Feature name            Type   Presence  Valency  Domain
'disliked_product_id'   BYTES  required  single   -
'liked_product_id'      BYTES  required  single   -
'product_id'            INT    required  single   -
'target'                INT    required  single   -
'touched_product_id'    BYTES  required  single   -
'user_id'               INT    required  single   -
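One observation about this output: the three list columns come out as BYTES, which suggests that each bracketed list reached the pipeline as a single string rather than as several values. The sketch below is not TFX code. It uses only Python's csv and ast modules to illustrate that reading, assuming the CSV stores each list as a quoted cell such as "[2435, 28]". The CSV text and the names SAMPLE_CSV, MULTIVALENT_COLUMNS, read_rows, and parse_multivalent are hypothetical and made up for illustration.

```python
import ast
import csv
import io

# Hypothetical CSV content mirroring the sample row; the quoting of the list
# columns is an assumption about how the file was written.
SAMPLE_CSV = (
    "user_id,product_id,touched_product_id,liked_product_id,"
    "disliked_product_id,target\n"
    '29601,28,"[2435, 28]","[2435, 28]",[],1\n'
)

MULTIVALENT_COLUMNS = (
    "touched_product_id",
    "liked_product_id",
    "disliked_product_id",
)


def read_rows(text):
    """Read CSV text into a list of dicts; every cell comes back as one string."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_multivalent(row, columns=MULTIVALENT_COLUMNS):
    """Return a copy of `row` with the bracketed-list cells parsed into lists."""
    parsed = dict(row)
    for name in columns:
        # literal_eval safely evaluates a literal such as "[2435, 28]" or "[]".
        parsed[name] = list(ast.literal_eval(row[name]))
    return parsed


raw = read_rows(SAMPLE_CSV)[0]
parsed = parse_multivalent(raw)

# The raw cell is a single string; only after parsing is it a list of ints.
print(type(raw["touched_product_id"]).__name__, repr(raw["touched_product_id"]))
print(parsed["touched_product_id"], parsed["disliked_product_id"])
```

If that assumption holds, a cell like "[2435, 28]" is one string value per row, which would be consistent with the BYTES type and the single valency shown in the schema.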
Clearly, the multivalent features are incorrectly inferred to be univalent. In an attempt to fix this, I loaded the Schema proto manually and tried to set a valence property on a feature:
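import os

import tensorflow_data_validation as tfdv
from google.protobuf import text_format
from tensorflow.python.lib.io import file_io
from tensorflow_metadata.proto.v0 import schema_pb2

# locate and parse the schema.pbtxt written by SchemaGen
schema_path = os.path.join(schema_gen.outputs['schema'].get()[0].uri, 'schema.pbtxt')
schema = schema_pb2.Schema()
contents = file_io.read_file_to_string(schema_path)
schema = text_format.Parse(contents, schema)
# THIS LINE DOES NOT WORK
tfdv.get_feature(schema, 'user_id').valence = 'multiple'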
That final line fails because, to my surprise, the feature has no valence property. I looked through the spec for the Schema proto but could not find one there either. Does anyone know how I can solve this?