
I'm using tf.data.experimental.make_csv_dataset to create a dataset from a .csv file. I'm also using tf.keras.layers.DenseFeatures as an input layer of my model.

I'm struggling to create a DenseFeatures layer so that it is compatible with my dataset when the batch_size parameter of make_csv_dataset is not equal to 1 (with batch_size=1 my setup works as expected).

I create the DenseFeatures layer from a list of tf.feature_column.numeric_column elements with shape=(my_batch_size,), but in this case the input layer for some reason seems to expect a [my_batch_size,my_batch_size] shape instead of [my_batch_size,1].

With my_batch_size=19 I'm getting the following error when trying to fit the model:

ValueError: Cannot reshape a tensor with 19 elements to shape [19,19] (361 elements) for 'MyModel/Input/MyColumn1/Reshape' (op: 'Reshape') with input shapes: [19,1], [2] and with input tensors computed as partial shapes: input[1] = [19,19].

If I don't specify shape when creating numeric_column it doesn't work either. I'm getting the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError:  The second input must be a scalar, but it has shape [19]

which suggests that numeric_column expects a scalar but receives the whole batch in one Tensor.

How do I create an input layer of DenseFeatures so that it accepts the dataset produced by make_csv_dataset(batch_size=my_batch_size)?
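To make the mismatch concrete, here is a minimal NumPy sketch of the shape arithmetic (not the actual model code): a batch of 19 scalars has only 19 elements, but a column declared with shape=(19,) means 19 values *per example*, i.e. a [19, 19] tensor with 361 elements.

```python
import numpy as np

batch_size = 19
# make_csv_dataset yields one scalar per example, so a batch has shape [19].
batch = np.arange(batch_size, dtype=np.float32)

# Declaring the column with shape=(batch_size,) makes the input layer expect
# [batch_size] + (batch_size,) = [19, 19], i.e. 361 elements -- but the
# batch only carries 19 values, hence the reshape error.
try:
    batch.reshape(batch_size, batch_size)
except ValueError as err:
    print(err)  # cannot reshape array of size 19 into shape (19,19)

# A per-example shape of (1,) would instead expect [19, 1], which fits.
reshaped = batch.reshape(batch_size, 1)
print(reshaped.shape)  # (19, 1)
```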

1 Answer

From the tf.feature_column.numeric_column documentation:

shape: An iterable of integers specifies the shape of the Tensor. An integer can be given which means a single dimension Tensor with given width. The Tensor representing the column will have the shape of [batch_size] + shape.

This means that you must not include the batch size in the shape argument: use shape=().

Currently, with a batch size of 1, you get shape=(1,), which TF can handle through broadcasting (dimensions of size 1 are added by TF when necessary); that's why it works.
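A minimal sketch of the fix, assuming a hypothetical two-column CSV with a numeric feature "x" and a label "y" (written against the TF 2.x tf.feature_column / DenseFeatures API, which has since been deprecated in newer releases):

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical CSV: one numeric feature "x" and a label "y".
csv_path = os.path.join(tempfile.gettempdir(), "example.csv")
with open(csv_path, "w") as f:
    f.write("x,y\n")
    for i in range(38):
        f.write(f"{float(i)},{float(2 * i)}\n")

batch_size = 19
dataset = tf.data.experimental.make_csv_dataset(
    csv_path,
    batch_size=batch_size,
    label_name="y",
    num_epochs=1,
    shuffle=False,
)

# shape=() means one scalar per example: the column tensor has shape
# [batch_size] + () = [19], and DenseFeatures emits [19, 1].
columns = [tf.feature_column.numeric_column("x", shape=())]

model = tf.keras.Sequential([
    tf.keras.layers.DenseFeatures(columns),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(dataset, epochs=1, verbose=0)
```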

Hope this can help. Provide more code if you want more help.


5 Comments

If I don't specify shape it doesn't work either. I'm getting the following error: tensorflow.python.framework.errors_impl.InvalidArgumentError: The second input must be a scalar, but it has shape [19] which suggests that numeric_column is expecting a scalar but receiving the whole batch as one Tensor. So I'm still not sure how to use numeric_column together with the batched CsvDataset.
@VolodymyrFrolov You have to specify the shape argument, but give it an empty tuple: shape=(), which is not the same as not specifying it and falling back on the default value of (1,). Otherwise, you have to manually reshape your samples so that your batch has shape [19, 1] and not just [19]. Also, please provide a minimal example to allow us to better help you.
Specifying shape=() helped; additional resizing of [19] to [19,1] was not needed. I also had a problem with my custom loss function, where it was casting the input tensor to a scalar. I found the problem by putting together a minimal example. Thanks for your help!
@VolodymyrFrolov Additional resizing is only required if you keep the default shape value. The mapping between the shape argument and your batch shape is as follows: shape=() <=> batch.shape=[19], or shape=(1,) <=> batch.shape=[19,1]. My advice is to use tensors of size 1 instead of scalars in your models; this will more often match what the TF API expects (as is the case here with the default value of the shape argument). Happy to help!
Understood. I agree that using scalars is not a good practice, so I'll rewrite my code to use tensors instead and to be shape-agnostic.
