1

I am learning how to use Tensorflow with a piece of old code provided at a workshop a couple of years back (i.e. it should've been tested and works). However it didn't work, and my investigations led me back to step 1 of loading the data and making sure it has loaded properly.

The data reading pipeline is as follows:

  1. The load function
@tf.function
def load(path_pair):
    image_path = path_pair[0]
    masks_path = path_pair[1]

    image_raw = tf.io.read_file(image_path)
    image = tf.io.decode_image(
        image_raw, channels=1, dtype=tf.uint8
    )

    masks_raw = tf.io.read_file(masks_path)
    masks = tf.io.decode_image(
        masks_raw, channels=NUM_CONTOURS, dtype=tf.uint8
    )

    return image / 255, masks / 255```
2. The function used to create the dataset
```def create_datasets(dataset_type):
    path_pairs = get_path_pairs(dataset_type) # this just gives a list of 2 x 2 tuples containing the image/mask path to load
    dataset = tf.data.Dataset.from_tensor_slices(path_pairs)
    dataset = dataset.shuffle(
        len(path_pairs),
        reshuffle_each_iteration=True,
    )
    dataset = dataset.map(load)

    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.AUTOTUNE)

    return dataset```



When I use the create_datasets function on a dataset that contains 818 data pairs and check the size of the loaded dataset using len(dataset)it tells me there is only 2 items loaded. 

1 Answer 1

0

The problem is, that you are batching your dataset, thus when you use len(dataset), you get the number of batches, not the number of elements in your dataset. To get them you can, for instance, iterate over your batches:

num_samples = 0
for batch in dataset:
    num_samples += len(batch[0])
print(num_samples)
Sign up to request clarification or add additional context in comments.

2 Comments

Thank you!!! I totally missed that part. I went back and used: for image, label in datasets['validation']: print("Image shape: ", image.numpy().shape) print("Label shape: ", label.numpy().shape) to check what the size of the dataset was and it was as you said, split into two batches
@Erin You're welcome, glad I could help!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.