
I am working on a binary classification task using an audio dataset, which is already divided into training and testing sets. However, I also need a validation set, so I split the training set into training and validation subsets.

I have created a custom PyTorch Dataset class (CustomAudioDataset) that takes a transform argument: a list of two Compose objects. The first applies augmentations to the raw waveform, and the second applies augmentations to the Mel spectrogram:

import torchvision.transforms as T

# AddGaussianNoise, TimeShift, PitchShift, FrequencyMasking, TimeMasking,
# and TimeStretch are my own augmentation classes (definitions omitted).
audio_transforms = T.Compose([
    T.RandomApply([AddGaussianNoise()], p=0.5),
    T.RandomApply([TimeShift()], p=0.5),
    T.RandomApply([PitchShift(SAMPLE_RATE)], p=0.5)
])

mel_transforms = T.Compose([
    T.RandomApply([FrequencyMasking()], p=0.5),
    T.RandomApply([TimeMasking()], p=0.5),
    T.RandomApply([TimeStretch()], p=0.5),
    T.Normalize(mean=[0], std=[1])
])

Here’s a simplified version of my CustomAudioDataset class:

import os
import torchaudio
from torch.utils.data import Dataset

class CustomAudioDataset(Dataset):
    def __init__(self, parent_directory, transform=None):
        self.parent_directory = parent_directory
        self.transform = transform
        self.audio_files = []
        self.labels = []

        for label in ['0', '1']:
            directory = os.path.join(parent_directory, label)
            for file_name in os.listdir(directory):
                if file_name.endswith('.wav'):
                    self.audio_files.append(os.path.join(directory, file_name))
                    self.labels.append(int(label))

    def __len__(self):
        return len(self.audio_files)

    def __getitem__(self, idx):
        audio_path = self.audio_files[idx]
        label = self.labels[idx]
        audio, sr = torchaudio.load(audio_path)

        if self.transform and len(self.transform) > 0:
            audio = self.transform[0](audio)  # Apply raw audio augmentation
        
        audio = self.pad_audio(audio)            # padding helper (definition omitted)
        features = self.extract_features(audio)  # computes the Mel spectrogram (definition omitted)
        
        if self.transform and len(self.transform) > 1:
            features = self.transform[1](features)  # Apply Mel spectrogram augmentation

        return features, label
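
pad_audio and extract_features are omitted above for brevity; extract_features is what produces the Mel spectrogram that mel_transforms later operates on. As a purely hypothetical sketch (the parameters here are made up, not my exact code), it looks roughly like this:

import torchaudio.transforms as AT

def extract_features(self, audio):
    # Hypothetical parameters, for illustration only.
    mel = AT.MelSpectrogram(sample_rate=SAMPLE_RATE, n_mels=64)(audio)
    return AT.AmplitudeToDB()(mel)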

To create the train-validation split, I do the following:

from torch.utils.data import random_split

train_val_dataset = CustomAudioDataset(
    parent_directory="...", 
    transform=[audio_transforms, mel_transforms]
)

train_size = int(0.75 * len(train_val_dataset))
val_size = len(train_val_dataset) - train_size

train_dataset, val_dataset = random_split(train_val_dataset, [train_size, val_size])
val_dataset.dataset.transform = []  # Attempt to disable transformations for validation

My Question:
When I set val_dataset.dataset.transform = [] to disable transformations for the validation dataset, does this actually work? And if it does, does it affect only the validation subset, or does it also impact the training subset (train_dataset), given that both come from the same random_split?
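
For reference, here is a minimal, self-contained sketch of how I could test the sharing behavior, using a hypothetical ToyDataset in place of my real class (all names made up for illustration):

from torch.utils.data import Dataset, random_split

class ToyDataset(Dataset):
    """Hypothetical stand-in for CustomAudioDataset."""
    def __init__(self, transform=None):
        self.transform = transform

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return idx

toy = ToyDataset(transform=["audio_tf", "mel_tf"])
train_sub, val_sub = random_split(toy, [6, 2])

# If this prints True, both Subset views wrap the same underlying
# dataset object, and mutating .dataset.transform would be visible
# from both splits.
print(train_sub.dataset is val_sub.dataset)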
