I am working on a binary classification task using an audio dataset, which is already divided into training and testing sets. However, I also need a validation set, so I split the training set into training and validation subsets.
I have created a custom PyTorch Dataset class (CustomAudioDataset) that takes a transform argument: a list of two Compose objects. The first applies augmentations to the raw audio, and the second applies augmentations to the Mel spectrogram:
audio_transforms = T.Compose([
    T.RandomApply([AddGaussianNoise()], p=0.5),
    T.RandomApply([TimeShift()], p=0.5),
    T.RandomApply([PitchShift(SAMPLE_RATE)], p=0.5)
])

mel_transforms = T.Compose([
    T.RandomApply([FrequencyMasking()], p=0.5),
    T.RandomApply([TimeMasking()], p=0.5),
    T.RandomApply([TimeStretch()], p=0.5),
    T.Normalize(mean=[0], std=[1])
])
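Each of the custom transforms (AddGaussianNoise, TimeShift, etc.) is a simple callable. For illustration, here is a simplified sketch of what AddGaussianNoise looks like (the default std value here is just illustrative, not my actual setting):

```python
import torch

class AddGaussianNoise:
    """Simplified, illustrative version of my custom raw-audio transform."""

    def __init__(self, std=0.005):
        self.std = std

    def __call__(self, waveform):
        # Add zero-mean Gaussian noise, scaled by `std`, to the waveform tensor.
        return waveform + torch.randn_like(waveform) * self.std
```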
Here’s a simplified version of my CustomAudioDataset class:
class CustomAudioDataset(Dataset):
    def __init__(self, parent_directory, transform=None):
        self.parent_directory = parent_directory
        self.transform = transform
        self.audio_files = []
        self.labels = []
        for label in ['0', '1']:
            directory = os.path.join(parent_directory, label)
            for file_name in os.listdir(directory):
                if file_name.endswith('.wav'):
                    self.audio_files.append(os.path.join(directory, file_name))
                    self.labels.append(int(label))

    def __len__(self):
        return len(self.audio_files)

    def __getitem__(self, idx):
        audio_path = self.audio_files[idx]
        label = self.labels[idx]
        audio, sr = torchaudio.load(audio_path)
        if self.transform and len(self.transform) > 0:
            audio = self.transform[0](audio)  # Apply raw audio augmentations
        audio = self.pad_audio(audio)
        features = self.extract_features(audio)
        if self.transform and len(self.transform) > 1:
            features = self.transform[1](features)  # Apply Mel spectrogram augmentations
        return features, label
To create the train-validation split, I do the following:
train_val_dataset = CustomAudioDataset(
    parent_directory="...",
    transform=[audio_transforms, mel_transforms]
)

train_size = int(0.75 * len(train_val_dataset))
val_size = len(train_val_dataset) - train_size
train_dataset, val_dataset = random_split(train_val_dataset, [train_size, val_size])

val_dataset.dataset.transform = []  # Disable transformations for validation
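To make my concern about shared state concrete, here is a minimal toy example (ToyDataset is just for illustration) showing that random_split returns Subset views, not copies, so both subsets reference the same underlying dataset object:

```python
import torch
from torch.utils.data import Dataset, random_split

class ToyDataset(Dataset):
    """Stand-in for CustomAudioDataset, just to demonstrate shared state."""

    def __init__(self):
        self.transform = ["some transform"]

    def __len__(self):
        return 10

    def __getitem__(self, idx):
        return idx

full = ToyDataset()
train, val = random_split(full, [7, 3])

print(train.dataset is val.dataset)  # True: both Subsets wrap the same object
val.dataset.transform = []
print(train.dataset.transform)       # []: the change is visible through train too
```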
My Question:
When I set val_dataset.dataset.transform = [] to disable augmentations for the validation dataset, does this actually work? And if it does, does it affect only the validation dataset, or does it also impact the training dataset (train_dataset)?