
I am using keras and have:

        corrupted_samples, corrupted_sample_rate = sf.read(
            self.corrupted_audio_file_paths[index])

        frequencies, times, spectrogram = scipy.signal.spectrogram(
            corrupted_samples, corrupted_sample_rate)

As per the docs, this gives:

f (ndarray) - Array of sample frequencies.
t (ndarray) - Array of segment times.
Sxx (ndarray) - Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times.

I assume all of the times will line up, so I don't care about the value of the time (I don't think). The same is true of frequencies. So what I actually need is the values at each time for each frequency, which is given by Sxx (or spectrogram) in my code. I'm unsure how to actually do that. It seems simple though.
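For concreteness, this is the kind of reshaping I imagine, though I haven't verified it (the Keras-facing part is just illustrative):

import numpy as np

# Sxx from scipy.signal.spectrogram has shape (n_frequencies, n_times) by
# default (the last axis is segment times), so transposing should give the
# (n_timesteps, n_frequencies) layout a Keras sequence model expects.
model_input = spectrogram.T                      # (n_timesteps, n_frequencies)
batch = np.expand_dims(model_input, axis=0)      # (1, n_timesteps, n_frequencies)
print(batch.shape)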


1 Answer


Based on https://towardsdatascience.com/speech-recognition-analysis-f03ff9ce78e9, the author states that a spectrogram is a spectro-temporal representation of the sound and shows some of the steps for converting a WAV file to a spectrogram.

One possible example is shown below:

## Check the sampling rate of the WAV file.
audio_file = './siren_mfcc_demo.wav'

import wave
with wave.open(audio_file, "rb") as wave_file:
    sr = wave_file.getframerate()
print(sr)

# The snippet below relies on tf.contrib, so it needs TensorFlow 1.x; eager
# execution is enabled so the .numpy() calls work.
import tensorflow as tf
tf.enable_eager_execution()

audio_binary = tf.read_file(audio_file)

# tf.contrib.ffmpeg is not supported on Windows, see
# https://github.com/tensorflow/tensorflow/issues/8271
waveform = tf.contrib.ffmpeg.decode_audio(
    audio_binary, file_format='wav', samples_per_second=sr, channel_count=1)
print(waveform.numpy().shape)

signals = tf.reshape(waveform, [1, -1])
signals.get_shape()

# Compute a [batch_size, ?, 128] tensor of fixed-length, overlapping windows,
# where each window overlaps the previous by 75% (frame_length - frame_step
# samples of overlap).
frames = tf.contrib.signal.frame(signals, frame_length=128, frame_step=32)
print(frames.numpy().shape)

# `magnitude_spectrograms` is a [batch_size, ?, 129] tensor of spectrograms. We
# would like to produce overlapping fixed-size spectrogram patches; for example,
# for use in a situation where a fixed-size input is needed.
magnitude_spectrograms = tf.abs(tf.contrib.signal.stft(
    signals, frame_length=256, frame_step=64, fft_length=256))
print(magnitude_spectrograms.numpy().shape)

The method above refers to https://colab.research.google.com/drive/1Adcy25HYC4c9uSBDK9q5_glR246m-TSx#scrollTo=QTa1BVSOw1Oe

Hope it helps.
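Note that tf.contrib was removed in TensorFlow 2.x, so the snippet above only runs on TensorFlow 1.x. On a newer version, roughly the same magnitude spectrogram can be computed with tf.signal.stft; the following is an untested sketch along those lines (it assumes a 16-bit PCM WAV file, since tf.audio.decode_wav only handles that format):

import tensorflow as tf

# Sketch of a TF 2.x equivalent of the tf.contrib code above.
audio_binary = tf.io.read_file(audio_file)                 # same audio_file as above
waveform, sr = tf.audio.decode_wav(audio_binary, desired_channels=1)
signals = tf.reshape(waveform, [1, -1])                    # (batch, samples)

magnitude_spectrograms = tf.abs(tf.signal.stft(
    signals, frame_length=256, frame_step=64, fft_length=256))
print(magnitude_spectrograms.shape)                        # (1, n_frames, 129)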


3 Comments

Thank you. I already have the spectrogram from scipy.signal.spectrogram. I need to convert that to a tensor of (n_timesteps, n_frequencies) somehow.
Did you ever find a solution for that, @Shamoon?
I am trying to find a solution like that. Did anyone solve it?
