How to implement audio streaming in React + Node using the OpenAI TTS model

Question

I am going to implement Audio Streaming using OpenAI TTS model. It will get audio data from OpenAI TTS model as streaming and send it to frontend via WebSocket. It plays on the Frontend. Frontend is React.js and backend is Node.js.

This is frontend code.

audio_response = await openai.audio.speech.create({
  model: "tts-1",
  voice: "nova",
  input,
  response_format: "mp3",
});

// Get audio chunks from the stream and send via websocket
const stream = audio_response.body;

// Pipe the audio stream to the WebSocket in small chunks
stream.on("data", (chunk) => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(chunk); // Send audio data as binary chunks
  }
});

And this is backend:

const socket = new WebSocket(...);
socket.binaryType = "blob";

// Web Audio API setup
let audioContext;
let source;
let audioBufferQueue = []; // Queue for audio chunks

socket.addEventListener("message", async (event) => {
  const audioChunk = event.data;
  audioBufferQueue.push(audioChunk);

  // Start playing audio if not already playing
  if (!source) {
    await playAudioQueue();
  }
});

async function playAudioQueue() {
  if (!audioContext) {
    audioContext = new (window.AudioContext || window.webkitAudioContext)();
  }

  while (audioBufferQueue.length > 0) {
    const audioChunk = audioBufferQueue.shift();

    // Decode audio data
    const arrayBuffer = await audioChunk.arrayBuffer();
    try {
      const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

      // Play the audio buffer
      source = audioContext.createBufferSource();
      source.buffer = audioBuffer;
      source.connect(audioContext.destination);

      // Wait for the audio to finish playing
      await new Promise((resolve) => {
        source.onended = resolve;
        source.start();
      });

      source = null;
    } catch (err) {
      console.error("Error decoding audio data:", err);
    }
  }
}

Now this code has error like:

Error decoding audio data: EncodingError: Unable to decode audio data

Pankaj Jarial · Accepted Answer · 2024-12-26 05:15:00Z

0

I guess the code above one is backend and lower one is front-end. Are you able to get the response in the streaming from the openai? Bcz in python function call look like this:

    def aduioTextStream(text):
        with client.audio.speech.with_streaming_response.create(
            model="tts-1", voice="alloy", input=text, response_format="pcm"
        ) as response:
            for chunk in response.iter_bytes(chunk_size=1024):
                yield chunk

And yes I am facing the same issue in front-end side. the chunks are streaming to front-end but when make it audible and decoding it the content-type show the "octet-stream". I am backend dev. and i don't have enough knowledge in the front-end. Let me know it you know the answer how i can handle it.

edited Dec 26, 2024 at 5:15

answered Dec 26, 2024 at 5:11

Pankaj Jarial

12 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

How to implement audio streaming in React + Node using the OpenAI TTS model

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related