My project is a voice-controlled email website. The user speaks commands through the browser microphone. The audio should not be stored as a file; instead, it should be streamed directly to HuggingFace's Whisper model through the Inference API, which converts the speech to text for further processing. The Inference API JavaScript code is below, but I think it expects a file to read rather than a stream, so I need help modifying it as well:

const fs = require("fs"); // Node.js file-system module, needed for readFileSync

async function query(filename) {
    const data = fs.readFileSync(filename);
    const response = await fetch(
        "https://api-inference.huggingface.co/models/openai/whisper-medium",
        {
            headers: { Authorization: "Bearer ...." },
            method: "POST",
            body: data,
        }
    );
    const result = await response.json();
    return result;
}

query("sample1.flac").then((response) => {
    console.log(JSON.stringify(response));
});

So keeping in mind that the audio is to be streamed, how do I record the user's input from the browser and stream it to HuggingFace?

So far, the most likely solution I have found is this article: Building a client-side web app which streams audio from a browser microphone to a server. (Part II) However, it has the client send the audio to a separately built intermediate server, which then makes the API calls to Dialogflow.

I need the same functionality, but without the intermediate server: the audio should be streamed directly to the existing server through HuggingFace's API call.

  • Flac would mean I have to save the audio as a file and upload it, followed by downloading on the server end. I'd like to avoid uploading and downloading files if possible, because that might delay the response from the server. Commented Feb 7, 2024 at 1:44
  • I assume you can keep the flac in memory and send it as a blob or something to HuggingFace? Commented Feb 7, 2024 at 6:42
  • stackoverflow.com/questions/73354800/… Commented Feb 17, 2024 at 0:46
  • stackoverflow.com/questions/51689270/… Commented Feb 17, 2024 at 0:48
  • You need a backend server anyway; I don't understand your problem. Stream to your Node.js server, save the file, and send it to Hugging Face. What's the problem? Commented Feb 17, 2024 at 0:50
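
The in-memory approach suggested in the comments (keep the recording as a blob and send it, with no file on disk) could look roughly like the sketch below. It is not a confirmed solution. The helper names buildWhisperRequest, recordFromMic and transcribeFromMic are hypothetical, as are the token parameter and the 5-second default. It also assumes the Inference API can decode whatever container MediaRecorder produces in the user's browser (often audio/webm), which should be verified. The endpoint is the one from the question. The request body here is a complete recording rather than a live stream, so the audio is sent once recording stops.

```javascript
// Endpoint copied from the question.
const WHISPER_URL =
    "https://api-inference.huggingface.co/models/openai/whisper-medium";

// Hypothetical helper: builds the fetch() arguments for an in-memory recording.
function buildWhisperRequest(audioBlob, token) {
    return {
        url: WHISPER_URL,
        options: {
            method: "POST",
            headers: {
                Authorization: `Bearer ${token}`,
                // Assumption: the API accepts the container named by the blob's type.
                "Content-Type": audioBlob.type || "application/octet-stream",
            },
            body: audioBlob, // sent straight from memory, no file involved
        },
    };
}

// Hypothetical helper: records the microphone for `ms` milliseconds (browser only).
async function recordFromMic(ms) {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const recorder = new MediaRecorder(stream);
    const chunks = [];
    recorder.addEventListener("dataavailable", (event) => chunks.push(event.data));
    const stopped = new Promise((resolve) =>
        recorder.addEventListener("stop", resolve, { once: true })
    );
    recorder.start();
    setTimeout(() => recorder.stop(), ms);
    await stopped;
    stream.getTracks().forEach((track) => track.stop()); // release the microphone
    return new Blob(chunks, { type: recorder.mimeType });
}

// Hypothetical helper: records, then posts the recording to the Whisper endpoint.
async function transcribeFromMic(token, ms = 5000) {
    const audioBlob = await recordFromMic(ms);
    const { url, options } = buildWhisperRequest(audioBlob, token);
    const response = await fetch(url, options);
    return response.json();
}
```

Calling transcribeFromMic(token) from a click handler would prompt for microphone access, record, and resolve with the parsed JSON response. Note that a token embedded in client-side code is visible to anyone who loads the page, which is one reason the last comment suggests a backend.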
