Voice to Text api self-hosted

Graybeard · Dec 14, 2024

I started off this morning trying to use the online dictation application and it was broken.
So I got really pissed and I said, hey, this is just not working.
So I need to make my own voice to text and this is what I did:
First, I explained my problem to OpenAI's chat GPT and we delved into the possibilities.
~~I'm going to include a link to share the chat we had because me does it was quite extensive.~~
In the end, we came up with a way that worked, took some time.
But what we have is a self-hosted API that takes dictation and then translates the dictation into text
with a Python module.
(the above text is literal without any grammar or syntax editing --natural speech ...)

This is what it just did:

Bash:

$ ./voice-to-text.sh
Enter the base name for your audio file (e.g., 'test'):
./voice-to-text-api-02
Recording audio as ./voice-to-text-api-02.mp3...
Recording WAVE './voice-to-text-api-02.mp3' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono
Transcribing audio to ./voice-to-text-api-02.txt...
Here is the transcription from ./voice-to-text-api-02.txt:
First, I explained my problem to OpenAI's chat GPT and we delved into the possibilities.
I'm going to include a link to share the chat we had because me does it was quite extensive.
In the end, we came up with a way that worked, took some time.
But what we have is a self-hosted API that takes dictation and then translates the dictation into text
with a Python module.

I can't share the link because the chat contains an uploaded image, but here is the final code:

Bash:

#!/bin/bash

# Prompt for the base name of the file
echo "Enter the base name for your audio file (e.g., 'test'):"
read -r name

# Check that the name is not empty
if [ -z "$name" ]; then
  echo "You must enter a valid name!"
  exit 1
fi

# Record up to 60 seconds of mono audio using arecord.
# hw:1,0 is machine-specific; list capture devices with `arecord -l`.
# Note: arecord writes WAV data even though the file is named .mp3.
echo "Recording audio as ${name}.mp3..."
arecord -D hw:1,0 -f S16_LE -r 44100 -c 1 -d 60 "${name}.mp3"

# Transcribe the audio with Whisper, keep only the text that follows
# each [timestamp], and save the result
echo "Transcribing audio to ${name}.txt..."
whisper "${name}.mp3" --model tiny --language en 2>/dev/null | grep '\]' | sed 's/.*\]//' >"${name}.txt"

# Display the transcription
echo "Here is the transcription from ${name}.txt:"
cat "${name}.txt"

So, this was developed in an hour or so, start to finish.
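
The grep/sed stage of the script can be tried on its own, without a microphone or Whisper installed. The sketch below is only an illustration: the function name strip_timestamps and the sample lines are made up, and the sample assumes Whisper prints lines shaped like "[start --> end] text". It also trims the leading spaces that the original sed leaves behind.

```shell
#!/bin/bash

# strip_timestamps (hypothetical helper name): read Whisper-style console
# lines on stdin and print only the spoken text. It uses the same grep/sed
# stage as the script, plus a second sed expression that trims the
# whitespace left after the closing bracket.
strip_timestamps() {
  grep '\]' | sed -e 's/.*\]//' -e 's/^[[:space:]]*//'
}

# Made-up sample input, assumed to resemble what the whisper command
# prints while transcribing. The middle line has no bracket, so grep drops it.
sample='[00:00.000 --> 00:05.000]  First, I explained my problem.
a status line without any bracket
[00:05.000 --> 00:09.000]  In the end, we came up with a way that worked.'

printf '%s\n' "$sample" | strip_timestamps
```

One limitation to keep in mind: the sed pattern is greedy, so if the spoken text itself ever contains a "]", everything up to that last bracket is cut off as well.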

T J Tutor · Dec 14, 2024

Why is that better, or worse, than using something like Dragon Natural?

Graybeard · Dec 14, 2024

Because the software I made is free and it works on Linux (a big deal for me).
It seems to require little or no training; just speak slowly and clearly.
The whisper command runs the model locally (it only downloads the model files on first use), so the audio does not have to go to OpenAI and there is no outside record of the transcription.
The auto punctuation is really very good (so far).
