Graybeard
Well-Known Member
I started off this morning trying to use the online dictation application and it was broken.
So I got really pissed and I said, hey, this is just not working.
So I need to make my own voice to text and this is what I did:
First, I explained my problem to OpenAI's chat GPT and we delved into the possibilities.
I'm going to include a link to share the chat we had because me does it was quite extensive.
In the end, we came up with a way that worked, took some time.
But what we have is a self-hosted API that takes dictation and then translates the dictation into text
with a Python module.
(The above text is literal, without any grammar or syntax editing: natural speech ...)
This is what it just did:
Bash:
$ ./voice-to-text.sh
Enter the base name for your audio file (e.g., 'test'):
./voice-to-text-api-02
Recording audio as ./voice-to-text-api-02.mp3...
Recording WAVE './voice-to-text-api-02.mp3' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono
Transcribing audio to ./voice-to-text-api-02.txt...
Here is the transcription from ./voice-to-text-api-02.txt:
First, I explained my problem to OpenAI's chat GPT and we delved into the possibilities.
I'm going to include a link to share the chat we had because me does it was quite extensive.
In the end, we came up with a way that worked, took some time.
But what we have is a self-hosted API that takes dictation and then translates the dictation into text
with a Python module.
I can't share the link because the chat has an uploaded image, but here is the final code:
Bash:
#!/bin/bash
# Prompt for the base name of the file
echo "Enter the base name for your audio file (e.g., 'test'):"
read -r name
# Check that the name is not empty
if [ -z "$name" ]; then
    echo "You must enter a valid name!"
    exit 1
fi
# Record up to 60 seconds of mono audio using arecord.
# hw:1,0 is specific to this machine; list capture devices with `arecord -l`.
# Note: arecord writes WAVE data here even though the file is named .mp3.
echo "Recording audio as ${name}.mp3..."
arecord -D hw:1,0 -f S16_LE -r 44100 -c 1 -d 60 "${name}.mp3"
# Transcribe the audio with Whisper, keep only the timestamped lines,
# strip everything up to the closing "]", and save the result
echo "Transcribing audio to ${name}.txt..."
whisper "${name}.mp3" --model tiny --language en 2>/dev/null | grep '\]' | sed 's/.*\]//' >"${name}.txt"
# Display the transcription
echo "Here is the transcription from ${name}.txt:"
cat "${name}.txt"
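The grep/sed step in the script is what turns Whisper's console output into plain text. Here is a minimal sketch of a slightly stricter variant of that filter, pulled out into a function so it can be tried without a microphone. The function name strip_timestamps and the sample lines are made up for illustration, and the "[start --> end] text" line shape is an assumption based on what the script's filter expects. Unlike the original, this version only removes a bracketed timestamp at the start of a line, so a "]" inside the spoken text is left alone.

```shell
#!/bin/bash
# strip_timestamps (hypothetical helper): read console output on stdin,
# keep only lines that begin with a bracketed timestamp, then remove
# that leading timestamp and any whitespace that follows it.
strip_timestamps() {
    grep '^\[' | sed -e 's/^\[[^]]*][[:space:]]*//'
}

# Made-up sample input: one status line and one timestamped line.
printf '%s\n' \
    'some status line without a timestamp' \
    '[00:00.000 --> 00:02.000] This is a test.' \
    | strip_timestamps
# prints: This is a test.
```

In the script this would sit in place of the grep and sed commands in the whisper pipeline, with the output redirect left as it is.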
So, this was developed in an hour or so, start to finish.
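If anyone wants to try this, the script will fail in confusing ways if a tool is missing, especially since it sends Whisper's error output to /dev/null. Here is a rough preflight sketch that could run before recording. The function name missing_tools is made up for illustration. arecord and whisper are the commands the script calls; ffmpeg is included on the assumption that the whisper command line tool needs it to decode audio.

```shell
#!/bin/bash
# missing_tools (hypothetical helper): print a line for each named
# command that is not found on PATH; return 1 if any is missing.
missing_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# arecord and whisper come from the script; ffmpeg is an assumption.
if missing_tools arecord whisper ffmpeg; then
    echo "All tools found."
fi
```

Dropping the 2>/dev/null from the whisper line while testing is another easy way to see what went wrong.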