🎙️ DIY Speech-to-Text
A fully local speech-to-text hack!
This rigs things up so that audio is recorded while a chosen keyboard key is held down; when the key is released, the audio is transcribed to text and typed into whatever window or field has focus.
0. System dependencies
- Linux or Mac OS
- a terminal program running something like bash
- gcc
- git
- sox (provides the rec command used for recording)
- xdotool (Linux)
- sudo (the script runs global-keypress as root)
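A minimal sketch for checking the list above in one go. It assumes the tools are on PATH under these command names (rec is the recording front end that ships with sox); missing_deps is a hypothetical helper, not part of any of these tools.

```sh
#!/bin/sh
# Print "missing: <cmd>" for every command not found on PATH.
# Returns 0 only if all of them were found.
missing_deps() {
  status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing: $cmd"
      status=1
    fi
  done
  return "$status"
}

deps="gcc git sox rec sudo"
if [ "$(uname -s)" = "Linux" ]; then
  deps="$deps xdotool"   # only needed on Linux, where it types the transcript
fi

# $deps is unquoted on purpose so it splits into one argument per command
if missing_deps $deps; then
  echo "all dependencies found"
fi
```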
1. whisper.cpp
Follow the quick-start steps to set up whisper.cpp, and make sure the example works (the script in step 4 uses the base.en model, models/ggml-base.en.bin):
https://github.com/ggerganov/whisper.cpp#quick-start
The rest of the write-up assumes whisper.cpp was installed at $WHISPER.
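A quick sanity check of the checkout before wiring it up, sketched under two assumptions: the CMake build layout (build/bin/whisper-cli) and the base.en model file (models/ggml-base.en.bin), which are the paths the script in step 4 points at. The quick start fetches models with whisper.cpp's models/download-ggml-model.sh script. check_whisper and the $HOME/whisper.cpp fallback are made up for this sketch.

```sh
#!/bin/sh
# Check that a whisper.cpp checkout has the two files whisper.sh relies on.
# Paths assume the CMake build layout and the base.en model; adjust if yours differ.
check_whisper() {
  root=$1
  status=0
  if [ ! -x "$root/build/bin/whisper-cli" ]; then
    echo "missing or not executable: $root/build/bin/whisper-cli" >&2
    status=1
  fi
  if [ ! -f "$root/models/ggml-base.en.bin" ]; then
    echo "missing model: $root/models/ggml-base.en.bin" >&2
    status=1
  fi
  return "$status"
}

# Hypothetical default location; set WHISPER to wherever you cloned whisper.cpp.
WHISPER="${WHISPER:-$HOME/whisper.cpp}"
if check_whisper "$WHISPER"; then
  echo "whisper.cpp looks ready at $WHISPER"
fi
```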
2. global-keypress
Clone this repo, which detects keypresses globally (regardless of which window has focus):
$ git clone https://github.com/miguelmota/global-keypress
$ cd global-keypress
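Out of the box it logs every key, which would be dangerous, so patch it to log only the key we're interested in. The patch also removes the daemon() call, so the process stays attached to the shell that starts it instead of detaching: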
$ cat > key.patch
diff --git a/src/linux/globalkeypress.c b/src/linux/globalkeypress.c
index 39dc13f..ed8f7ee 100644
--- a/src/linux/globalkeypress.c
+++ b/src/linux/globalkeypress.c
@@ -83,13 +83,6 @@ int main(int argc, char **argv) {
     // We want to write to the file on every keypress, so disable buffering
     setbuf(logfile, NULL);
 
-    // Daemonize process. Don't change working directory but redirect standard
-    // inputs and outputs to /dev/null
-    if (daemon(1, 0) == -1) {
-        LOG_ERROR("%s", strerror(errno));
-        exit(-1);
-    }
-
     uint8_t shift_pressed = 0;
     input_event event;
     while (read(kbd_fd, &event, sizeof(input_event)) > 0) {
@@ -99,15 +92,17 @@ int main(int argc, char **argv) {
                     shift_pressed++;
                 }
                 char *name = getKeyText(event.code, shift_pressed);
-                if (strcmp(name, UNKNOWN_KEY) != 0) {
-                    //LOG("%s", name);
-                    fputs(name, logfile);
-                    fputs("\n", logfile);
+                if (!strcmp(name, "<SysRq>")) {
+                    fputs("pressed\n", logfile);
                 }
             } else if (event.value == KEY_RELEASE) {
                 if (isShift(event.code)) {
                     shift_pressed--;
                 }
+                char *name = getKeyText(event.code, shift_pressed);
+                if (!strcmp(name, "<SysRq>")) {
+                    fputs("released\n", logfile);
+                }
             }
         }
         assert(shift_pressed >= 0 && shift_pressed <= 2);
^D
$ git apply --ignore-whitespace key.patch
$ ./compile
You can change "<SysRq>" (print-screen on my laptop) to whatever key you want; it appears twice in the patch, once for the press and once for the release.
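One way to swap the key without hand-editing the patch, sketched as a hypothetical set_hotkey helper that rewrites both "<SysRq>" occurrences with sed before git apply runs. The replacement has to be a name that global-keypress's getKeyText() actually returns; "<F12>" in the example is a guess, not taken from its key table.

```sh
#!/bin/sh
# Hypothetical helper: replace the "<SysRq>" key name in key.patch before `git apply`.
# Both occurrences (press and release) are rewritten; sed -i.bak keeps a backup
# copy as key.patch.bak and works with both GNU and BSD sed.
set_hotkey() {
  patch_file=$1
  key=$2
  # | is the sed delimiter, so key names containing / are fine; names containing
  # |, & or \ would need escaping first.
  sed -i.bak "s|\"<SysRq>\"|\"$key\"|g" "$patch_file"
}

# Example (the key name is a guess; check what getKeyText() returns for your key):
# set_hotkey key.patch "<F12>"
```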
The rest of the write-up assumes global-keypress was installed at $GKEYPRESS.
4. Putting it all together
$ cat > whisper.sh
#!/bin/bash
AUDIO="/tmp/whisper.wav"
WHISPER="$HOME/..." # your path here
GKEYPRESS="$HOME/..." # your path here

onkill() {
  sudo pkill globalkeypress
  exit 0
}

# reset the log, then run the patched keypress logger in the background
echo "" | sudo tee /var/log/globalkeypress.log > /dev/null
sudo "$GKEYPRESS/bin/globalkeypress" &
trap onkill SIGINT

tail -f -n0 /var/log/globalkeypress.log | \
while read -r line; do
  if [[ $line == "pressed" ]]; then
    # could play a lil sound effect here to indicate recording started
    # e.g. play start.wav
    rec -r 16k "$AUDIO" 2> /dev/null &
    REC_PID=$!
  elif [[ $line == "released" ]]; then
    # could play a lil sound effect here to indicate recording stopped
    # e.g. play done.wav
    # stop only the rec started above (a bare `pkill rec` matches any process
    # whose name contains "rec"), then wait until it has finished the file
    kill "$REC_PID"
    wait "$REC_PID" 2> /dev/null
    echo "stopped recording"
    echo "transcribing"
    # strip the "[start --> end]" timestamp prefix and drop blank lines
    OUTPUT="$("$WHISPER/build/bin/whisper-cli" \
      -f "$AUDIO" \
      -m "$WHISPER/models/ggml-base.en.bin" \
      2> /dev/null \
      | sed 's/^\[.*\] *//' \
      | grep -v '^$')"
    # N.B. Linux-specific
    xdotool type "$OUTPUT"
  else
    echo "bad input"
    exit 1
  fi
done
^D
$ chmod +x whisper.sh
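The output-cleaning step can be pulled out into a function and tested on its own. whisper-cli prints each segment as "[start --> end]   text"; this sketch removes that prefix and the blank lines. strip_timestamps is a hypothetical name, and it matches [^]]* rather than .* so that a bracketed tag inside the transcript itself (whisper sometimes emits these for non-speech sounds) is left alone.

```sh
#!/bin/sh
# whisper-cli prints each segment as "[start --> end]   text".
# Strip that prefix (plus the spaces after it) and drop blank lines.
# [^]]* stops at the first "]", so bracketed tags inside the text survive.
strip_timestamps() {
  sed 's/^\[[^]]*\] *//' | grep -v '^$'
}
```

whisper-cli also accepts -nt (--no-timestamps), which prints the text without the prefix and would make this filter unnecessary.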
I'm not sure what the equivalent to xdotool would be on Mac OS. Perhaps osxdotool? If you know, please let me know and I'll update this page!
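One possible avenue for the typing step on Mac OS, untested here: AppleScript's keystroke command, run through osascript. This sketch only covers typing (not the keypress detection), and type_text and applescript_escape are hypothetical names. The terminal app running the script needs Accessibility permission before System Events will send keystrokes.

```sh
#!/bin/sh
# Untested sketch for Mac OS: type text via AppleScript's `keystroke` when
# xdotool isn't available. The terminal app running whisper.sh must be granted
# Accessibility permission before System Events will send keystrokes.

# Escape backslashes and double quotes so the text fits in an AppleScript string literal.
applescript_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical drop-in for the `xdotool type "$OUTPUT"` line in whisper.sh.
type_text() {
  if command -v xdotool > /dev/null 2>&1; then
    xdotool type "$1"
  elif command -v osascript > /dev/null 2>&1; then
    osascript -e "tell application \"System Events\" to keystroke \"$(applescript_escape "$1")\""
  else
    echo "no way to type text (need xdotool or osascript)" >&2
    return 1
  fi
}
```

Multi-line transcripts have not been tried with the AppleScript path.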
5. Run it & enjoy :)
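Now just run it. Hold the hotkey while you talk; when you release it, the recording is transcribed and typed into whatever field has focus. It will ask for your sudo password first, since global-keypress runs as root: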
$ ./whisper.sh
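To check the record, transcribe, and type chain without touching the real key, a hypothetical helper can append the same "pressed"/"released" lines that the patched global-keypress writes to the log whisper.sh is tailing. The LOG path matches the script; HOLD and SUDO are knobs made up for this sketch.

```sh
#!/bin/sh
# Fake one hotkey press/release by appending to the log whisper.sh is tailing,
# to exercise record -> transcribe -> type without the real key.
# LOG matches the path in whisper.sh; HOLD and SUDO are knobs made up for this sketch.
LOG="${LOG:-/var/log/globalkeypress.log}"
HOLD="${HOLD:-3}"     # seconds between "pressed" and "released" (speak during this)
SUDO="${SUDO-sudo}"   # set SUDO= (empty) if LOG is a file you can write without sudo

simulate_hotkey() {
  # $SUDO is unquoted on purpose: when empty it expands to nothing
  echo pressed | $SUDO tee -a "$LOG" > /dev/null
  sleep "$HOLD"
  echo released | $SUDO tee -a "$LOG" > /dev/null
}
```

With whisper.sh running in one terminal, source this file in a second one, call simulate_hotkey, and talk during the HOLD window.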
✨ ✨ ✨
Send comments, questions, and fixes to kira SNAIL eight45 DOT net!