Kira's Web-Treehouse for Plants 🌱

& Other Wayward Beings

๐ŸŽ™๏ธ DIY Speech-to-Text

A fully local speech-to-text hack!

This rigs things up so that audio is recorded while a chosen keyboard key is held; when the key is released, the audio is transcribed to text and typed into whatever window or field has focus.

0. System dependencies
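
Going by the later steps, the commands that get called directly are git, rec (part of SoX), xdotool, and sudo, plus whatever build tools whisper.cpp and global-keypress need. Here's a minimal sketch for checking that they're on your PATH before starting; missing_deps is just a helper name made up for this example, and the command list is my reading of the steps below, so adjust it for your system.

```shell
# Print the name of every command in the argument list that is not on PATH.
# (missing_deps is a hypothetical helper, not part of any tool used here.)
missing_deps() {
  for cmd in "$@"; do
    command -v "$cmd" > /dev/null 2>&1 || echo "$cmd"
  done
}

# Example call, using the commands the later steps invoke directly:
#   missing_deps git rec xdotool sudo
```

If it prints nothing, everything in the list was found.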

1. whisper.cpp

Follow the quick-start steps to get whisper.cpp set up, and make sure the example works:

https://github.com/ggerganov/whisper.cpp#quick-start

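The rest of the write-up assumes whisper.cpp was installed at $WHISPER, with the binary at $WHISPER/build/bin/whisper-cli and the base.en model at $WHISPER/models/ggml-base.en.bin.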
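
Since the final script reaches into $WHISPER for a specific binary and model, it can save some head-scratching to confirm both are where it expects. A rough sketch, assuming the two paths used by the script later on this page; check_whisper is a made-up helper name.

```shell
# Report which of the two files the final script relies on are missing
# under the given whisper.cpp directory. Returns 0 if both are present.
# (check_whisper is a hypothetical helper for this write-up.)
check_whisper() {
  whisper_dir="$1"
  status=0
  if [ ! -x "$whisper_dir/build/bin/whisper-cli" ]; then
    echo "missing binary: $whisper_dir/build/bin/whisper-cli"
    status=1
  fi
  if [ ! -f "$whisper_dir/models/ggml-base.en.bin" ]; then
    echo "missing model: $whisper_dir/models/ggml-base.en.bin"
    status=1
  fi
  return "$status"
}

# Example call:
#   check_whisper "$WHISPER" && echo "whisper.cpp looks ready"
```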

2. global-keypress

Clone this repo, which detects keypresses globally:

$ git clone https://github.com/miguelmota/global-keypress

$ cd global-keypress

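Since it would be dangerous to log all keypresses, patch it to log only the key we're interested in, writing "pressed" and "released" lines instead of key names. The patch also removes the daemonizing step, so the logger stays attached to whatever launched it: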

$ cat > key.patch
diff --git a/src/linux/globalkeypress.c b/src/linux/globalkeypress.c
index 39dc13f..ed8f7ee 100644
--- a/src/linux/globalkeypress.c
+++ b/src/linux/globalkeypress.c
@@ -83,13 +83,6 @@ int main(int argc, char **argv) {
    // We want to write to the file on every keypress, so disable buffering
    setbuf(logfile, NULL);

-   // Daemonize process. Don't change working directory but redirect standard
-   // inputs and outputs to /dev/null
-   if (daemon(1, 0) == -1) {
-     LOG_ERROR("%s", strerror(errno));
-     exit(-1);
-   }
-
    uint8_t shift_pressed = 0;
    input_event event;
    while (read(kbd_fd, &event, sizeof(input_event)) > 0) {
@@ -99,15 +92,17 @@ int main(int argc, char **argv) {
                shift_pressed++;
             }
             char *name = getKeyText(event.code, shift_pressed);
-            if (strcmp(name, UNKNOWN_KEY) != 0) {
-              //LOG("%s", name);
-              fputs(name, logfile);
-              fputs("\n", logfile);
+            if (!strcmp(name, "<SysRq>")) {
+              fputs("pressed\n", logfile);
             }
          } else if (event.value == KEY_RELEASE) {
             if (isShift(event.code)) {
                shift_pressed--;
             }
+            char *name = getKeyText(event.code, shift_pressed);
+            if (!strcmp(name, "<SysRq>")) {
+              fputs("released\n", logfile);
+            }
          }
       }
       assert(shift_pressed >= 0 && shift_pressed <= 2);
^D

$ git apply key.patch

$ ./compile

You can change "<SysRq>" (print-screen on my laptop) to whatever key you want; it appears twice in the patch, once for the press and once for the release.

The rest of the write-up assumes global-keypress was installed at $GKEYPRESS.
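
Before wiring in recording and transcription, it can help to confirm that the patched logger really emits "pressed" and "released" lines. A small dry-run sketch: watch_events is a made-up name, and the log path in the usage comment is the one the script in the next step uses.

```shell
# Read logger lines from stdin and echo what the real script would do,
# without recording or transcribing anything.
# (watch_events is a hypothetical helper for this write-up.)
watch_events() {
  while read -r line; do
    case "$line" in
      pressed)  echo "would start recording" ;;
      released) echo "would stop and transcribe" ;;
      *)        echo "unexpected line: $line" >&2 ;;
    esac
  done
}

# Example call, with the patched logger already running:
#   sudo tail -f -n0 /var/log/globalkeypress.log | watch_events
```

Hold and release the hotkey; each press should produce exactly one pair of messages.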

4. Putting it all together

$ cat > whisper.sh
#!/bin/bash

AUDIO="/tmp/whisper.wav"
WHISPER="$HOME/..."         # your path here
GKEYPRESS="$HOME/..."       # your path here

onkill() {
  sudo pkill globalkeypress
}

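# clear the log; running sudo in the foreground first also means the
# backgrounded sudo below usually won't need to prompt for a password
echo "" | sudo tee /var/log/globalkeypress.log > /dev/null
sudo "$GKEYPRESS/bin/globalkeypress" &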

trap onkill SIGINT

tail -f -n0 /var/log/globalkeypress.log | \
while read -r line; do
  if [[ $line == "pressed" ]]; then
    # could play a lil sound effect here to indicate recording started
    # e.g. play start.wav
    # 16 kHz, mono, 16-bit WAV is the input format whisper.cpp's examples use
    rec -r 16k -c 1 -b 16 "$AUDIO" 2> /dev/null &
  elif [[ $line == "released" ]]; then
    # could play a lil sound effect here to indicate recording stopped
    # e.g. play done.wav
    pkill rec
    echo "stopped recording"
    sleep 0.25
    echo "transcribing"

    # strip the leading [timestamp] from each line and drop blank lines
    OUTPUT="$("$WHISPER/build/bin/whisper-cli" \
      -f "$AUDIO" \
      -m "$WHISPER/models/ggml-base.en.bin" \
      2> /dev/null \
      | sed 's/^\[[^]]*\] *//' \
      | grep -v '^$')"

    # N.B. Linux-specific
    xdotool type "$OUTPUT"
  else
    echo "bad input"
    exit 1
  fi
done
^D

$ chmod +x whisper.sh
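
The sed and grep stage at the end of the whisper-cli pipeline is the fiddly bit, so here it is on its own as a sketch you can poke at. It assumes whisper-cli prints each segment as a line starting with a bracketed timestamp, like "[00:00:00.000 --> 00:00:02.000]   some text"; clean_transcript is a made-up name.

```shell
# Strip a leading "[...]" timestamp (and the spaces after it) from each line
# on stdin, then drop any lines that are empty.
# (clean_transcript is a hypothetical helper for this write-up.)
clean_transcript() {
  sed 's/^\[[^]]*\] *//' | grep -v '^$'
}

# Example call, with sample input in the assumed format:
#   printf '\n[00:00:00.000 --> 00:00:02.000]   Hello there.\n' | clean_transcript
```

Note that a long recording comes out as several lines, and xdotool will type the line breaks too.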

I'm not sure what the equivalent of xdotool would be on macOS. Perhaps osxdotool? If you know, please let me know and I'll update this page!
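
One candidate, untested here, is AppleScript's keystroke command, driven through osascript. The sketch below picks a typing tool based on uname. The macOS branch is an assumption, not something this page has verified, and it would likely need Accessibility permission for whatever terminal runs it. type_text is a made-up name.

```shell
# Type the given text into the focused window with a platform-specific tool.
# (type_text is a hypothetical helper; the Darwin branch is an untested guess.)
type_text() {
  case "$(uname -s)" in
    Darwin)
      # Pass the text as an argument so it needs no AppleScript escaping.
      osascript -e 'on run argv' \
                -e 'tell application "System Events" to keystroke (item 1 of argv)' \
                -e 'end run' "$1"
      ;;
    *)
      xdotool type "$1"
      ;;
  esac
}

# Example call, in place of the xdotool line in whisper.sh:
#   type_text "$OUTPUT"
```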

5. Run it & enjoy :)

Now just run it. Hold the hotkey while you speak; when you release it, the recording is transcribed and typed into whatever field has focus:

$ ./whisper.sh

✨ ✨ ✨

Send comments, questions, and fixes to kira SNAIL eight45 DOT net! 🐌

See Also