whisperosx

Commits

List of commits on branch main.
Unverified
3016e74fbc9bbe4ded6524bfea93dc85c16cef12

Fixing up the README a bit

jjhallard committed a year ago
Unverified
6a4fdd959ccc89c51bbbf0607ddce13018def45d

Fiddling with some of the experiments

jjhallard committed a year ago
Unverified
84409ad680382d59f4bcc986bf5e0529fc55de3b

Updating scripts to better handle signals

jjhallard committed a year ago
Unverified
1719f3f502c65a9ac0f83eda03ee11adadfe2be9

Inintial experiments complete, it works

jjhallard committed a year ago
Verified
0f1530624809c261c825c93b525d9c0e4761f633

Initial commit

jjhallard committed a year ago

README

The README file for this repository.

WhisperOSX

A simple collection of scripts that provide speech-to-text functionality across the desktop. Press a hotkey to start recording, press it again to stop, and have your speech rendered as high-quality text quickly and effortlessly.

Status

I wrote this project in an hour or so in an attempt to experiment with the OpenAI API and some OSX automation APIs in my free time.

This project almost kinda works, but it's not great. I have it set up so that one hotkey combination (Ctrl-Shift-7) starts recording and the same combination stops it. The resulting audio clip is then sent to the OpenAI Whisper API for transcription, and the transcription is typed through a virtual keyboard into whatever program I currently have open.
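The flow described above (record until told to stop, save the clip, transcribe it, type the result) could be sketched roughly as follows. This is not the repository's actual code. Every function and class name here is hypothetical, the stop request is assumed to arrive as SIGINT, and the transcription call assumes the pre-1.0 `openai` interface (`openai.Audio.transcribe`) with the API key supplied via the `OPENAI_API_KEY` environment variable.

```python
import signal
import wave


class StopFlag:
    """Callable flag that flips to True once a stop is requested (hypothetical helper)."""

    def __init__(self):
        self.stopped = False

    def request_stop(self, signum=None, frame=None):
        # Signature matches what signal.signal() expects from a handler.
        self.stopped = True

    def __call__(self):
        return self.stopped


def record(stop_requested, rate=16000, chunk=1024):
    """Capture 16-bit mono PCM chunks until stop_requested() returns True."""
    import pyaudio  # imported lazily so the rest of the sketch loads without it

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=rate,
                        input=True, frames_per_buffer=chunk)
    frames = []
    try:
        while not stop_requested():
            frames.append(stream.read(chunk))
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
    return frames


def save_wav(frames, path, channels=1, sample_width=2, rate=16000):
    """Write raw PCM chunks to a WAV file that can be uploaded for transcription."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"".join(frames))


def transcribe(path):
    """Send the clip to the Whisper API (assumes openai>=0.27.0,<1.0)."""
    import openai

    with open(path, "rb") as audio_file:
        result = openai.Audio.transcribe("whisper-1", audio_file)
    return result["text"]


def type_text(text):
    """Type the transcription into whichever window currently has focus."""
    import pyautogui

    pyautogui.write(text)


def dictate(stop_requested, path="clip.wav", record_fn=record,
            transcribe_fn=transcribe, type_fn=type_text):
    """Run the whole record -> save -> transcribe -> type pipeline once."""
    frames = record_fn(stop_requested)
    save_wav(frames, path)
    text = transcribe_fn(path)
    type_fn(text)
    return text


def main():
    # Assumption: the second hotkey press delivers SIGINT to this process.
    stop = StopFlag()
    signal.signal(signal.SIGINT, stop.request_stop)
    dictate(stop)
```

Calling `main()` from a script bound to the hotkey would run one dictation cycle. The stages are passed into `dictate` as parameters so each can be swapped out or exercised without a microphone or network access.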

However, OSX's protections mean that any program you have open when you run that automation needs both microphone access and Accessibility API access, and providing those can be a pain. It's also a bit slow to begin recording, so it's not clear when you should start talking after pressing the hotkeys.

I'm basically giving up on this for now, but it does mostly work.

Quickstart

These scripts require:

Brew

  • portaudio: $ brew install portaudio (required for pyaudio)

Python Modules

  • pyautogui: to create keystrokes in the GUI from a string of text
  • openai>=0.27.0: to send audio to the OpenAI Whisper API for transcription
  • pyaudio: for capturing audio from the active microphone
  • psutil: finding an actively running process
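
Before running the scripts, it may help to confirm that the modules listed above are importable. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the import names are assumed to match the package names in the list.

```python
import importlib.util

# Import names assumed to match the packages listed above.
REQUIRED_MODULES = ["pyautogui", "openai", "pyaudio", "psutil"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means everything is installed. Otherwise, the returned names are the ones still to install with pip (pyaudio also needs portaudio from Brew first).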

Why?

With the release of Whisper, speech-to-text is good enough that it should be ubiquitous. However, most users of the Apple ecosystem are still forced to suffer through Siri's crappy speech-to-text functionality. This is unacceptable.

I created these scripts so that I could have access to world-class speech-to-text functionality anywhere on my desktop, at any time.

Future Work

Ideally this would run entirely locally using the whisper.cpp project. However, when I created this project I was running on Intel silicon, where inference was too slow to be practical.