About 5 watts at the wall. A purely self-hosted live transcription of a Mumble audio chat. Transcription annotated by the speaker's name.


Also, it goes without saying, but it's ridiculous that if I want this capability, the only way to get it is to bootleg it like this.

Google has finally achieved a technology that can transcribe spoken conversation, but they want to hoard it behind proprietary APIs and services. This bootleg is a small glimpse at what technology could be like if its goal were to provide utility instead of just making money: accessibility technology that actually works!

Ok, ok, maybe that's a bit of a grandiose claim for something like this, which is barely demo-able and still has tons of difficult fundamental problems to overcome... But the point is, this capability would probably be commonplace already if it were open technology. The only reason this is remarkable is that I went to the effort of hacking it together with DACs on both sides, special audio cables, Android UI testing libraries, and heaps of good old-fashioned software duct tape. And despite all that, it still outperforms the "open" state of the art, like "Open"AI's Whisper, while using 1/10th of the energy.


Small server part of the pixie.town infrastructure. Registration is closed.