On-device Voice AI · Savant
Local GPT-4o-style conversation, no cloud
I spent six months at Savant building a Voice AI model end-to-end. The constraint: it had to run entirely on-device, with no cloud calls and no GPT-4o at runtime.
I designed a multi-agent system that routes each user query to a specialized, lightweight small language model (SLM) agent. OpenAI Whisper handles speech-to-text, and a custom classifier over model2vec embeddings does the routing. A vector DB stores chat history and feeds relevant past turns back in as context via RAG.
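To make the routing step concrete, here is a minimal sketch of an embedding-based router. It is not the production code. The `embed` function is a toy hashed bag-of-words stand-in for the model2vec embeddings, the nearest-centroid classifier is one assumed choice of classifier, and the agent names (`lighting`, `music`, `fallback`) and the `min_score` threshold are hypothetical.

```python
import math
import zlib

DIM = 256  # size of the toy embedding space (assumption)


def embed(text: str) -> list[float]:
    """Toy stand-in for a model2vec embedding: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def dot(a: list[float], b: list[float]) -> float:
    # Vectors are unit-length (or all-zero), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


class CentroidRouter:
    """Routes a query to the agent whose example centroid is most similar."""

    def __init__(self, examples: dict[str, list[str]]):
        self.centroids: dict[str, list[float]] = {}
        for agent, utterances in examples.items():
            total = [0.0] * DIM
            for utterance in utterances:
                for i, x in enumerate(embed(utterance)):
                    total[i] += x
            norm = math.sqrt(sum(x * x for x in total))
            self.centroids[agent] = [x / norm for x in total] if norm else total

    def route(self, query: str, min_score: float = 0.1) -> str:
        q = embed(query)
        agent, score = max(
            ((name, dot(q, c)) for name, c in self.centroids.items()),
            key=lambda pair: pair[1],
        )
        # Low similarity to every agent means no confident match.
        return agent if score >= min_score else "fallback"


# Hypothetical agents and training utterances, for illustration only.
router = CentroidRouter({
    "lighting": ["turn on the kitchen lights", "dim the bedroom lights"],
    "music": ["play some jazz music", "pause the music"],
})
print(router.route("turn off the lights"))
```

In a real pipeline, the text passed to `route` would be the Whisper transcript, and each agent name would map to an SLM with its own tool set. A cheap classifier like this keeps routing latency small compared with running a language model to pick the agent.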
Highlights
- Independently spearheaded end-to-end development.
- Multi-agent architecture managing 60+ complex tool calls.
- Whisper STT + custom model2vec embeddings + classifier router.
- Vector DB w/ RAG for chat-history context.
- Fine-tuned CSMs, reaching up to 95% accuracy per use case with natural speech.
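The chat-history retrieval mentioned above could look roughly like the sketch below. It is an illustration under stated assumptions, not the shipped system: an in-memory list stands in for the vector DB, `embed` is a toy hashed bag-of-words substitute for the real embeddings, and `ChatMemory`, `build_prompt`, and the prompt layout are hypothetical names and formats.

```python
import math
import zlib

DIM = 256  # size of the toy embedding space (assumption)


def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class ChatMemory:
    """In-memory stand-in for the vector DB that stores past conversation turns."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, list[float]]] = []

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((f"{speaker}: {text}", embed(text)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        # Rank stored turns by cosine similarity to the query (vectors are unit-length).
        ranked = sorted(
            self.turns,
            key=lambda turn: sum(a * b for a, b in zip(q, turn[1])),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


def build_prompt(memory: ChatMemory, query: str, k: int = 2) -> str:
    """Prepend the k most relevant past turns to the new query (the 'augmentation' in RAG)."""
    context = "\n".join(memory.retrieve(query, k))
    return f"Relevant history:\n{context}\n\nUser: {query}"


memory = ChatMemory()
memory.add("user", "set the living room thermostat to 70")
memory.add("user", "play jazz in the kitchen")
memory.add("assistant", "the thermostat is now set to 70")

print(build_prompt(memory, "what is the thermostat set to"))
```

The resulting prompt would then be passed to whichever SLM agent handles the query. Retrieving only the top few turns keeps the context short, which matters for small on-device models.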