2025 · Voice · SLM · Whisper · RAG

On-device Voice AI · Savant

Local GPT-4o-style conversation, no cloud

Six months at Savant building a Voice AI model end-to-end. The constraint: it had to run entirely on-device, with no cloud calls and no GPT-4o at runtime.

I designed a multi-agent system that routes each user query to a specialized lightweight SLM agent. OpenAI Whisper handles speech-to-text, and a custom classifier over model2vec embeddings does the routing. A vector DB with RAG supplies chat history as context.
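The routing step can be illustrated with a minimal sketch. It is not the production code: the hashed bag-of-words `embed` function stands in for the model2vec encoder, the nearest-centroid classifier is one simple choice of router, and the agent names and example utterances are hypothetical.

```python
import hashlib
import math

DIM = 1024  # size of the toy embedding (assumption; a real encoder sets its own)


def embed(text: str) -> list[float]:
    """Stand-in embedding: hashed bag-of-words, L2-normalised.

    Assumption: the real system would call a model2vec encoder here.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        # md5 gives a hash that is stable across runs, unlike built-in hash()
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a: list[float], b: list[float]) -> float:
    # Vectors are unit length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class Router:
    """Nearest-centroid classifier over query embeddings."""

    def __init__(self, examples: dict[str, list[str]]):
        self.centroids: dict[str, list[float]] = {}
        for agent, utterances in examples.items():
            vecs = [embed(u) for u in utterances]
            mean = [sum(col) / len(vecs) for col in zip(*vecs)]
            norm = math.sqrt(sum(v * v for v in mean)) or 1.0
            self.centroids[agent] = [v / norm for v in mean]

    def route(self, query: str) -> str:
        """Return the agent whose centroid is most similar to the query."""
        q = embed(query)
        return max(self.centroids, key=lambda a: dot(q, self.centroids[a]))


# Hypothetical agents and training utterances, for illustration only.
EXAMPLES = {
    "lighting": ["turn on the lights", "dim the bedroom lights"],
    "music": ["play some jazz", "skip this song"],
}

router = Router(EXAMPLES)
print(router.route("dim the lights"))
```

Because routing is a single embedding pass plus a few dot products, it stays cheap enough to run before every agent call, which fits the on-device constraint.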

Highlights

  • Led end-to-end development independently.
  • Multi-agent architecture managing 60+ complex tool calls.
  • Whisper STT + custom model2vec embeddings + classifier router.
  • Vector DB w/ RAG for chat-history context.
  • Fine-tuned CSMs, reaching up to 95% accuracy per use case with natural speech.
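The chat-history retrieval mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: an in-memory list stands in for the vector DB, a sparse word-count vector stands in for the embeddings, and `ChatMemory` and `build_prompt` are hypothetical names.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    # Stand-in embedding (assumption): sparse bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    shared = a.keys() & b.keys()
    num = sum(a[t] * b[t] for t in shared)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0


@dataclass
class ChatMemory:
    """In-memory stand-in for a vector DB holding past turns."""

    turns: list = field(default_factory=list)

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text, embed(text)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored turns most similar to the query."""
        q = embed(query)
        ranked = sorted(self.turns, key=lambda t: cosine(q, t[2]), reverse=True)
        return [f"{speaker}: {text}" for speaker, text, _ in ranked[:k]]


def build_prompt(memory: ChatMemory, query: str, k: int = 2) -> str:
    """Prepend retrieved history to the new query before calling an agent."""
    context = "\n".join(memory.retrieve(query, k))
    return f"Relevant history:\n{context}\n\nUser: {query}"


memory = ChatMemory()
memory.add("user", "my favourite playlist is morning jazz")
memory.add("assistant", "saved morning jazz as your favourite playlist")
memory.add("user", "set the thermostat to 21 degrees")

print(build_prompt(memory, "what is my favourite playlist"))
```

Retrieving only the top few turns keeps the prompt short, which matters when the model consuming it is a small on-device SLM.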