A voice agent for any Hinglish workflow.
Sub-1.2-second turn latency on real telephony. Conversation stages defined in a JSON file. Hinglish out of the box. Built so a single agent process can be re-pointed at any new use case in an afternoon.
School admissions teams burn their best counsellors on the same five-minute conversation, hundreds of times a week. HEDA Voice automates the qualifying call and hands warm leads off politely, in Hinglish, and fast enough that parents never feel they're talking to a machine.
- 01 LiveKit Agents pipeline — Sarvam STT + TTS, Gemini 2.0 Flash for the LLM, Vobiz SIP for telephony.
- 02 script.json — change agent name, school name, objection handling, and stage flow without touching code.
- 03 Sub-1.2s turn latency target via streaming LLM output and TTS chunking.
- 04 Stage-aware conversation: greeting → enquiry confirm → pitch → visit invitation → handoff to admissions.
- 05 Browser demo + telephony in the same agent worker. Docker Compose deploy on a single VPS.
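One way to picture the "streaming LLM output and TTS chunking" point in the list above: rather than waiting for the full reply, the agent can cut the token stream at sentence boundaries and pass each sentence to TTS as soon as it is complete. The sketch below is illustrative only. The function name `chunk_for_tts`, the `min_chars` threshold, and the boundary characters are assumptions, not HEDA Voice's actual code, and it does not call any LiveKit, Sarvam, or Gemini API.

```python
import re
from typing import Iterable, Iterator

# Assumed sentence terminators: ASCII punctuation plus the Devanagari danda,
# followed by whitespace. Not taken from the HEDA Voice codebase.
_BOUNDARY = re.compile(r"[.!?।]\s")


def chunk_for_tts(tokens: Iterable[str], min_chars: int = 12) -> Iterator[str]:
    """Yield a speakable chunk as soon as a sentence boundary shows up.

    `tokens` stands in for a streaming LLM response. `min_chars` keeps very
    short fragments from being sent to TTS on their own.
    """
    start = max(min_chars - 1, 0)
    buffer = ""
    for token in tokens:
        buffer += token
        match = _BOUNDARY.search(buffer, start)
        while match:
            end = match.end()
            yield buffer[:end].strip()
            buffer = buffer[end:]
            match = _BOUNDARY.search(buffer, start)
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Namaste", " ji,", " main", " school", " se", " bol", " rahi",
              " hoon.", " Kya", " aap", " admission", " ke", " liye",
              " call", " kar", " rahe", " the?"]
    for chunk in chunk_for_tts(stream):
        print(chunk)
```

The first sentence is released the moment the token after its full stop arrives, so speech synthesis can begin while the model is still generating the second sentence.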
Client
HEDA Voice (Stacksmiths product)
Year
2026
First customer
Shiv Jyoti Convent Boarding School, Kota
Built by
Stacksmiths
Stack
Python · FastAPI · LiveKit · Sarvam · Gemini · Vobiz SIP
Time
6 weeks to first live call
A phone call,
in dark mode.
The product is a phone call — so the UI is a phone. Native call-screen black, system-UI sans, a green mic button with a glow on active recording, transcript bubbles in the colours iOS users already trust. Nothing on screen the agent itself can't justify.
Built for the way it'll be read.
Branding and engineering
at the same desk.
The compressed timeline made a handoff impossible, so we skipped it. AYB and Stacksmiths shared one Slack, one Figma, and one daily standup.
- W1
POC
Voice pipeline stood up end-to-end on a Friday — STT → LLM → TTS through LiveKit, on a browser-only WebRTC demo. Confirmed the latency story was real.
- W2–3
Telephony
SIP trunk via Vobiz, agent worker registered against LiveKit, auto-dispatch per inbound call. First live call to a real Indian mobile in week three.
- W4–5
Configurability
Pulled all per-client behaviour into script.json. Built the admin panel for scripts, demo tokens, and recordings. Lead/session model in SQLite.
- W6
First customer
Shiv Jyoti Convent Boarding School, Kota. Hinglish admissions script live. Conversation flow: greeting → confirm → pitch → visit → handoff.
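To make the script.json idea concrete, here is a minimal sketch of how a stage flow like greeting → confirm → pitch → visit → handoff could be driven from a JSON file. The real script.json schema is not shown on this page, so every key (`agent_name`, `school_name`, `stages`, `id`, `next`), the placeholder values, and the `StageFlow` class are assumptions for illustration.

```python
import json

# Hypothetical script.json content. The keys and values are illustrative
# assumptions, not the actual HEDA Voice schema.
SCRIPT_JSON = """
{
  "agent_name": "Example Agent",
  "school_name": "Example School",
  "stages": [
    {"id": "greeting", "next": "enquiry_confirm"},
    {"id": "enquiry_confirm", "next": "pitch"},
    {"id": "pitch", "next": "visit_invitation"},
    {"id": "visit_invitation", "next": "handoff"},
    {"id": "handoff", "next": null}
  ]
}
"""


class StageFlow:
    """Tracks the current stage and follows the `next` links in the script."""

    def __init__(self, script: dict):
        self._next = {stage["id"]: stage["next"] for stage in script["stages"]}
        # The first stage listed is treated as the entry point.
        self.current = script["stages"][0]["id"]

    @property
    def done(self) -> bool:
        return self._next[self.current] is None

    def advance(self) -> str:
        """Move to the next stage; stay put if already at the final stage."""
        nxt = self._next[self.current]
        if nxt is not None:
            self.current = nxt
        return self.current


flow = StageFlow(json.loads(SCRIPT_JSON))
visited = [flow.current]
while not flow.done:
    visited.append(flow.advance())
print(" -> ".join(visited))
```

Under this assumed layout, re-pointing the agent at a new use case means editing the stage list and names in the JSON, while the Python that walks the flow stays unchanged.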
≈ 1.1s turn latency,
in fluent Hinglish.
“Sub-2-second response latency on real telephony — no awkward pauses. Multiple use-case scripts, fluent in everyday Hinglish.”
— HEDA Voice, home
Stack: Python FastAPI + SQLite for session/lead state; LiveKit Agents for the voice pipeline; Sarvam AI for STT and TTS; Google Gemini 2.0 Flash for the LLM; Vobiz SIP for telephony; vanilla JS + LiveKit WebRTC for the browser demo. One agent process registers against LiveKit and auto-dispatches per call. Whole stack runs in Docker Compose on a single VPS — Redis, LiveKit server, SIP, agent worker, nginx.
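As a rough illustration of "SQLite for session/lead state", the sketch below keeps one row per lead (keyed by phone number) and one row per call session, with the session recording the current conversation stage. The table names, columns, and helper functions are hypothetical. The actual HEDA Voice schema is not published here.

```python
import sqlite3

# Hypothetical schema: one lead per phone number, many sessions per lead.
SCHEMA = """
CREATE TABLE leads (
    id INTEGER PRIMARY KEY,
    phone TEXT NOT NULL UNIQUE,
    parent_name TEXT
);
CREATE TABLE sessions (
    id INTEGER PRIMARY KEY,
    lead_id INTEGER NOT NULL REFERENCES leads(id),
    stage TEXT NOT NULL DEFAULT 'greeting',
    started_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the tables (in-memory by default)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def start_session(conn: sqlite3.Connection, phone: str) -> int:
    """Create the lead if it is new, then open a session for this call."""
    conn.execute("INSERT OR IGNORE INTO leads (phone) VALUES (?)", (phone,))
    lead_id = conn.execute(
        "SELECT id FROM leads WHERE phone = ?", (phone,)
    ).fetchone()[0]
    cur = conn.execute("INSERT INTO sessions (lead_id) VALUES (?)", (lead_id,))
    conn.commit()
    return cur.lastrowid


def set_stage(conn: sqlite3.Connection, session_id: int, stage: str) -> None:
    """Record the stage the conversation has reached."""
    conn.execute("UPDATE sessions SET stage = ? WHERE id = ?", (stage, session_id))
    conn.commit()


if __name__ == "__main__":
    db = open_db()
    sid = start_session(db, "+910000000000")  # placeholder number
    set_stage(db, sid, "pitch")
    print(db.execute("SELECT stage FROM sessions WHERE id = ?", (sid,)).fetchone())
```

A repeat caller reuses the same lead row and gets a fresh session, which keeps call history per lead without duplicating contact details.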