Case study №06 HEDA Voice 2026

A voice agent for any Hinglish workflow.

Sub-1.2-second turn latency on real telephony. Conversation stages defined in a JSON file. Hinglish out of the box. Built so a single agent process can be re-pointed at any new use case in an afternoon.

Live site: heda-voice.thestacksmiths.com

01 The brief

School admissions teams burn their best counsellors on the same five-minute conversation, hundreds of times a week. HEDA Voice automates the qualifying call and hands warm leads off — politely, in Hinglish, fast enough that parents never feel they're talking to a machine.

  • 01 LiveKit Agents pipeline — Sarvam STT + TTS, Gemini 2.0 Flash for the LLM, Vobiz SIP for telephony.
  • 02 script.json — change agent name, school name, objection handling, and stage flow without touching code.
  • 03 Sub-1.2s turn latency target via streaming LLM output and TTS chunking.
  • 04 Stage-aware conversation: greeting → enquiry confirm → pitch → visit invitation → handoff to admissions.
  • 05 Browser demo + telephony in the same agent worker. Docker Compose deploy on a single VPS.
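
Item 03's "streaming LLM output and TTS chunking" can be sketched as below. This is a minimal illustration under stated assumptions, not the HEDA Voice implementation: `chunk_for_tts`, `min_chars`, and the punctuation set are hypothetical names and values, and the token list stands in for a streaming LLM response. The point is that text is flushed to TTS at the first sentence boundary instead of waiting for the full reply, which is one common way to cut time to first audio.

```python
from typing import Iterable, Iterator

# Sentence-ending characters, including the Devanagari danda for Hindi text.
# The exact set is an assumption for illustration.
SENTENCE_END = (".", "?", "!", "।")


def chunk_for_tts(tokens: Iterable[str], min_chars: int = 12) -> Iterator[str]:
    """Yield speakable chunks as soon as a sentence boundary arrives."""
    buffer = ""
    for token in tokens:
        buffer += token
        stripped = buffer.strip()
        # Flush once the buffer ends a sentence and is long enough to be
        # worth a TTS request; very short fragments stay buffered.
        if stripped.endswith(SENTENCE_END) and len(stripped) >= min_chars:
            yield stripped
            buffer = ""
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    # Stand-in for tokens arriving from a streaming LLM response.
    stream = ["Namaste", " ji,", " main", " school", " se", " bol", " rahi",
              " hoon.", " Kya", " aap", " admission", " ke", " liye",
              " enquire", " kar", " rahe", " the?"]
    for chunk in chunk_for_tts(stream):
        print(chunk)
```

Each yielded chunk would be handed to the TTS stage while the LLM is still generating the next sentence. The `min_chars` floor keeps one-word replies such as "Ok." from becoming their own TTS request.
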

Client: HEDA Voice (Stacksmiths product)
Year: 2026
First customer: Shiv Jyoti Convent Boarding School, Kota
Built by: Stacksmiths
Stack: Python · FastAPI · LiveKit · Sarvam · Gemini · Vobiz SIP
Time: 6 weeks to first live call

02 Brand system

A phone call,
in dark mode.

The product is a phone call — so the UI is a phone. Native call-screen black, system-UI sans, a green mic button with a glow on active recording, transcript bubbles in the colours iOS users already trust. Nothing on screen the agent itself can't justify.

Void #08090F
Live Green #34C759
Call Blue #0A84FF
Bone #FFFFFF
03 Process

Branding and engineering
at the same desk.

The compressed timeline made handoff impossible — so we didn't. AYB and Stacksmiths sat in one Slack, one Figma, one daily standup.

  1. W1

    POC

    Voice pipeline stood up end-to-end on a Friday — STT → LLM → TTS through LiveKit, on a browser-only WebRTC demo. Confirmed the latency story was real.

  2. W2–3

    Telephony

    SIP trunk via Vobiz, agent worker registered against LiveKit, auto-dispatch per inbound call. First live call to a real Indian mobile in week three.

  3. W4–5

    Configurability

    Pulled all per-client behaviour into script.json. Built the admin panel for scripts, demo tokens, and recordings. Lead/session model in SQLite.

  4. W6

    First customer

    Shiv Jyoti Convent Boarding School, Kota. Hinglish admissions script live. Conversation flow: greeting → confirm → pitch → visit → handoff.
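
The JSON-defined stage flow from weeks 4 to 6 could look roughly like the sketch below. The real script.json schema is not shown in this case study, so every key (`agent_name`, `school_name`, `stages`, `id`, `next`) and both helper functions are assumptions made for illustration. The sketch loads a script, checks that each stage points at a stage that exists, and walks the flow from the greeting.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical script.json content; all keys are illustrative assumptions.
EXAMPLE_SCRIPT = """
{
  "agent_name": "Example Agent",
  "school_name": "Example School",
  "stages": [
    {"id": "greeting", "next": "confirm"},
    {"id": "confirm",  "next": "pitch"},
    {"id": "pitch",    "next": "visit"},
    {"id": "visit",    "next": "handoff"},
    {"id": "handoff",  "next": null}
  ]
}
"""


@dataclass(frozen=True)
class Stage:
    id: str
    next: Optional[str]


def load_stages(raw: str) -> dict:
    """Parse the script and check every 'next' points at a defined stage."""
    data = json.loads(raw)
    stages = {s["id"]: Stage(s["id"], s["next"]) for s in data["stages"]}
    for stage in stages.values():
        if stage.next is not None and stage.next not in stages:
            raise ValueError(
                f"stage {stage.id!r} points at unknown stage {stage.next!r}"
            )
    return stages


def walk(stages: dict, start: str) -> list:
    """Follow 'next' links from start; stop at the end or on a cycle."""
    path, current = [], start
    while current is not None and current not in path:
        path.append(current)
        current = stages[current].next
    return path


if __name__ == "__main__":
    print(" -> ".join(walk(load_stages(EXAMPLE_SCRIPT), "greeting")))
```

Validating at load time means a typo in a client's script fails when the file is read, not halfway through a live call.
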

05 Outcome

≈ 1.1s turn latency,
in fluent Hinglish.

1.1s turn-to-turn: ≈1.1s end-to-end on real telephony
8-minute call ceiling: configurable per script (480s default)
5 stages per script: JSON-defined, swappable per client
1 JSON file = new client: no code change to retarget the agent

06 Notes

“Sub-2-second response latency on real telephony — no awkward pauses. Multiple use-case scripts, fluent in everyday Hinglish.”

— HEDA Voice, home

Stack: Python FastAPI + SQLite for session/lead state; LiveKit Agents for the voice pipeline; Sarvam AI for STT and TTS; Google Gemini 2.0 Flash for the LLM; Vobiz SIP for telephony; vanilla JS + LiveKit WebRTC for the browser demo. One agent process registers against LiveKit and auto-dispatches per call. Whole stack runs in Docker Compose on a single VPS — Redis, LiveKit server, SIP, agent worker, nginx.
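
For the session/lead state kept in SQLite, a minimal model might look like the sketch below. The table names, column names, and helper functions are assumptions for illustration; the case study does not describe the actual schema. One lead (keyed by phone number) can have many call sessions, and each session records the stage it has reached.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS leads (
    id    INTEGER PRIMARY KEY,
    phone TEXT NOT NULL UNIQUE,
    name  TEXT
);
CREATE TABLE IF NOT EXISTS sessions (
    id         INTEGER PRIMARY KEY,
    lead_id    INTEGER NOT NULL REFERENCES leads(id),
    stage      TEXT NOT NULL DEFAULT 'greeting',
    started_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def start_session(conn: sqlite3.Connection, phone: str) -> int:
    """Create the lead if new, then open a session in the first stage."""
    conn.execute("INSERT OR IGNORE INTO leads (phone) VALUES (?)", (phone,))
    lead_id = conn.execute(
        "SELECT id FROM leads WHERE phone = ?", (phone,)
    ).fetchone()[0]
    cur = conn.execute("INSERT INTO sessions (lead_id) VALUES (?)", (lead_id,))
    conn.commit()
    return cur.lastrowid


def advance(conn: sqlite3.Connection, session_id: int, stage: str) -> None:
    """Record that a session has moved to a new conversation stage."""
    conn.execute("UPDATE sessions SET stage = ? WHERE id = ?", (stage, session_id))
    conn.commit()
```

A repeat call from the same number reuses the existing lead row and opens a fresh session, so call history stays attached to one lead.
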