Case study №06 HEDA Voice 2026

A voice agent for any Hinglish workflow.

Sub-1.2-second turn latency on real telephony. Conversation stages defined in a JSON file. Hinglish out of the box. Built so a single agent process can be re-pointed at any new use case in an afternoon.

Visit live site

heda-voice.thestacksmiths.com

01 The brief

School admissions teams burn their best counsellors on the same five-minute conversation, hundreds of times a week. HEDA Voice automates the qualifying call and hands warm leads off — politely, in Hinglish, fast enough that parents never feel they're talking to a machine.

01 LiveKit Agents pipeline — Sarvam STT + TTS, Gemini 2.0 Flash for the LLM, Vobiz SIP for telephony.
02 script.json — change agent name, school name, objection handling, and stage flow without touching code.
03 Sub-1.2s turn latency target via streaming LLM output and TTS chunking.
04 Stage-aware conversation: greeting → enquiry confirm → pitch → visit invitation → handoff to admissions.
05 Browser demo + telephony in the same agent worker. Docker Compose deploy on a single VPS.
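
Item 03 relies on handing text to TTS before the LLM has finished its reply. A minimal sketch of that idea, assuming the LLM yields plain text fragments: buffer the stream and release a chunk as soon as a sentence boundary appears. The function name `chunk_for_tts` and the `min_chars` threshold are illustrative assumptions, not the production implementation or a LiveKit API.

```python
import re
from typing import Iterable, Iterator

# Sentence-ending punctuation followed by whitespace. The Devanagari danda
# is included because Hinglish replies may mix scripts.
_BOUNDARY = re.compile(r"[.!?।]\s")


def chunk_for_tts(tokens: Iterable[str], min_chars: int = 12) -> Iterator[str]:
    """Yield sentence-sized chunks from a stream of LLM text fragments.

    A chunk is released once a sentence boundary is seen and the chunk is at
    least `min_chars` long, so very short interjections ride along with the
    sentence that follows them.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            # First boundary that makes the chunk long enough to be worth speaking.
            match = next(
                (m for m in _BOUNDARY.finditer(buffer) if m.end() >= min_chars),
                None,
            )
            if match is None:
                break
            yield buffer[: match.end()].strip()
            buffer = buffer[match.end():]
    # Flush whatever is left when the LLM stream ends.
    if buffer.strip():
        yield buffer.strip()


stream = ["Nam", "aste! ", "Main school se ", "bol rahi hoon. ", "Kya aap", "ke paas do minute hain?"]
chunks = list(chunk_for_tts(stream))
```

In this sketch the first chunk can go to TTS while the LLM is still generating the second, which is where the latency saving would come from.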

Client

HEDA Voice (Stacksmiths product)

Year

2026

First customer

Shiv Jyoti Convent Boarding School, Kota

Built by

Stacksmiths

Stack

Python · FastAPI · LiveKit · Sarvam · Gemini · Vobiz SIP

Time

6 weeks to first live call

02 Brand system

A phone call,
in dark mode.

The product is a phone call — so the UI is a phone. Native call-screen black, system-UI sans, a green mic button with a glow on active recording, transcript bubbles in the colours iOS users already trust. Nothing on screen the agent itself can't justify.

Mark — green dot, monospaced wordmark

Void #08090F

Live Green #34C759

Call Blue #0A84FF

Bone #FFFFFF

04 The website

Built for the way it'll be read.

Hero — 'Sub-second voice. Fluent Hinglish.'

Latency budget — under 1.2s, end to end

script.json — one file, any use case

Admin — scripts, demo tokens, recordings

03 Process

Branding and engineering
at the same desk.

The compressed timeline left no room for a handoff, so we skipped it. AYB and Stacksmiths shared one Slack, one Figma, and one daily standup.

W1

POC

Voice pipeline stood up end-to-end on a Friday: STT → LLM → TTS through LiveKit, on a browser-only WebRTC demo. Confirmed the latency story was real.

W2–3

Telephony

SIP trunk via Vobiz, agent worker registered against LiveKit, auto-dispatch per inbound call. First live call to a real Indian mobile in week three.

W4–5

Configurability

Pulled all per-client behaviour into script.json. Built the admin panel for scripts, demo tokens, and recordings. Lead/session model in SQLite.

W6

First customer

Shiv Jyoti Convent Boarding School, Kota. Hinglish admissions script live. Conversation flow: greeting → confirm → pitch → visit → handoff.
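
The stage flow that script.json carries can be pictured as a small linked list of stages. The sketch below is an assumption about what such a file might look like: the keys `agent_name`, `school_name`, `stages`, `id`, and `next`, plus the sample values, are hypothetical, since the case study does not publish the real schema.

```python
import json
from typing import Dict, List, Optional

# Hypothetical script.json contents (schema and values are illustrative only).
SCRIPT_JSON = """
{
  "agent_name": "Example Agent",
  "school_name": "Example School",
  "stages": [
    {"id": "greeting", "next": "enquiry_confirm"},
    {"id": "enquiry_confirm", "next": "pitch"},
    {"id": "pitch", "next": "visit_invitation"},
    {"id": "visit_invitation", "next": "handoff"},
    {"id": "handoff", "next": null}
  ]
}
"""


def load_flow(raw: str) -> Dict[str, Optional[str]]:
    """Parse the script and return a stage -> next-stage mapping."""
    script = json.loads(raw)
    flow = {stage["id"]: stage["next"] for stage in script["stages"]}
    # Fail early if a stage points at something the script never defines.
    for stage_id, nxt in flow.items():
        if nxt is not None and nxt not in flow:
            raise ValueError(f"stage {stage_id!r} points at unknown stage {nxt!r}")
    return flow


def walk(flow: Dict[str, Optional[str]], start: str) -> List[str]:
    """Follow the flow from `start` until a stage with no successor."""
    path: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if current in path:
            raise ValueError(f"stage flow loops back to {current!r}")
        path.append(current)
        current = flow[current]
    return path


path = walk(load_flow(SCRIPT_JSON), "greeting")
```

Under this assumed layout, retargeting the agent would mean editing the stage list and names in the JSON while the Python stays untouched.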

05 Outcome

≈ 1.1s turn latency,
in fluent Hinglish.

1.1s turn-to-turn: ≈1.1s end-to-end on real telephony

8 minute call ceiling: configurable per script (480s default)

5 stages per script: JSON-defined, swappable per client

1 JSON file = new client: no code change to retarget the agent
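
A per-script call ceiling with a 480-second default could be enforced with a plain timeout around the call task. This is a sketch only: the `max_call_seconds` key and both function names are assumptions, and the real agent may end calls differently.

```python
import asyncio
from typing import Awaitable

DEFAULT_MAX_CALL_SECONDS = 480  # default ceiling quoted in the case study


def call_ceiling(script: dict) -> float:
    """Read the ceiling from the script, falling back to the default.

    `max_call_seconds` is a hypothetical key name.
    """
    return float(script.get("max_call_seconds", DEFAULT_MAX_CALL_SECONDS))


async def run_with_ceiling(call: Awaitable[None], script: dict) -> str:
    """Run one call and cancel it if it outlives the script's ceiling."""
    try:
        await asyncio.wait_for(call, timeout=call_ceiling(script))
        return "completed"
    except asyncio.TimeoutError:
        # wait_for has already cancelled the call task at this point.
        return "ceiling_reached"


# A stand-in "call" that lasts 0.2s against a 0.01s ceiling.
outcome = asyncio.run(run_with_ceiling(asyncio.sleep(0.2), {"max_call_seconds": 0.01}))
```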

Next case

BLVCKCARD →

A personal site, free for life.

06 Notes

“Sub-2-second response latency on real telephony — no awkward pauses. Multiple use-case scripts, fluent in everyday Hinglish.”

— HEDA Voice, home

Stack: Python FastAPI + SQLite for session/lead state; LiveKit Agents for the voice pipeline; Sarvam AI for STT and TTS; Google Gemini 2.0 Flash for the LLM; Vobiz SIP for telephony; vanilla JS + LiveKit WebRTC for the browser demo. One agent process registers against LiveKit and auto-dispatches per call. Whole stack runs in Docker Compose on a single VPS — Redis, LiveKit server, SIP, agent worker, nginx.
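
For a sense of what SQLite-backed session/lead state might involve, here is a minimal sketch using Python's standard `sqlite3` module. The table and column names are assumptions made for illustration; the actual schema is not described in the case study.

```python
import sqlite3

# Hypothetical schema: one lead per phone number, many call sessions per lead.
SCHEMA = """
CREATE TABLE IF NOT EXISTS leads (
    id    INTEGER PRIMARY KEY,
    phone TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS sessions (
    id         INTEGER PRIMARY KEY,
    lead_id    INTEGER NOT NULL REFERENCES leads(id),
    stage      TEXT NOT NULL DEFAULT 'greeting',
    started_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def start_session(conn: sqlite3.Connection, phone: str) -> int:
    """Create the lead if it is new, then open a session for this call."""
    conn.execute("INSERT OR IGNORE INTO leads (phone) VALUES (?)", (phone,))
    lead_id = conn.execute(
        "SELECT id FROM leads WHERE phone = ?", (phone,)
    ).fetchone()[0]
    cursor = conn.execute("INSERT INTO sessions (lead_id) VALUES (?)", (lead_id,))
    conn.commit()
    return cursor.lastrowid


def set_stage(conn: sqlite3.Connection, session_id: int, stage: str) -> None:
    """Record which conversation stage the session has reached."""
    conn.execute("UPDATE sessions SET stage = ? WHERE id = ?", (stage, session_id))
    conn.commit()


conn = open_db()
first = start_session(conn, "+910000000000")
second = start_session(conn, "+910000000000")  # same caller, second call
set_stage(conn, first, "pitch")
```

Keeping leads and sessions separate in this sketch means a repeat caller maps to one lead row with several sessions, each tracking its own stage.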
