Voice Chat - Umber Demo

Voice Input/Output

Speak to Umber and hear its responses: a hands-free conversation experience

🎤 User speaks → 🌊 Waveform → 📝 Transcribe (Whisper) → 🧠 Process → 🔊 TTS Response
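The pipeline above can be sketched as one function that threads a single utterance through each stage. This is a minimal sketch, assuming each stage is injected as an async function. The names `VoiceTurnDeps`, `transcribe`, `processText`, `speak`, and `runVoiceTurn` are hypothetical and do not come from the Umber codebase.

```typescript
// Hypothetical stage signatures; the real services may differ.
interface VoiceTurnDeps<Audio, Speech> {
  transcribe: (audio: Audio) => Promise<string>;  // STT, e.g. Whisper
  processText: (text: string) => Promise<string>; // assistant logic
  speak: (reply: string) => Promise<Speech>;      // TTS
}

interface VoiceTurnResult<Speech> {
  transcript: string;
  reply: string;
  speech: Speech;
}

// Runs one hands-free turn: audio in, then transcript, reply text, and synthesized speech out.
async function runVoiceTurn<Audio, Speech>(
  audio: Audio,
  deps: VoiceTurnDeps<Audio, Speech>,
): Promise<VoiceTurnResult<Speech>> {
  const transcript = (await deps.transcribe(audio)).trim();
  if (transcript.length === 0) {
    // Nothing intelligible was captured; the caller can show a retry prompt.
    throw new Error("Empty transcript");
  }
  const reply = await deps.processText(transcript);
  const speech = await deps.speak(reply);
  return { transcript, reply, speech };
}
```

Keeping the stages injectable makes it straightforward to swap providers (see the fallback providers under Technical Implementation) without touching the turn logic.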

Ready State

Hey Umber (Voice enabled)

Umber: Good morning! How can I help you today? (0:04)

You (voice input): What's on my calendar today?

Umber: You have 3 meetings today. The first one is at 9:30, a standup with the engineering team. (0:07)

Recording State

Hey Umber (Listening...)

Earlier messages in the conversation remain visible while recording.

Live transcription: "Remind me to call Sarah at 3pm..." (0:04)

Voice Recording

Tap-to-talk or hold-to-talk recording modes with live transcription preview

Tap to Talk

Tap the mic button to start; tap it again (or press Send) to finish

Status while active: Recording...
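The two recording modes differ only in which input events start and stop capture. The sketch below shows that logic under a few assumptions: it relies on a hypothetical `Recorder` interface with `start()` and `stop()` (in a browser this would wrap a `MediaRecorder`), and `TalkController` and its method names are illustrative, not from the Umber source.

```typescript
// Hypothetical recorder abstraction; a browser build would wrap MediaRecorder.
interface Recorder {
  start(): void;
  stop(): void;
}

type TalkMode = "tap" | "hold";

class TalkController {
  private recording = false;
  private readonly recorder: Recorder;
  private readonly mode: TalkMode;

  constructor(recorder: Recorder, mode: TalkMode) {
    this.recorder = recorder;
    this.mode = mode;
  }

  get isRecording(): boolean {
    return this.recording;
  }

  // Tap mode: each tap toggles recording on or off.
  tap(): void {
    if (this.mode !== "tap") return;
    if (this.recording) this.finish();
    else this.begin();
  }

  // Hold mode: pressing starts recording, releasing finishes it.
  press(): void {
    if (this.mode === "hold" && !this.recording) this.begin();
  }

  release(): void {
    if (this.mode === "hold" && this.recording) this.finish();
  }

  // The Send button finishes an in-progress recording in either mode.
  send(): void {
    if (this.recording) this.finish();
  }

  private begin(): void {
    this.recorder.start();
    this.recording = true;
  }

  private finish(): void {
    this.recorder.stop();
    this.recording = false;
  }
}
```

Keeping the mode logic separate from the recorder means the same controller can be driven by mouse, touch, or keyboard events.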

Live Transcription

See your words appear in real time as you speak

Example (TRANSCRIBING): "Schedule a meeting with the design team for next Tuesday..."

Powered by OpenAI Whisper
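The page does not say how the live preview is produced. One common approach, assumed here, is to re-send the audio captured so far to the transcription endpoint every second or two and display the latest result. Those requests can finish out of order, so the sketch tags each one with a sequence number and ignores stale responses. `LivePreview`, `begin`, and `accept` are hypothetical names.

```typescript
// Tracks the newest transcription request so late, stale responses cannot
// overwrite a fresher preview. Sending the requests is left to the caller.
class LivePreview {
  private nextSeq = 0;
  private shownSeq = -1;
  private text = "";

  // Call before sending a partial transcription request; returns its tag.
  begin(): number {
    return this.nextSeq++;
  }

  // Call when a response arrives; returns true if the preview was updated.
  accept(seq: number, transcript: string): boolean {
    if (seq <= this.shownSeq) return false; // an equal or newer result is already shown
    this.shownSeq = seq;
    this.text = transcript;
    return true;
  }

  get current(): string {
    return this.text;
  }
}
```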

Audio Quality Indicator

Visual feedback on recording quality and background noise

Example: Good audio quality (Signal: Strong | Noise: Low)
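One way to drive the Signal/Noise readout is to measure the RMS level of audio frames in dBFS: frames captured while the user speaks indicate signal strength, and quiet frames between words approximate the noise floor. The thresholds below (-30 dBFS and -50 dBFS) and the function names are assumptions for illustration, not values from Umber. In a browser, frames would typically come from a Web Audio `AnalyserNode` via `getFloatTimeDomainData`.

```typescript
// Root-mean-square level of samples in [-1, 1], in dBFS (0 dBFS = full scale).
function rmsDbfs(samples: ArrayLike<number>): number {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) sumSquares += samples[i] * samples[i];
  const rms = Math.sqrt(sumSquares / samples.length);
  return rms === 0 ? -Infinity : 20 * Math.log10(rms);
}

type Signal = "Strong" | "Weak";
type Noise = "Low" | "High";

// Hypothetical thresholds; tune them against real recordings.
const SPEECH_STRONG_DBFS = -30;
const NOISE_HIGH_DBFS = -50;

// speechFrame: audio captured while the user talks; quietFrame: a pause between words.
function assessQuality(speechFrame: ArrayLike<number>, quietFrame: ArrayLike<number>) {
  const signal: Signal = rmsDbfs(speechFrame) >= SPEECH_STRONG_DBFS ? "Strong" : "Weak";
  const noise: Noise = rmsDbfs(quietFrame) >= NOISE_HIGH_DBFS ? "High" : "Low";
  const good = signal === "Strong" && noise === "Low";
  return { signal, noise, good };
}
```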

Error Recovery

Graceful handling of transcription failures, with a retry option

⚠️ Couldn't hear that clearly

There was too much background noise. Try speaking closer to the microphone.
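A manual retry option pairs naturally with automatic retries for transient failures (network errors, HTTP 5xx) before the error card is shown. The sketch below assumes exponential backoff and a caller-supplied attempt limit; `withRetry` and its options are hypothetical. Errors the user has to fix, such as speech drowned out by background noise, should surface immediately, which is what the `isRetryable` predicate is for.

```typescript
interface RetryOptions {
  attempts: number;                       // total tries, including the first (at least 1)
  baseDelayMs: number;                    // wait before the 2nd try; doubles after each failure
  isRetryable: (err: unknown) => boolean; // e.g. true only for network errors and HTTP 5xx
  sleep?: (ms: number) => Promise<void>;  // injectable so tests don't actually wait
}

async function withRetry<T>(task: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const attempts = Math.max(1, opts.attempts);
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      const isLast = attempt === attempts - 1;
      if (isLast || !opts.isRetryable(err)) break; // give up and show the error card
      await sleep(opts.baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}
```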

TTS Playback

Hear Umber's responses with natural-sounding text-to-speech

Message with Audio

Each assistant message can be played back

You have 3 meetings scheduled for today. Your first meeting is a standup with the engineering team at 9:30 AM. After that, you have a product review at 11:00 AM, and finally a 1:1 with Alex at 2:00 PM.

Playback position: 0:08 / 0:14
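The "0:08 / 0:14" readout is a position/duration pair formatted as minutes and seconds. A small helper for that format might look like this. `formatClock` and `playbackLabel` are hypothetical names; the inputs would typically be an audio element's `currentTime` and `duration`, both in seconds.

```typescript
// Formats seconds as m:ss, e.g. 8 -> "0:08", 75.9 -> "1:15".
// Non-finite values (duration is NaN until metadata loads) render as "0:00".
function formatClock(totalSeconds: number): string {
  if (!Number.isFinite(totalSeconds) || totalSeconds < 0) return "0:00";
  const whole = Math.floor(totalSeconds);
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}

// Builds the "position / duration" label shown under a playable message.
function playbackLabel(currentTime: number, duration: number): string {
  return `${formatClock(currentTime)} / ${formatClock(duration)}`;
}
```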

Auto-Play Mode

Optionally speak new responses automatically

Toggle: Auto-play responses (speak new messages automatically)

While playing: 🔊 Speaking response...
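With auto-play on, a response that arrives while Umber is still speaking has to wait its turn rather than talk over the previous one. A minimal sketch, assuming a hypothetical `play` function whose promise resolves when an utterance finishes; `AutoPlayQueue` is an illustrative name, not from the codebase.

```typescript
// Plays queued assistant messages one at a time, in arrival order.
class AutoPlayQueue {
  enabled: boolean;
  private queue: string[] = [];
  private playing = false;
  private readonly play: (text: string) => Promise<void>;

  constructor(play: (text: string) => Promise<void>, enabled = true) {
    this.play = play;
    this.enabled = enabled;
  }

  // Called for every new assistant message. Resolves when the queue goes idle,
  // or immediately if another call is already draining it. No-op when disabled.
  enqueue(text: string): Promise<void> {
    if (!this.enabled) return Promise.resolve();
    this.queue.push(text);
    return this.drain();
  }

  private async drain(): Promise<void> {
    if (this.playing) return; // the active loop below will pick up the new item
    this.playing = true;
    try {
      while (this.enabled && this.queue.length > 0) {
        const next = this.queue.shift() as string;
        await this.play(next); // a playback error stops the loop and propagates
      }
    } finally {
      this.playing = false;
    }
  }
}
```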

Playback Speed

Adjust speech rate to your preference

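Speech rate can be adjusted either at synthesis time or on the client. OpenAI's speech endpoint accepts a `speed` parameter from 0.25 to 4.0 (default 1.0), and a browser audio element exposes `playbackRate`. The page does not say which one Umber uses. The helper below clamps a requested speed to the API's documented range and snaps it to a hypothetical set of control steps; the step values and function names are assumptions.

```typescript
// OpenAI's speech API documents a speed range of 0.25 to 4.0 (1.0 = normal).
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

// Hypothetical steps offered by a playback-speed control.
const SPEED_STEPS = [0.75, 1, 1.25, 1.5, 2] as const;

function clampSpeed(speed: number): number {
  if (Number.isNaN(speed)) return 1;
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, speed));
}

// Snaps an arbitrary value (e.g. a saved preference) to the nearest offered step.
function nearestStep(speed: number): number {
  const s = clampSpeed(speed);
  let best: number = SPEED_STEPS[0];
  for (const step of SPEED_STEPS) {
    if (Math.abs(step - s) < Math.abs(best - s)) best = step;
  }
  return best;
}
```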

Voice Settings

Customize voice input and output preferences

Voice Preferences

Voice Input: Enable the microphone for voice commands
Voice Output: Enable TTS for assistant responses
Auto-play Responses: Automatically speak new messages
Voice Activation: Say "Hey Umber" to start listening
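The page does not say how voice activation is implemented. Production wake-word detection usually runs a small on-device keyword-spotting model, so audio is not sent to a server until the phrase is heard. As a simplified illustration only, the check below looks for "Hey Umber" at the start of an already-transcribed utterance and returns the command that follows. `parseWakePhrase` and `normalize` are hypothetical names.

```typescript
// Lowercases, drops apostrophes, and turns other punctuation into spaces,
// so "Hey, Umber!" and "hey umber" compare equal.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/['\u2019]/g, "")
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

const WAKE_PHRASE = "hey umber";

// Returns the normalized command after the wake phrase, "" if the phrase was
// said on its own, or null if the utterance does not start with it.
function parseWakePhrase(utterance: string): string | null {
  const text = normalize(utterance);
  if (text === WAKE_PHRASE) return "";
  if (text.startsWith(WAKE_PHRASE + " ")) return text.slice(WAKE_PHRASE.length + 1);
  return null;
}
```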

Umber's Voice

Nova: Warm, natural
Alloy: Neutral, clear
Echo: Deep, calm
Shimmer: Bright, energetic
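The preferences above map naturally onto a small settings object. The field names and defaults here are assumptions: Nova is chosen as the default because the panel lists it first, and voice activation defaults to off on the assumption that an always-listening microphone should be opt-in. The voice identifiers are the six listed under Technical Implementation, of which the settings panel shows four.

```typescript
// The six voice IDs named on this page (OpenAI TTS voices).
const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
type Voice = (typeof VOICES)[number];

// Hypothetical shape for the Voice Preferences panel.
interface VoiceSettings {
  voiceInput: boolean;      // microphone for voice commands
  voiceOutput: boolean;     // TTS for assistant responses
  autoPlay: boolean;        // speak new messages automatically
  voiceActivation: boolean; // listen for "Hey Umber"
  voice: Voice;
}

// Assumed defaults (see lead-in).
const DEFAULT_SETTINGS: VoiceSettings = {
  voiceInput: true,
  voiceOutput: true,
  autoPlay: false,
  voiceActivation: false,
  voice: "nova",
};

function isVoice(value: unknown): value is Voice {
  return typeof value === "string" && (VOICES as readonly string[]).includes(value);
}

// Merges stored preferences over the defaults, discarding unknown voices and
// non-boolean toggles (e.g. from an older or corrupted saved copy).
function loadSettings(stored: Partial<Record<keyof VoiceSettings, unknown>>): VoiceSettings {
  const bool = (v: unknown, fallback: boolean) => (typeof v === "boolean" ? v : fallback);
  return {
    voiceInput: bool(stored.voiceInput, DEFAULT_SETTINGS.voiceInput),
    voiceOutput: bool(stored.voiceOutput, DEFAULT_SETTINGS.voiceOutput),
    autoPlay: bool(stored.autoPlay, DEFAULT_SETTINGS.autoPlay),
    voiceActivation: bool(stored.voiceActivation, DEFAULT_SETTINGS.voiceActivation),
    voice: isVoice(stored.voice) ? stored.voice : DEFAULT_SETTINGS.voice,
  };
}
```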

Technical Implementation

APIs, endpoints, and architecture details

API Endpoints

Backend routes for audio processing

// Speech-to-Text
POST /api/hey-umber/audio/transcribe
Input: multipart/form-data (audio)
Output: { text, duration }

// Text-to-Speech
POST /api/hey-umber/audio/speak
Input: { text, voice? }
Output: audio/mpeg stream
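A client for the two routes above might look like the following. This is a sketch under several assumptions. It validates the transcription response against the documented `{ text, duration }` shape, with `duration` assumed to be in seconds. The speak call posts JSON and reads the `audio/mpeg` body as bytes. The multipart upload for `/transcribe` is left out. `parseTranscription`, `requestSpeech`, and the `FetchLike` type are hypothetical; the global `fetch` in browsers and Node 18+ is structurally compatible with `FetchLike`.

```typescript
// Minimal structural type for fetch, so the sketch runs without DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

type Voice = "alloy" | "echo" | "fable" | "onyx" | "nova" | "shimmer";

interface Transcription {
  text: string;
  duration: number; // assumed to be seconds of audio
}

// Validates the JSON body returned by POST /api/hey-umber/audio/transcribe.
function parseTranscription(body: unknown): Transcription {
  const b = body as { text?: unknown; duration?: unknown } | null;
  if (!b || typeof b.text !== "string" || typeof b.duration !== "number") {
    throw new Error("Unexpected transcription response");
  }
  return { text: b.text, duration: b.duration };
}

// POSTs { text, voice? } to the speak route and resolves to the MP3 bytes.
// The relative URL resolves against the page origin in a browser.
async function requestSpeech(fetchImpl: FetchLike, text: string, voice?: Voice): Promise<ArrayBuffer> {
  const payload = voice ? { text, voice } : { text }; // voice is optional per the route spec
  const res = await fetchImpl("/api/hey-umber/audio/speak", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`TTS request failed with HTTP ${res.status}`);
  return res.arrayBuffer();
}
```

Injecting `fetchImpl` keeps the client testable and lets a server-side caller pass a fetch bound to an absolute base URL.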

Service Architecture

Primary and fallback providers

Speech-to-text: Primary: Whisper | Fallback: AWS Transcribe

Text-to-speech: Primary: OpenAI TTS | Fallback: AWS Polly
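The primary/fallback split can be expressed as a wrapper that tries each provider in order and returns the first success. This sketch assumes both providers have been adapted to the same async signature. `Provider` and `withFallback` are hypothetical names, and the actual Whisper, AWS Transcribe, OpenAI TTS, and AWS Polly adapters are not shown.

```typescript
// A provider adapted to a common signature, e.g. audio bytes -> transcript text.
interface Provider<In, Out> {
  name: string;
  run: (input: In) => Promise<Out>;
}

interface FallbackResult<Out> {
  output: Out;
  provider: string; // which provider answered; useful for logging and cost tracking
}

// Tries providers in order; if all fail, throws one error listing every failure.
async function withFallback<In, Out>(
  providers: Provider<In, Out>[],
  input: In,
): Promise<FallbackResult<Out>> {
  const failures: string[] = [];
  for (const p of providers) {
    try {
      return { output: await p.run(input), provider: p.name };
    } catch (err) {
      failures.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed (${failures.join("; ")})`);
}
```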

STT Latency: under 3 seconds (target transcription time)
STT Cost: $0.006/min (OpenAI Whisper pricing)
TTS Cost: $0.015/1K chars (OpenAI TTS pricing)
Audio Format: MP3 / WAV (supported formats)
Sample Rate: 24 kHz or higher (high-quality recording)
Voices: 6 options (alloy, echo, fable, onyx, nova, shimmer)
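The per-unit prices above turn into per-turn costs with simple arithmetic. For example, a 30-second utterance costs 0.5 min × $0.006/min = $0.003 to transcribe, and a 200-character spoken reply costs 0.2 × $0.015 = $0.003 to synthesize. The helper below encodes those two rates as listed on this page. `estimateTurnCost` is a hypothetical name, and provider prices change, so treat the constants as a snapshot.

```typescript
// Rates as listed on this page (USD); check current provider pricing before relying on them.
const STT_USD_PER_MINUTE = 0.006;   // OpenAI Whisper
const TTS_USD_PER_1K_CHARS = 0.015; // OpenAI TTS

interface TurnCost {
  sttUsd: number;
  ttsUsd: number;
  totalUsd: number;
}

// audioSeconds: length of the user's recording; replyChars: characters sent to TTS.
function estimateTurnCost(audioSeconds: number, replyChars: number): TurnCost {
  const sttUsd = (audioSeconds / 60) * STT_USD_PER_MINUTE;
  const ttsUsd = (replyChars / 1000) * TTS_USD_PER_1K_CHARS;
  return { sttUsd, ttsUsd, totalUsd: sttUsd + ttsUsd };
}
```

At those rates, 1,000 such turns would cost about $6.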

Key Files

Implementation locations in the codebase

Frontend: frontend/components/hey-umber/HeyUmberExperience.tsx
Voice Service: frontend/src/services/voice/EnterpriseVoiceManager.ts
Backend STT: backend/src/services/audio/SpeechToTextService.ts
Backend TTS: backend/src/services/audio/TextToSpeechService.ts