Voice Input/Output

Speak to Umber and hear its responses for a hands-free conversation experience

🎤 User speaks → 🌊 Waveform → 📝 Transcribe (Whisper) → 🧠 Process → 🔊 TTS Response

Ready State

Hey Umber
Voice enabled
Good morning! How can I help you today?
0:04
What's on my calendar today?
Voice input
You have 3 meetings today. The first one is at 9:30 - a standup with the engineering team.
0:07

Recording State

Hey Umber
Listening...
Good morning! How can I help you today?
What's on my calendar today?
You have 3 meetings today. The first one is at 9:30 - a standup with the engineering team.
Live transcription
"Remind me to call Sarah at 3pm..."
0:04

Voice Recording

Tap-to-talk or hold-to-talk recording modes with live transcription preview

Tap to Talk

Tap the mic button to start recording; tap it again (or press send) to finish
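The tap-to-talk flow described above can be modeled as a small state machine. The sketch below is illustrative only: the state and event names (`idle`, `recording`, `transcribing`, `TAP_MIC`, and so on) are hypothetical and are not taken from the Umber codebase.

```typescript
// Hypothetical recorder states and events; names are illustrative only.
type RecorderState = "idle" | "recording" | "transcribing";
type RecorderEvent =
  | "TAP_MIC"
  | "PRESS_SEND"
  | "TRANSCRIPT_READY"
  | "TRANSCRIPT_FAILED";

// Pure transition function: given the current state and an event,
// return the next state. Events that do not apply leave the state unchanged.
function nextRecorderState(
  state: RecorderState,
  event: RecorderEvent
): RecorderState {
  switch (state) {
    case "idle":
      // The first tap on the mic starts recording.
      return event === "TAP_MIC" ? "recording" : "idle";
    case "recording":
      // A second tap or the send button both finish the recording.
      return event === "TAP_MIC" || event === "PRESS_SEND"
        ? "transcribing"
        : "recording";
    case "transcribing":
      // Success and failure both return to idle so the user can speak again.
      return event === "TRANSCRIPT_READY" || event === "TRANSCRIPT_FAILED"
        ? "idle"
        : "transcribing";
  }
}
```

Keeping the transitions in a pure function makes the mic button behavior easy to unit test separately from the recording hardware.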

Recording...

Live Transcription

See your words appear in real-time as you speak

TRANSCRIBING
Schedule a meeting with the design team for next Tuesday...
Powered by OpenAI Whisper

Audio Quality Indicator

Visual feedback on recording quality and environment noise

Good audio quality
Signal: Strong | Noise: Low
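One plausible way to derive a "Signal / Noise" readout like the one above is to measure the RMS level of recent audio samples in dBFS and bucket it. This is a minimal sketch under stated assumptions: the function names and every threshold below are hypothetical, and the page does not say how Umber actually computes the indicator.

```typescript
// RMS level of PCM samples in the range [-1, 1], expressed in dBFS.
// Returns -Infinity for silence or an empty buffer.
function rmsDbfs(samples: Float32Array): number {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

interface AudioQuality {
  signal: "Weak" | "Fair" | "Strong";
  noise: "Low" | "Moderate" | "High";
}

// Buckets a speech level and a noise-floor level (both in dBFS).
// All thresholds are assumed values chosen for illustration.
function classifyQuality(speechDb: number, noiseFloorDb: number): AudioQuality {
  const signal =
    speechDb >= -20 ? "Strong" : speechDb >= -35 ? "Fair" : "Weak";
  const noise =
    noiseFloorDb <= -50 ? "Low" : noiseFloorDb <= -35 ? "Moderate" : "High";
  return { signal, noise };
}
```

In practice the speech level would be measured while the user is talking and the noise floor during pauses; how those windows are chosen is left out of this sketch.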

Error Recovery

Graceful handling of transcription failures with retry option

⚠️
Couldn't hear that clearly
There was too much background noise. Try speaking closer to the microphone.

TTS Playback

Hear Umber's responses with natural-sounding text-to-speech

Message with Audio

Each assistant message can be played back

You have 3 meetings scheduled for today. Your first meeting is a standup with the engineering team at 9:30 AM. After that, you have a product review at 11:00 AM, and finally a 1:1 with Alex at 2:00 PM.
0:08 / 0:14

Auto-Play Mode

Optionally speak new responses automatically

Auto-play responses
Speak new messages automatically
🔊
Speaking response...

Playback Speed

Adjust speech rate to your preference


Voice Settings

Customize voice input and output preferences

Voice Preferences

Voice Input

Enable microphone for voice commands

Voice Output

Enable TTS for assistant responses

Auto-play Responses

Automatically speak new messages

Voice Activation

Say "Hey Umber" to start listening

Umber's Voice

Nova: Warm, natural
Alloy: Neutral, clear
Echo: Deep, calm
Shimmer: Bright, energetic

Technical Implementation

APIs, endpoints, and architecture details

API Endpoints

Backend routes for audio processing

// Speech-to-Text
POST /api/hey-umber/audio/transcribe
Input: multipart/form-data (audio)
Output: { text, duration }

// Text-to-Speech
POST /api/hey-umber/audio/speak
Input: { text, voice? }
Output: audio/mpeg stream
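
A browser client for the two routes above might look like the following sketch. Several details are assumptions rather than documented behavior: the multipart field name `audio`, the `recording.wav` filename, the JSON content type for the speak route, and the helper names `transcribe`, `speak`, and `buildSpeakBody`.

```typescript
const AUDIO_API_BASE = "/api/hey-umber/audio";

// Shape of the documented transcribe output: { text, duration }.
interface TranscribeResult {
  text: string;
  duration: number;
}

// Voice ids as listed on this page.
type VoiceId = "alloy" | "echo" | "fable" | "onyx" | "nova" | "shimmer";

// `voice` is optional in the documented input, so it is omitted when unset.
function buildSpeakBody(text: string, voice?: VoiceId): string {
  return JSON.stringify(voice ? { text, voice } : { text });
}

async function transcribe(audio: Blob): Promise<TranscribeResult> {
  const form = new FormData();
  // Assumption: the backend reads the upload from a field named "audio".
  form.append("audio", audio, "recording.wav");
  const res = await fetch(`${AUDIO_API_BASE}/transcribe`, {
    method: "POST",
    body: form, // fetch sets the multipart boundary header itself
  });
  if (!res.ok) throw new Error(`Transcription failed: HTTP ${res.status}`);
  return (await res.json()) as TranscribeResult;
}

async function speak(text: string, voice?: VoiceId): Promise<Blob> {
  const res = await fetch(`${AUDIO_API_BASE}/speak`, {
    method: "POST",
    // Assumption: the route accepts a JSON body.
    headers: { "Content-Type": "application/json" },
    body: buildSpeakBody(text, voice),
  });
  if (!res.ok) throw new Error(`Speech synthesis failed: HTTP ${res.status}`);
  // Buffers the whole audio/mpeg response; a streaming player would read res.body.
  return res.blob();
}
```

The returned blob can be played with `new Audio(URL.createObjectURL(blob)).play()`.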

Service Architecture

Primary and fallback providers

SPEECH-TO-TEXT
Primary: Whisper | Fallback: AWS Transcribe
TEXT-TO-SPEECH
Primary: OpenAI TTS | Fallback: AWS Polly
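
The primary/fallback arrangement above can be expressed as a small generic wrapper. This is a sketch of the pattern only; `Provider` and `withFallback` are hypothetical names, and the page does not describe how the backend services actually switch providers.

```typescript
// A provider is anything that turns an input into an output asynchronously,
// for example audio -> transcript (STT) or text -> audio (TTS).
interface Provider<In, Out> {
  name: string;
  run: (input: In) => Promise<Out>;
}

// Try the primary provider first. If it throws, try the fallback once.
// If the fallback also throws, that error propagates to the caller.
async function withFallback<In, Out>(
  primary: Provider<In, Out>,
  fallback: Provider<In, Out>,
  input: In
): Promise<{ provider: string; output: Out }> {
  try {
    return { provider: primary.name, output: await primary.run(input) };
  } catch {
    return { provider: fallback.name, output: await fallback.run(input) };
  }
}
```

Returning the provider name alongside the output lets the caller log which service handled each request, which is useful when tracking fallback frequency and cost.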
STT Latency: < 3 seconds (target transcription time)
STT Cost: $0.006/min (OpenAI Whisper pricing)
TTS Cost: $0.015/1K chars (OpenAI TTS pricing)
Audio Format: MP3 / WAV (supported formats)
Sample Rate: 24kHz+ (high quality recording)
Voices: 6 options (alloy, echo, fable, onyx, nova, shimmer)
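
As a worked example of the pricing figures above, the sketch below estimates the cost of one voice exchange. It assumes both rates are pro-rated linearly, which may differ from how the provider rounds when billing; `estimateTurnCostUsd` is a hypothetical helper name.

```typescript
// Rates quoted on this page.
const STT_USD_PER_MINUTE = 0.006;
const TTS_USD_PER_1K_CHARS = 0.015;

// Estimated cost of one exchange: transcribing the user's audio
// plus synthesizing the assistant's reply. Assumes linear pro-rating.
function estimateTurnCostUsd(audioSeconds: number, responseChars: number): number {
  const sttCost = (audioSeconds / 60) * STT_USD_PER_MINUTE;
  const ttsCost = (responseChars / 1000) * TTS_USD_PER_1K_CHARS;
  return sttCost + ttsCost;
}
```

Under these assumptions, a 4-second question with a 200-character spoken reply comes to about $0.0034, most of it from text-to-speech.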

Key Files

Implementation locations in the codebase

Frontend: frontend/components/hey-umber/HeyUmberExperience.tsx
Voice Service: frontend/src/services/voice/EnterpriseVoiceManager.ts
Backend STT: backend/src/services/audio/SpeechToTextService.ts
Backend TTS: backend/src/services/audio/TextToSpeechService.ts