Set Up Hermes Agent Voice Mode

Enable voice input and output for Hermes Agent — talk to your AI and hear responses spoken back.

Quick answer

Hermes voice mode lets you speak commands and hear responses read aloud. The CLI offers push-to-talk, and voice messages sent on Telegram, Discord, and other channels are transcribed automatically. Voice mode is built in: enable the speech-to-text (STT) and text-to-speech (TTS) pipeline in config.yaml for hands-free, accessible, or more natural interaction.

Voice mode lets you speak to Hermes and hear responses spoken back — ideal for hands-free workflows, accessibility, or just making your AI assistant feel more natural to interact with.

Before you start:

☑ Hermes Agent installed
☑ Voice extras installed: pip install 'hermes-agent[voice]'
☑ A microphone (for voice input) and speakers or headphones (for voice output)
☑ Optional: an ElevenLabs API key for high-quality voice output
☑ Optional: an OpenAI API key, if you use the hosted Whisper API instead of local Whisper

Steps

1. Enable voice input
Set voice: input: enabled: true in config.yaml (each key nests under the one before it). Speech-to-text (STT) uses Whisper, which can run locally.

2. Enable voice output
Set voice: output: enabled: true and choose a text-to-speech (TTS) provider: ElevenLabs, a local engine, or the system voice (say on macOS, espeak on Linux).

3. Configure Whisper
Either install Whisper locally with pip install openai-whisper, or use the hosted Whisper API with an OpenAI API key.

4. Use via Telegram
Send voice messages to your Hermes Telegram bot. Hermes transcribes them automatically and replies. This requires voice: telegram: enabled: true in config.yaml.

5. Use via CLI
Run hermes chat --voice to turn on push-to-talk in the terminal.

6. Customize voice
Set voice: output: voiceId: to the ID of the voice you want from your TTS provider.
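Taken together, steps 1, 2, and 6 might look like the following voice section in config.yaml. This is a sketch that simply expands the key paths quoted in this guide into nested YAML. The voice ID is a placeholder, and only elevenlabs is confirmed as a provider value in this guide; the spellings of the local and system options are assumptions.

```yaml
# Sketch of the voice section of config.yaml, assembled from the key paths in this guide.
voice:
  input:
    enabled: true             # Step 1: speech-to-text via Whisper
  output:
    enabled: true             # Step 2: text-to-speech
    provider: elevenlabs      # Step 2: "local" and "system" are assumed spellings for the other options
    voiceId: "YOUR_VOICE_ID"  # Step 6: placeholder; use a real voice ID from your TTS provider
```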

Pro Tips

💡Start with Telegram voice messages: record a voice memo in Telegram and Hermes transcribes it and responds. Apart from voice: telegram: enabled: true, no extra setup is needed.
💡Type '/voice on' inside hermes chat to switch on push-to-talk in the terminal.
💡For the best quality, pair ElevenLabs for TTS with Whisper's medium model for STT. ElevenLabs produces natural-sounding speech, and the medium model transcribes more accurately than the smaller Whisper models, though it runs slower.

Troubleshooting

❌ Voice input not being transcribed

✅ Check that Whisper is installed ('pip install openai-whisper'). If you use the Whisper API instead of local Whisper, verify that your OpenAI API key has access to the Whisper endpoint.

❌ Voice output sounds robotic or choppy

✅ Switch from the system TTS (say/espeak) to ElevenLabs or another cloud TTS provider: set 'voice: output: provider: elevenlabs' and add your ElevenLabs API key to config.yaml.
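A minimal sketch of that change, assuming the same nesting as the rest of the voice section. The guide does not say which key holds the ElevenLabs API key, so the comment only marks where it would go rather than guessing its name.

```yaml
# Sketch: switch voice output from the system TTS to ElevenLabs.
voice:
  output:
    enabled: true
    provider: elevenlabs  # replaces the system provider (say/espeak)
    # Add your ElevenLabs API key in this file; its key name is not specified in this guide.
```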

❌ Voice mode works in terminal but not in Telegram

✅ Telegram voice message support requires 'voice: telegram: enabled: true' in config.yaml. Make sure you're sending a voice message (hold the microphone button) and not an audio file.
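As a sketch, the Telegram setting sits alongside input under the same top-level voice key. Keeping input enabled is an assumption here, on the basis that Telegram transcription uses the same speech-to-text pipeline.

```yaml
# Sketch: enable transcription of Telegram voice messages.
voice:
  input:
    enabled: true   # assumed to be needed for transcription
  telegram:
    enabled: true   # key path from this guide: voice: telegram: enabled: true
```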

FAQ

How does voice mode work in Hermes?

It's a built-in STT + TTS pipeline: push-to-talk in the CLI, and automatic transcription of voice messages on channels like Telegram and Discord, with spoken responses back.

Do I need a cloud service for voice?

No. Cloud STT/TTS (such as the Whisper API and ElevenLabs) gives higher quality, while local Whisper plus a local or system TTS provider keeps everything offline and private. Pick whichever fits your privacy and latency needs.
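For a fully offline setup, a sketch might pair local Whisper for input with the local TTS option from the steps above. The provider value local is an assumed spelling of that option, and this guide does not name a key for choosing between local Whisper and the Whisper API, so none is shown.

```yaml
# Sketch: keep speech processing on your own machine.
voice:
  input:
    enabled: true    # Whisper installed locally with: pip install openai-whisper
  output:
    enabled: true
    provider: local  # assumed spelling of the "local" TTS option
```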

Where is voice mode useful?

Hands-free workflows, accessibility, and natural phone-style interaction over messaging channels where you'd rather send a voice note than type.
