Set Up Hermes Agent Voice Mode
Enable voice input and output for Hermes Agent — talk to your AI and hear responses spoken back.
Voice mode suits hands-free workflows and accessibility needs, and it makes interacting with your AI assistant feel more natural.
Before you start:
- ☑ Hermes Agent installed
- ☑ Voice extras installed: pip install 'hermes-agent[voice]'
- ☑ A microphone (for voice input) and speakers or headphones (for voice output)
- ☑ Optional: ElevenLabs API key for high-quality voice output
Steps
- 1
Enable voice input
Set voice: input: enabled: true in config.yaml. Speech-to-text (STT) can run locally with Whisper.
- 2
Enable voice output
Set voice: output: enabled: true and choose a text-to-speech (TTS) provider (ElevenLabs, local, or system).
- 3
Configure Whisper
Install Whisper locally with pip install openai-whisper, or use the Whisper API instead.
- 4
Use via Telegram
Send voice messages to your Hermes Telegram bot. Hermes transcribes each one automatically and replies.
- 5
Use via CLI
Run hermes chat --voice to enable push-to-talk in the terminal.
- 6
Customize voice
Set voice: output: voiceId: to the ID of your preferred voice from your TTS provider.
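The settings from the steps above can be combined into a single config.yaml fragment. This is only a sketch assembled from the key names mentioned in this guide. It assumes the colon-separated keys map to nested YAML mappings, and the voiceId value is a placeholder, so check both against your installed version. The guide does not name the setting that holds the ElevenLabs API key, so it is left out here.

```yaml
# Sketch of the voice section of config.yaml (nesting is an assumption).
voice:
  input:
    enabled: true          # step 1: voice input (speech-to-text)
  output:
    enabled: true          # step 2: voice output (text-to-speech)
    provider: elevenlabs   # provider value as given in Troubleshooting
    voiceId: your-voice-id # placeholder; use an ID from your TTS provider
  telegram:
    enabled: true          # Telegram voice messages, per Troubleshooting
```

Restarting Hermes after editing the file is a reasonable precaution, since this guide does not say whether config changes are picked up live.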
Pro Tips
- 💡 Start with Telegram voice messages: record a voice memo in Telegram and Hermes transcribes it and responds. Beyond setting voice: telegram: enabled: true in config.yaml, no extra setup is needed.
- 💡 Use '/voice on' inside hermes chat to enable push-to-talk mode in the terminal.
- 💡 For the best voice quality, use ElevenLabs for TTS and Whisper (medium model) for STT; the combination sounds remarkably natural.
Troubleshooting
❌ Voice input not being transcribed
✅ Check that Whisper is installed with 'pip show openai-whisper'; if it is missing, install it with 'pip install openai-whisper'. If using the API instead of local Whisper, verify your OpenAI API key has access to the Whisper endpoint.
❌ Voice output sounds robotic or choppy
✅ Switch from the system TTS (say/espeak) to ElevenLabs or another cloud TTS provider. Set 'voice: output: provider: elevenlabs' and add your API key in config.yaml.
❌ Voice mode works in terminal but not in Telegram
✅ Telegram voice message support requires 'voice: telegram: enabled: true' in config.yaml. Make sure you're sending a voice message (hold the microphone button) and not an audio file.