How-To Guide

Set Up Hermes Voice Mode — Speak and Listen

Enable voice interaction with Hermes — speak your commands, hear the responses with text-to-speech.

Quick answer

Voice mode lets you speak to Hermes and hear its replies. It provides push-to-talk in the CLI and automatic transcription of voice messages on Telegram, Discord, and other channels. Set the STT (speech-to-text) and TTS (text-to-speech) engines in config.yaml: cloud engines for quality, or a local engine for offline, private processing.

Voice mode turns Hermes into a spoken assistant: speak your commands instead of typing, and hear the responses read aloud.

Before you start:

☑ Hermes Agent installed
☑ Microphone and speakers or headphones
☑ For local STT: enough RAM for the Whisper model (about 1 GB for 'base')

Steps

1. Install voice dependencies
Run pip install 'hermes-agent[voice]' to add faster-whisper and the audio libraries.

2. Configure STT provider
In config.yaml, set stt: enabled: true and choose a provider: local, groq, or openai.

3. Configure TTS provider
Choose a TTS provider: Edge TTS (free), ElevenLabs, OpenAI, MiniMax, or Mistral.

4. Add API keys if needed
For cloud STT providers, add GROQ_API_KEY (Groq) or VOICE_TOOLS_OPENAI_KEY (OpenAI) to .env. Local STT needs no key.

5. Enable voice in CLI
Type /voice on in the CLI, then press Ctrl+B to record.
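
The inline notation in step 2 (stt: enabled: true) describes nested YAML keys. A minimal sketch of the corresponding config.yaml section might look like the following. It assumes the keys nest exactly as the guide's shorthand suggests; the 'base' model value comes from the prerequisites list, and any other keys your installation expects are not shown.

```yaml
# config.yaml (sketch; key nesting inferred from the guide's inline notation)
stt:
  enabled: true      # turn on speech-to-text
  provider: local    # one of: local, groq, openai
  local:
    model: base      # 'base' needs roughly 1 GB of RAM
```

If provider is set to groq or openai instead, the matching API key from step 4 must also be present in .env.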

Pro Tips

💡 Local STT is free and runs on-device: no API key needed, just CPU and RAM.
💡 Groq offers a free tier for Whisper with excellent speed, which makes it a good alternative to local STT.
💡 Edge TTS is free and sounds natural, so it is the best-value TTS option.
💡 STT model sizes: tiny (fast, less accurate), base (balanced), large-v3 (most accurate, slow).

Troubleshooting

❌ No audio input detected

✅ Check your microphone permissions and default input device. In the CLI, verify that Ctrl+B triggers recording.

❌ STT transcription is inaccurate

✅ Try a larger model: set stt: local: model to small or large-v3. Larger models are more accurate but slower.
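
Written out as nested YAML, the model change might look like the sketch below. It assumes the local provider is selected and that the model key sits under stt: local, as the shorthand above implies.

```yaml
# config.yaml (sketch): switch the local Whisper model to a larger size
stt:
  enabled: true
  provider: local
  local:
    model: small     # or large-v3 for the highest accuracy at the cost of speed
```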

❌ TTS voice sounds robotic

✅ Switch to a higher-quality provider like ElevenLabs or OpenAI TTS. Add the API key to .env.

❌ Voice messages not transcribed on Telegram

✅ Ensure stt: enabled: true is set. Also check that the voice file format is supported (opus, ogg, mp3).

FAQ

How do I enable voice mode?

Configure the STT (speech-to-text) and TTS (text-to-speech) engines in Hermes config. You then get push-to-talk in the CLI and automatic voice-message transcription on supported channels.

Cloud or local speech engines?

Cloud engines generally give higher quality and lower latency; local engines keep everything offline and private. Hermes supports both, so choose based on your privacy needs.

Which channels support voice?

The CLI (push-to-talk) plus messaging channels like Telegram and Discord, where Hermes transcribes incoming voice notes automatically.
