Set Up Hermes Voice Mode — Speak and Listen

Voice mode turns Hermes into a spoken assistant: you speak your commands instead of typing them and hear the responses read aloud through text-to-speech (TTS). In the CLI it works with push-to-talk. On Telegram, Discord, and other platforms, Hermes transcribes incoming voice messages automatically using speech-to-text (STT).

Before you start:

  • Hermes Agent installed
  • Microphone and speakers/headphones
  • For local STT: enough RAM for the Whisper model (about 1 GB for 'base')

Steps

  1. Install voice dependencies

    Run pip install 'hermes-agent[voice]' to add faster-whisper and the audio libraries.

  2. Configure the STT provider

    In config.yaml, set stt: enabled: true and set provider: to local, groq, or openai.

  3. Configure the TTS provider

    Choose a TTS provider: Edge TTS (free), ElevenLabs, OpenAI, MiniMax, or Mistral.

  4. Add API keys if needed

    For cloud providers, add GROQ_API_KEY or VOICE_TOOLS_OPENAI_KEY to .env.

  5. Enable voice in the CLI

    Type /voice on in the CLI, then press Ctrl+B to record.
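
Steps 2 and 4 can be sanity-checked before enabling voice. The sketch below is a standalone helper, not part of Hermes. It assumes the STT settings sit under stt: as enabled and provider keys, and that the groq provider needs GROQ_API_KEY while openai needs VOICE_TOOLS_OPENAI_KEY. The function names and the provider-to-key mapping are illustrative assumptions drawn from the steps above.

```python
import os
from pathlib import Path

# Assumed mapping from STT provider to the key it needs; "local" needs none.
REQUIRED_KEY = {
    "local": None,
    "groq": "GROQ_API_KEY",
    "openai": "VOICE_TOOLS_OPENAI_KEY",
}


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_stt_setup(stt_config, env):
    """Return a list of problems; an empty list means the setup looks consistent."""
    problems = []
    if stt_config.get("enabled") is not True:
        problems.append("stt.enabled is not true")
    provider = stt_config.get("provider")
    if provider not in REQUIRED_KEY:
        problems.append(f"unknown provider {provider!r}; expected local, groq, or openai")
        return problems
    key = REQUIRED_KEY[provider]
    if key and not env.get(key):
        problems.append(f"{key} is missing for provider {provider!r}")
    return problems


if __name__ == "__main__":
    # Values from the process environment take precedence over the .env file.
    env = {**read_env_file(".env"), **os.environ}
    print(check_stt_setup({"enabled": True, "provider": "groq"}, env))
```

The dictionary passed to check_stt_setup stands in for the stt: section of config.yaml. Loading the real file would need a YAML parser, which is left out here to keep the sketch dependency-free.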

Pro Tips

  • 💡 Local STT is free and runs on-device: no API key needed, just CPU and RAM
  • 💡 Groq offers a free tier of hosted Whisper that is fast, making it a good alternative to local STT
  • 💡 Edge TTS is free and sounds natural, which makes it the best-value TTS option
  • 💡 STT model sizes: tiny (fast, less accurate), base (balanced), large-v3 (most accurate, slow)

Troubleshooting

No audio input detected

Check your microphone permissions and your default input device. In the CLI, verify that pressing Ctrl+B starts a recording.

STT transcription is inaccurate

Try a larger model by setting stt: local: model: to small or large-v3. Larger models are more accurate but slower.
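
To see how much a larger model helps before changing the config, the same recording can be transcribed at several sizes. The sketch below calls the faster-whisper library directly rather than going through Hermes, so it only approximates what Hermes does internally. The CPU device and int8 compute type are assumptions chosen for a machine without a GPU, and sample.wav in the usage note is a hypothetical file name.

```python
def transcribe_file(path, model_size="base"):
    """Transcribe one audio file with faster-whisper on CPU and return the text."""
    # Imported lazily so compare_models can be exercised without the library loaded.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path)
    # segments is a lazy generator; joining it is what runs the transcription.
    return " ".join(segment.text.strip() for segment in segments)


def compare_models(path, sizes=("base", "small", "large-v3"), transcribe=transcribe_file):
    """Return {model_size: transcript} so the outputs can be read side by side."""
    return {size: transcribe(path, size) for size in sizes}
```

Calling compare_models("sample.wav") downloads each model on first use, so expect the first run to be slow, especially for large-v3.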

TTS voice sounds robotic

Switch to a higher-quality provider like ElevenLabs or OpenAI TTS. Add the API key to .env.

Voice messages not transcribed on Telegram

Ensure stt: enabled: true is set. Also check that the voice file format is supported (opus, ogg, mp3).
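
A quick way to rule out the format issue is to check the file extension against the formats listed above. This is a minimal sketch with hypothetical names. It inspects only the extension, not the actual audio codec, so a mislabeled file would still pass.

```python
from pathlib import Path

# Extensions taken from the supported formats named in this guide.
SUPPORTED_EXTENSIONS = {".opus", ".ogg", ".mp3"}


def has_supported_extension(filename):
    """Return True if the file name ends in one of the supported extensions."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, has_supported_extension("voice.ogg") is true, while a .wav file would be flagged as unsupported under this list.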
