Nous Research · Hermes Agent

Run Hermes Agent 100% Offline — No Cloud Required

Run Hermes Agent completely offline with local models, local STT, and zero internet dependency.

Hermes can run 100% offline — no internet connection required, no cloud APIs, no data leaving your machine. This guide covers setting up a fully air-gapped AI assistant using local models and local speech tools.

Before you start:

  • Hermes Agent installed
  • Ollama installed and a model pulled while you still have internet (e.g. 'ollama pull hermes3')
  • Sufficient hardware: 16GB+ RAM recommended for quality offline inference
  • Optional: local Whisper model downloaded in advance

Steps

  1. Install Ollama

    Install Ollama and pull the hermes3 model while you still have internet
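    Step 1 as shell commands, to run while still online. The install one-liner is Ollama's documented script; on macOS a package manager install works too:

    ```shell
    # Install Ollama (official install script; on macOS you can use
    # `brew install ollama` instead)
    curl -fsSL https://ollama.com/install.sh | sh

    # Pull the model while still online -- this is the last step that
    # needs the network
    ollama pull hermes3

    # Confirm the weights are cached locally
    ollama list
    ```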

  2. Configure local model

    Set provider: ollama under the model: section of config.yaml; no API keys are needed
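    A minimal config.yaml sketch, assuming the model/provider nesting shown in this step. The name key is illustrative only; run 'hermes config show' to confirm your version's exact schema:

    ```yaml
    model:
      provider: ollama   # route all inference to the local Ollama daemon
      name: hermes3      # illustrative key; must match a model you pulled
    ```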

  3. Install local STT

    Run pip install openai-whisper, then transcribe a short clip once while online (e.g. whisper test.mp3 --model medium) so the model weights are cached for offline use
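    As shell commands, run while still online (the Whisper CLI caches downloaded weights, so one transcription is enough to prime it):

    ```shell
    # Install the local STT engine
    pip install openai-whisper

    # Transcribe any short clip once while online; this caches the
    # model weights under ~/.cache/whisper/ for offline use
    whisper test.mp3 --model medium
    ```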

  4. Install local TTS

    Use system TTS (say on macOS, espeak on Linux) or a local TTS model
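    One portable way to wire up system TTS is a small wrapper that picks the backend by OS. A sketch: 'say' ships with macOS, while 'espeak' usually needs installing from your package manager:

    ```shell
    # Print the local TTS command for this OS; fails if no known
    # backend exists.
    tts_backend() {
      case "$(uname -s)" in
        Darwin) echo "say" ;;     # built into macOS
        Linux)  echo "espeak" ;;  # e.g. apt install espeak
        *)      return 1 ;;
      esac
    }

    # Speak a line of text through whichever backend was found.
    speak() {
      "$(tts_backend)" "$1"
    }
    ```

    Usage: speak "Hello from Hermes" speaks through the detected backend with no network access.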

  5. Disable cloud features

    Set telemetry: false and remove any cloud API keys from config
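    In config.yaml that might look like the fragment below. The telemetry flag is from this guide; the commented-out key name is purely illustrative of the kind of cloud credential to strip:

    ```yaml
    telemetry: false   # no analytics or crash reports phoning home
    # Remove or comment out any cloud provider credentials, e.g.:
    # openai_api_key: ...
    ```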

  6. Test offline

    Disconnect from the internet and verify that 'hermes chat' works end-to-end locally
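    Before trusting the setup, a quick smoke test like this can confirm the local daemon answers while the public internet does not. A sketch, assuming Ollama's default port 11434 on localhost:

    ```shell
    check_offline_ready() {
      # The local Ollama daemon should answer...
      if curl -s --max-time 2 http://localhost:11434/api/tags > /dev/null; then
        echo "ollama: reachable"
      else
        echo "ollama: NOT reachable -- is the daemon running?"
      fi
      # ...while the outside world should not.
      if curl -s --max-time 2 https://example.com > /dev/null; then
        echo "internet: still reachable -- not fully offline"
      else
        echo "internet: unreachable (good)"
      fi
    }

    check_offline_ready
    ```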

Pro Tips

  • 💡Download everything you need (models, dependencies) before going offline — once disconnected, model downloads aren't possible
  • 💡A Q4-quantized Hermes 3 7B model offers a good balance of quality and speed on CPU-only hardware
  • 💡Set 'telemetry: false' in config.yaml to disable any analytics or crash reporting that might try to reach the internet

Troubleshooting

Hermes tries to reach the internet even in offline mode

Check config.yaml for any cloud-based settings: cloud memory sync, telemetry, update checks. Disable each with the appropriate 'false' flag. Run 'hermes config show' to review all active settings.

Ollama model loads but inference is extremely slow

CPU-only inference is slow for large models. Use a smaller quantized model (7B Q4) or ensure your GPU is detected by Ollama with 'ollama run hermes3 --verbose'.

Whisper transcription fails offline

Whisper needs its model files downloaded in advance. Run 'whisper test.mp3 --model medium' while online to pre-download the model weights to ~/.cache/whisper/.
