Nous Research · Hermes Agent

Run Hermes Agent 100% Offline — No Cloud Required

Run Hermes Agent completely offline with local models, local STT, and zero internet dependency.

Hermes can run 100% offline — no internet connection required, no cloud APIs, no data leaving your machine. This guide covers setting up a fully air-gapped AI assistant using local models and local speech tools.

Before you start:

  • Hermes Agent installed
  • Ollama installed and a model pulled while you still have internet (e.g. 'ollama pull hermes3')
  • Sufficient hardware: 16GB+ RAM recommended for quality offline inference
  • Optional: local Whisper model downloaded in advance

Steps

  1. Install Ollama

    Install Ollama and pull the hermes3 model while you still have internet
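    Step 1 as shell commands, to run while still online. The install one-liner is Ollama's documented script; on macOS a package manager install works too:

    ```shell
    # Install Ollama (official install script; on macOS you can use
    # `brew install ollama` instead)
    curl -fsSL https://ollama.com/install.sh | sh

    # Pull the model while still online -- this is the last step that
    # needs the network
    ollama pull hermes3

    # Confirm the weights are cached locally
    ollama list
    ```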

  2. Configure local model

    Set provider: ollama under the model: section of config.yaml; no API keys are needed
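    A minimal config.yaml sketch, assuming the model/provider nesting shown in this step. The name key is illustrative only; run 'hermes config show' to confirm your version's exact schema:

    ```yaml
    model:
      provider: ollama   # route all inference to the local Ollama daemon
      name: hermes3      # illustrative key; must match a model you pulled
    ```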

  3. Install local STT

    Run pip install openai-whisper, then transcribe a short clip once while online (e.g. whisper test.mp3 --model medium) so the model weights are cached for offline use
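    As shell commands, run while still online (the Whisper CLI caches downloaded weights, so one transcription is enough to prime it):

    ```shell
    # Install the local STT engine
    pip install openai-whisper

    # Transcribe any short clip once while online; this caches the
    # model weights under ~/.cache/whisper/ for offline use
    whisper test.mp3 --model medium
    ```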

  4. Install local TTS

    Use system TTS (say on macOS, espeak on Linux) or a local TTS model
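    One portable way to wire up system TTS is a small wrapper that picks the backend by OS. A sketch: 'say' ships with macOS, while 'espeak' usually needs installing from your package manager:

    ```shell
    # Print the local TTS command for this OS; fails if no known
    # backend exists.
    tts_backend() {
      case "$(uname -s)" in
        Darwin) echo "say" ;;     # built into macOS
        Linux)  echo "espeak" ;;  # e.g. apt install espeak
        *)      return 1 ;;
      esac
    }

    # Speak a line of text through whichever backend was found.
    speak() {
      "$(tts_backend)" "$1"
    }
    ```

    Usage: speak "Hello from Hermes" speaks through the detected backend with no network access.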

  5. Disable cloud features

    Set telemetry: false and remove any cloud API keys from config
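    In config.yaml that might look like the fragment below. The telemetry flag is from this guide; the commented-out key name is purely illustrative of the kind of cloud credential to strip:

    ```yaml
    telemetry: false   # no analytics or crash reports phoning home
    # Remove or comment out any cloud provider credentials, e.g.:
    # openai_api_key: ...
    ```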

  6. Test offline

    Disconnect from the internet and verify that 'hermes chat' works end-to-end locally
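    Before trusting the setup, a quick smoke test like this can confirm the local daemon answers while the public internet does not. A sketch, assuming Ollama's default port 11434 on localhost:

    ```shell
    check_offline_ready() {
      # The local Ollama daemon should answer...
      if curl -s --max-time 2 http://localhost:11434/api/tags > /dev/null; then
        echo "ollama: reachable"
      else
        echo "ollama: NOT reachable -- is the daemon running?"
      fi
      # ...while the outside world should not.
      if curl -s --max-time 2 https://example.com > /dev/null; then
        echo "internet: still reachable -- not fully offline"
      else
        echo "internet: unreachable (good)"
      fi
    }

    check_offline_ready
    ```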

Pro Tips

  • 💡Download everything you need (models, dependencies) before going offline — once disconnected, model downloads aren't possible
  • 💡A Q4-quantized Hermes 3 7B model offers a good balance of quality and speed on CPU-only hardware
  • 💡Set 'telemetry: false' in config.yaml to disable any analytics or crash reporting that might try to reach the internet

Troubleshooting

Hermes tries to reach the internet even in offline mode

Check config.yaml for any cloud-based settings: cloud memory sync, telemetry, update checks. Disable each with the appropriate 'false' flag. Run 'hermes config show' to review all active settings.

Ollama model loads but inference is extremely slow

CPU-only inference is slow for large models. Use a smaller quantized model (7B Q4) or ensure your GPU is detected by Ollama with 'ollama run hermes3 --verbose'.

Whisper transcription fails offline

Whisper needs its model files downloaded in advance. Run 'whisper test.mp3 --model medium' while online to pre-download the model weights to ~/.cache/whisper/.
