Use Hermes Agent with Ollama for Local AI
Run Hermes Agent with local LLMs via Ollama — fully private, no API keys, no cloud dependency.
Running Hermes with Ollama keeps everything on your machine — no API keys, no cloud costs, no data leaving your network. It's the ideal setup for privacy-conscious users or anyone who wants full control over their AI stack.
Before you start:
- ☑ Hermes Agent installed
- ☑ Ollama installed (ollama.com) on the same or a networked machine
- ☑ Sufficient RAM: 8GB minimum for 7B models, 16GB+ recommended for 13B+ models
- ☑ Optional: NVIDIA or AMD GPU for significantly faster inference
Steps
1. Install Ollama
   Run 'curl -fsSL https://ollama.com/install.sh | sh' (works on Linux and macOS).
2. Pull a model
   Run 'ollama pull hermes3' for the official Hermes 3 model, or pull any compatible model.
3. Configure Hermes
   Set 'model: provider: ollama' and 'model: name: hermes3' in config.yaml.
4. Set the endpoint
   Ollama defaults to http://localhost:11434. Change this if Ollama runs on another machine.
5. Start Hermes
   Run 'hermes start'. All inference runs locally, with zero data leaving your machine.
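Putting steps 3 and 4 together, a minimal config.yaml might look like the sketch below. The exact key nesting is inferred from the step text and may differ in your Hermes version, so check your installation's reference config; 'baseUrl' is only needed when Ollama runs on another machine.

```yaml
# Minimal local-Ollama config sketch -- key layout inferred from the
# steps above; verify against your Hermes version's reference config.
model:
  provider: ollama                  # route all inference through Ollama
  name: hermes3                     # the model pulled in step 2
  baseUrl: http://localhost:11434   # Ollama's default endpoint (step 4)
```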
Pro Tips
- 💡 The official Hermes 3 model ('ollama pull hermes3') is optimized for tool use and works best with Hermes Agent — start here before trying other models
- 💡 For VPS deployments without a GPU, try smaller quantized models (Q4_K_M) — they run on CPU but are slower
- 💡 Ollama can run on a separate powerful machine while Hermes runs on a smaller server — set 'model: baseUrl: http://your-gpu-machine:11434' in config
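When Hermes and Ollama live on different machines, it's worth confirming the endpoint is reachable before editing config.yaml. A quick sketch, assuming the default port — Ollama's GET /api/tags endpoint returns the installed models, so it doubles as a health check:

```shell
# Probe an Ollama endpoint; point OLLAMA_URL at your GPU machine if remote.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
if curl -fsS --max-time 3 "$OLLAMA_URL/api/tags" > /dev/null 2>&1; then
  echo "Ollama reachable at $OLLAMA_URL"
else
  echo "Ollama NOT reachable at $OLLAMA_URL (is 'ollama serve' running? firewall open?)"
fi
```

If the probe fails from the Hermes machine but works locally on the GPU machine, the usual culprit is Ollama binding only to localhost or port 11434 being blocked by a firewall.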
Troubleshooting
❌ Ollama connection refused error
✅ Check that Ollama is running: 'ollama serve'. By default it listens on localhost:11434. If running on a different machine, ensure port 11434 is open in your firewall.
❌ Hermes responses are extremely slow with Ollama
✅ You're likely running CPU-only inference on a model too large for your RAM. Try a smaller or more heavily quantized variant ('ollama list' and the model's page on ollama.com show the available tags), or add a GPU to your setup.
❌ Model not found error despite pulling it
✅ Check the model name spelling: 'ollama list' shows installed models. Use the exact name shown, including any variant suffix.
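To head off name mismatches entirely, check what's actually installed before writing config.yaml. A small sketch that degrades gracefully when the ollama CLI isn't on PATH:

```shell
# Show exact installed model names; the NAME column (e.g. "hermes3:latest")
# is what config.yaml must match verbatim, including any variant suffix.
if command -v ollama > /dev/null 2>&1; then
  MODELS="$(ollama list)"
else
  MODELS="ollama CLI not found on PATH -- install it first (step 1)"
fi
echo "$MODELS"
```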