How-To Guide

Run Hermes Locally with Ollama — Complete Privacy

Run Hermes entirely offline with Ollama — no API keys, no cloud, complete privacy with local LLMs.

Quick answer

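Configure Ollama as a local provider to run open models entirely on your machine: no API keys, no cloud, no data leaving your computer. Set the Ollama base URL in the Hermes config and run the model with a context window of at least 64K tokens (65536). Hermes rejects models with smaller windows at startup. Note that -c 65536 is the flag used by llama.cpp's server; in Ollama the context window is set through the num_ctx parameter or the OLLAMA_CONTEXT_LENGTH environment variable.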

Because inference happens on your own hardware, this setup suits privacy-sensitive work, offline environments, and cases where you want to avoid API costs entirely.

Before you start:

☑ Hermes Agent installed
☑ Ollama installed
☑ Sufficient RAM/VRAM for your chosen model (8GB minimum, 24GB+ recommended)

Steps

1. Install Ollama
   Download and install from ollama.com
2. Pull a model
   Run ollama pull qwen3.5:35b or ollama pull gemma4:27b
3. Start Ollama server
   Run ollama serve (listens on localhost:11434)
4. Configure Hermes
   In the model section of the Hermes config, set provider: ollama and base_url: http://localhost:11434/v1
5. Choose your model
   In the same model section, set default: qwen3.5:35b (or whichever model you pulled)
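The steps above can be sketched as a short shell snippet. The config layout (a top-level model section holding provider, base_url, and default) is inferred from the key names in steps 4 and 5, and the helper name hermes_ollama_config is made up for illustration. Where the Hermes config file lives is not stated here, so the snippet only prints the fragment for you to paste in.

```shell
#!/bin/sh
# Print a Hermes model section that points at a local Ollama server.
# Assumption: the keys nest under `model:` as steps 4 and 5 suggest.
hermes_ollama_config() {
    model_name="${1:-qwen3.5:35b}"
    base_url="${2:-http://localhost:11434/v1}"
    cat <<EOF
model:
  provider: ollama
  base_url: ${base_url}
  default: ${model_name}
EOF
}

# Steps 2 and 3, commented out so the sketch runs without Ollama installed:
#   ollama pull qwen3.5:35b
#   ollama serve              # listens on localhost:11434

# Steps 4 and 5: print the fragment for the model you pulled.
hermes_ollama_config "qwen3.5:35b"
```

Passing a different tag, for example hermes_ollama_config "gemma4:27b", changes only the default line.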

Pro Tips

💡 Carnice 35B A3B is tuned specifically for Hermes tool calling and is the most reliable local model
💡 Qwen 3.5 35B and Gemma 4 27B are popular community choices
💡 Use Q4_K_M quantization for a balance of quality and memory usage
💡 Set context_length explicitly if auto-detection picks the wrong value

Troubleshooting

❌ Model doesn't use tools

✅ Many local models don't support function calling. Use Carnice, Qwen 3.5, or another Tier 1 model from the community recommendations.

❌ Out of memory

✅ Try a smaller model or a lower-bit quantization. As a rough guide, 7B models need ~8GB and 35B models need ~24GB of VRAM.
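The memory figures above can be sanity-checked with rough arithmetic: weight memory is approximately the parameter count times bits per weight, divided by 8. The sketch below assumes about 4.8 bits per weight for Q4_K_M, which is an approximate figure rather than an exact one. It ignores the KV cache and runtime overhead, which is why real requirements come out higher than the printed numbers.

```shell
#!/bin/sh
# Estimate weight memory in GB: params (in billions) * bits per weight / 8.
# This covers weights only; context (KV cache) and runtime overhead are extra.
estimate_weights_gb() {
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 7 4.8    # 7B model at ~4.8 bits per weight
estimate_weights_gb 35 4.8   # 35B model at ~4.8 bits per weight
```

A 64K context window adds noticeably to these numbers, so leave headroom beyond the weight estimate.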

❌ Slow responses

✅ Local inference is CPU/GPU bound. Use a smaller model, enable GPU acceleration in Ollama, or accept the speed tradeoff for privacy.

❌ Connection refused

✅ Ensure ollama serve is running. Check that the base_url matches Ollama's actual address (default localhost:11434).
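One way to check the address is to query Ollama directly. The sketch below derives the server root from the configured base_url by dropping the trailing /v1, then requests /api/tags, the Ollama endpoint that lists locally pulled models. The function names are illustrative and not part of Hermes or Ollama.

```shell
#!/bin/sh
# Strip trailing slashes and a trailing /v1 from the configured base_url
# to get the address the Ollama server itself listens on.
ollama_root_url() {
    printf '%s\n' "$1" | sed -e 's:/*$::' -e 's:/v1$::'
}

# Report whether the server answers. A failure usually means ollama serve
# is not running or base_url points at the wrong host or port.
check_ollama() {
    root="$(ollama_root_url "$1")"
    if curl --silent --fail --max-time 5 "$root/api/tags" >/dev/null; then
        echo "ollama reachable at $root"
    else
        echo "cannot reach ollama at $root" >&2
        return 1
    fi
}

ollama_root_url "http://localhost:11434/v1"
# Usage (requires curl and a running server):
#   check_ollama "http://localhost:11434/v1"
```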

FAQ

How do I point Hermes at a local Ollama model?

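Configure the Ollama provider with its local base URL and the model name. Make sure the model runs with a context window of at least 64K tokens (65536), which in Ollama is set through the num_ctx parameter or the OLLAMA_CONTEXT_LENGTH environment variable.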

Why does Hermes reject my local model?

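Context size. Hermes requires at least 64,000 tokens and rejects smaller windows at startup. Run the Ollama model with a 65536-token context window, for example by setting num_ctx in a Modelfile or OLLAMA_CONTEXT_LENGTH in the server environment.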
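A minimal sketch of one way to get a 64K window in Ollama: derive a variant of a pulled model with num_ctx raised through a Modelfile. FROM and PARAMETER num_ctx are standard Modelfile instructions. The helper name and the variant name qwen3.5-64k are placeholders chosen for illustration.

```shell
#!/bin/sh
# Emit a Modelfile that derives a larger-context variant of a pulled model.
modelfile_with_context() {
    base_model="$1"
    num_ctx="${2:-65536}"
    printf 'FROM %s\nPARAMETER num_ctx %s\n' "$base_model" "$num_ctx"
}

modelfile_with_context "qwen3.5:35b"

# To build the variant (requires Ollama; qwen3.5-64k is a placeholder name):
#   modelfile_with_context "qwen3.5:35b" > Modelfile
#   ollama create qwen3.5-64k -f Modelfile
# Alternatively, set a server-wide default before starting Ollama:
#   OLLAMA_CONTEXT_LENGTH=65536 ollama serve
```

If you build a variant this way, the model name in the Hermes config would need to match the new name rather than the original tag.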

What's the benefit of the local Ollama provider?

Zero per-token cost, full privacy, and offline capability: inference runs on your hardware, so no data leaves your machine.
