Nous ResearchHermes Agent

Run Hermes Locally with Ollama — Complete Privacy

Run Hermes entirely offline with Ollama — no API keys, no cloud, complete privacy with local LLMs.

Ollama runs open-source LLMs entirely on your machine. No API keys, no cloud dependency, no data leaves your computer. Perfect for privacy-sensitive work, offline environments, or when you want to avoid API costs entirely.

Deploy Hermes faster with FlyHermes

Managed cloud · API costs included · Skill library · Cancel anytime

Before you start:

  • Hermes Agent installed
  • Ollama installed
  • Sufficient RAM/VRAM for your chosen model (8GB minimum, 24GB+ recommended)

Steps

  1. 1

    Install Ollama

    Download and install from ollama.com

  2. 2

    Pull a model

    ollama pull qwen3.5:35b or ollama pull gemma4:27b

  3. 3

    Start Ollama server

    ollama serve (runs on localhost:11434)

  4. 4

    Configure Hermes

    Set model: provider: ollama and model: base_url: http://localhost:11434/v1

  5. 5

    Choose your model

    Set model: default: qwen3.5:35b (or your pulled model)

Pro Tips

  • 💡Carnice 35B A3B is specifically tuned for Hermes tool calling — most reliable local model
  • 💡Qwen 3.5 35B and Gemma 4 27B are popular community choices
  • 💡Use Q4_K_M quantization for balance of quality and memory usage
  • 💡Set context_length explicitly if auto-detection is wrong

Troubleshooting

Model doesn't use tools

Many local models don't support function calling. Use Carnice, Qwen 3.5, or another Tier 1 model from the community recommendations.

Out of memory

Try a smaller model or lower quantization. 7B models need ~8GB, 35B models need ~24GB VRAM.

Slow responses

Local inference is CPU/GPU bound. Use a smaller model, enable GPU acceleration in Ollama, or accept the speed tradeoff for privacy.

Connection refused

Ensure ollama serve is running. Check that the base_url matches Ollama's actual address (default localhost:11434).

Related Guides