Nous Research Hermes Agent

Model Agnostic — Use Any AI Model You Want

Key Points

  • 200+ model providers supported
  • Bring Your Own Key (BYOK)
  • Switch models per task
  • OpenAI, Anthropic, Google, Mistral, Groq
  • Local models via Ollama
  • Cost optimization with model routing

How It Works

  1. Add your API key to config.yaml
  2. Set a default model or per-task models
  3. Hermes routes requests to your chosen provider
  4. Switch models on the fly with the /model command
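The steps above might look like this in config.yaml. This is an illustrative sketch — the key names (`providers`, `model.default`, `model.tasks`) are assumptions, not the documented Hermes schema:

```yaml
# config.yaml — hypothetical field names for illustration
providers:
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}   # BYOK: read from your environment
  openai:
    api_key: ${OPENAI_API_KEY}

model:
  default: anthropic/claude-sonnet  # used when no task rule matches
  tasks:
    coding: openai/gpt-4o           # per-task override
```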

Real-World Use Cases

Cost Optimization via Model Routing

Use a cheap fast model (GPT-4o mini, Haiku) for quick lookups and a powerful expensive model (GPT-4o, Sonnet) only for complex reasoning. Configure routing rules so Hermes picks the right model automatically based on task complexity — dramatically cutting costs without sacrificing quality.
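A routing rule for this cheap-model/expensive-model split might be sketched as follows (the `routing.rules` structure and `complexity` field are assumptions for illustration):

```yaml
routing:
  strategy: cost
  rules:
    - match: { complexity: low }     # quick lookups, short answers
      model: openai/gpt-4o-mini
    - match: { complexity: high }    # multi-step reasoning
      model: openai/gpt-4o
```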

Provider Failover Chain

Configure a fallback chain: try Anthropic first, fall back to OpenAI if rate-limited, fall back to local Ollama if both are down. v0.6.0 added automatic failover — your workflows keep running even during provider outages.
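The fallback chain described above could be expressed as an ordered list, tried top to bottom (hypothetical `fallback` key, not the documented schema):

```yaml
fallback:
  - anthropic/claude-sonnet   # primary
  - openai/gpt-4o             # on rate limit or outage
  - ollama/llama3             # local last resort, works offline
```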

Local-First Privacy Mode

Route sensitive tasks to a local Ollama model — nothing leaves your machine. Route commodity tasks to cloud providers for speed and capability. The routing config is your privacy boundary; you control what data each provider sees.

Specialized Model Per Task Type

Use Gemini 1.5 Pro for long-context document analysis (1M token window), Claude for nuanced writing, Codestral for code generation, and a local embedding model for semantic search. Each task type gets the model best suited for it.
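A per-task mapping for this setup might look like the sketch below (field names are illustrative assumptions):

```yaml
tasks:
  document_analysis: google/gemini-1.5-pro  # 1M-token context window
  writing: anthropic/claude-sonnet
  code_generation: mistral/codestral
  embeddings: ollama/nomic-embed-text       # local semantic search
```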

Under the Hood

Hermes implements a unified provider abstraction layer that normalizes the APIs of 200+ model providers into a single interface. The abstraction handles provider-specific quirks: tool calling format differences, streaming protocol variations, context window limits, and retry behavior. When you switch providers, the agent code doesn't change — only the provider config does. This is not just OpenAI-compatible; Hermes has native integrations with Anthropic's tool use format, Google's Gemini API, and the OpenRouter meta-router for accessing any provider through a single API key.
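The core idea of the abstraction layer can be sketched in a few lines of Python. This is not Hermes's actual internals — the class and method names are invented for illustration — but it shows why agent code is unchanged when the provider config changes: each adapter normalizes its provider's request/response shape into one common interface.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Completion:
    """Normalized response, regardless of which provider produced it."""
    text: str
    provider: str


class Provider(Protocol):
    def complete(self, prompt: str) -> Completion: ...


class OpenAIAdapter:
    name = "openai"

    def complete(self, prompt: str) -> Completion:
        # A real adapter would translate to the Chat Completions API here,
        # handling its tool-calling format and streaming protocol.
        return Completion(text=f"[openai] {prompt}", provider=self.name)


class AnthropicAdapter:
    name = "anthropic"

    def complete(self, prompt: str) -> Completion:
        # A real adapter would translate to Anthropic's Messages API,
        # including its distinct tool-use block format.
        return Completion(text=f"[anthropic] {prompt}", provider=self.name)


class Agent:
    """Agent code depends only on the Provider interface."""

    def __init__(self, provider: Provider):
        self.provider = provider

    def ask(self, prompt: str) -> str:
        return self.provider.complete(prompt).text


agent = Agent(OpenAIAdapter())
print(agent.ask("hello"))            # handled by the OpenAI adapter
agent.provider = AnthropicAdapter()  # swap provider; agent code unchanged
print(agent.ask("hello"))            # handled by the Anthropic adapter
```

Swapping the adapter is the code-level analogue of editing the provider config: the `Agent` never learns which provider it is talking to.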

The routing system supports multiple selection strategies: explicit (always use this model), task-based (use model X for coding tasks, model Y for research), cost-based (use the cheapest model that meets the quality threshold), and latency-based (use the fastest model that meets the quality threshold). Custom routing rules can be written as YAML predicates that inspect the task context. The /model command switches the active model for the current session without restarting anything.
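A custom YAML predicate inspecting the task context might look like this sketch — the `when`/`use` rule syntax is an assumption, not the documented format:

```yaml
routing:
  rules:
    - when:
        task_type: coding
        input_tokens: { gt: 50000 }  # large-context coding task
      use: google/gemini-1.5-pro
    - when:
        task_type: coding
      use: mistral/codestral
    - default: anthropic/claude-sonnet
```

Rules are read top to bottom; the first matching predicate wins, with the `default` entry as a catch-all.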

BYOK (Bring Your Own Key) is the only way Hermes accesses models — it never proxies your requests through its own infrastructure. Your API keys live in your local config (config.yaml or environment variables), your requests go directly from your machine to your chosen provider, and your conversation data never touches Hermes servers. This is more than a privacy feature: you get the provider's full rate limits, not limits shared with other Hermes users.
