How-To Guide

Configure Reasoning Effort — Speed vs Depth

Control how much 'thinking' the model does before responding — balance speed, cost, and reasoning depth.

Quick answer

Reasoning effort controls how much the model 'thinks' before answering: higher effort means deeper analysis but slower, costlier responses; lower effort is faster and cheaper but can miss nuance. Set it per workload in config — high effort for hard problems, low for routine high-volume tasks.

Reasoning effort controls how much 'thinking' the model does before responding. Higher effort means deeper analysis but slower responses and higher token costs. Lower effort is faster and cheaper but may miss nuances. Find the right balance for your use case.

Deploy Hermes faster with FlyHermes

Managed cloud · API costs included · Skill library · Cancel anytime

Before you start:

☑Hermes Agent installed
☑A model that supports reasoning effort (most OpenRouter and Nous Portal models)

Steps

1
Understand reasoning levels
Levels: xhigh (max), high, medium (default), low, minimal, none (disable)
2
Set default level
In config.yaml, set agent: reasoning_effort: medium
3
Adjust per-task at runtime
Use hermes config set agent.reasoning_effort high before complex tasks
4
Monitor reasoning output
Set display: show_reasoning: true to see the model's thought process

Pro Tips

💡Use 'high' or 'xhigh' for complex architecture decisions, debugging, and analysis
💡Use 'low' or 'minimal' for simple file operations, git commands, and routine tasks
💡The /reasoning show command displays the model's thinking in a dim box above responses
💡Heavy users report 20-40% token savings by switching reasoning levels based on task complexity

Troubleshooting

❌ Reasoning effort setting not taking effect

✅ Not all models support reasoning effort. Check your provider's documentation. OpenRouter and Nous Portal models support it.

❌ Responses too slow with high reasoning

✅ Switch to medium or low for routine tasks. Use /reasoning show to see if the extra thinking is actually helping.

❌ Agent missing obvious solutions

✅ Increase reasoning_effort to high or xhigh. The agent may be skipping analysis steps at lower levels.

FAQ

What does reasoning effort change?

How much the model deliberates before responding. Higher effort improves depth on hard problems at the cost of speed and tokens; lower effort is faster and cheaper but shallower.

When should I lower reasoning effort?

For routine, high-volume, or latency-sensitive tasks where deep analysis isn't needed. Save high effort for genuinely hard reasoning.

Does effort affect cost?

Yes — more reasoning means more tokens. Tuning effort per workload is a direct lever on both speed and spend.

Configure Reasoning Effort — Speed vs Depth

Before you start:

Steps

Understand reasoning levels

Set default level

Adjust per-task at runtime

Monitor reasoning output

Pro Tips

Troubleshooting

FAQ

What does reasoning effort change?

When should I lower reasoning effort?

Does effort affect cost?

Related Guides