Configure Context Compression — Handle Long Conversations
Set up automatic context compression so Hermes can handle long conversations without running out of context window.
Long conversations eventually hit the model's context limit. Hermes's context compression automatically summarizes older turns to make room for new messages, so a conversation can continue past the context limit while the most recent messages are kept intact.
Before you start:
- ☑ Hermes Agent installed
- ☑ Understanding of token limits and context windows
Steps
- 1. Enable compression
In config.yaml, set compression: enabled: true.
- 2. Set the threshold
Set compression: threshold: 0.50 to start compressing once the conversation reaches 50% of the context limit.
- 3. Configure target ratio
Set compression: target_ratio: 0.20 to preserve 20% as recent, uncompressed context.
- 4. Protect recent messages
Set compression: protect_last_n: 20 to always keep the last 20 messages intact.
- 5. Choose a compression model (optional)
Set auxiliary: compression: model: google/gemini-3-flash-preview.
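Taken together, the five steps above might produce a config.yaml fragment like the one below. This is a sketch, not a verified reference: it assumes the inline notation used in the steps (for example compression: enabled: true) maps to nested YAML keys, and it uses only the keys and values named in the steps.

```yaml
# config.yaml (fragment); nesting is inferred from the inline notation in the steps
compression:
  enabled: true        # step 1: turn automatic compression on
  threshold: 0.50      # step 2: compress at 50% of the context limit
  target_ratio: 0.20   # step 3: preserve 20% as recent context
  protect_last_n: 20   # step 4: always keep the last 20 messages intact

auxiliary:
  compression:
    model: google/gemini-3-flash-preview   # step 5 (optional): model used for compression
```

If your existing config.yaml already has a compression or auxiliary section, merge these keys into it rather than adding a second top-level entry with the same name.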
Pro Tips
- 💡 Lower threshold values trigger compression earlier: 0.50 compresses aggressively, while 0.80 waits until 80% of the context limit is used
- 💡 The first 3 turns (system prompt, initial request, first response) are always protected
- 💡 Compression uses a fast, cheap model (Gemini Flash) by default, so the cost impact is minimal
- 💡 Use /compress <focus> to manually trigger compression with a specific focus topic
Troubleshooting
❌ Agent forgets important earlier context
✅ Increase protect_last_n to keep more recent messages. Also ensure critical information is in MEMORY.md for persistent recall.
❌ Compression happens too often
✅ Increase threshold from 0.50 to 0.70 or 0.80 to delay compression until more of the context window is used.
❌ Compression summaries are low quality
✅ Change the compression model to a higher-quality option: auxiliary: compression: model: anthropic/claude-3-haiku
❌ Context pressure warnings but no compression
✅ Check that compression: enabled: true is set. Also verify your model's context_length is correctly detected.
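The fixes above come down to a few value changes in config.yaml. The fragment below is an illustrative sketch rather than a recommended profile: the nested layout is assumed from the inline notation used in this guide, and protect_last_n: 40 is a hypothetical value chosen only to show an increase. Apply only the line that matches your symptom.

```yaml
# config.yaml (fragment); illustrative troubleshooting adjustments
compression:
  enabled: true        # needed for automatic compression to run at all
  threshold: 0.70      # raised from 0.50 so compression starts later
  protect_last_n: 40   # hypothetical increase from 20; keeps more recent messages intact

auxiliary:
  compression:
    model: anthropic/claude-3-haiku   # higher-quality option suggested above for better summaries
```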