
Configure Context Compression — Handle Long Conversations

Set up automatic context compression so Hermes can handle long conversations without running out of context window.

Long conversations eventually hit the model's context limit. Hermes's context compression automatically summarizes older turns to make room for new messages, so a conversation can keep going past that limit while the important context survives in summarized form.

Before you start:

  • Hermes Agent installed
  • Understanding of token limits and context windows

Steps

  1. Enable compression

    In config.yaml, set enabled: true under the compression key.

  2. Set the threshold

    Set threshold: 0.50 under compression to trigger compression once the conversation reaches 50% of the context limit.

  3. Configure target ratio

    Set target_ratio: 0.20 under compression to preserve 20% as recent context.

  4. Protect recent messages

    Set protect_last_n: 20 under compression to always keep the last 20 messages intact.

  5. Choose a compression model (optional)

    Set model: google/gemini-3-flash-preview under auxiliary, then compression, to choose the model used for compression.
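
Taken together, the five steps above might produce a config.yaml fragment like the one below. This is a sketch rather than a definitive layout: the nesting is inferred from the key paths named in the steps, and the rest of config.yaml is omitted.

```yaml
# config.yaml (compression-related keys only; nesting inferred from the steps above)
compression:
  enabled: true          # step 1: turn automatic compression on
  threshold: 0.50        # step 2: compress at 50% of the context limit
  target_ratio: 0.20     # step 3: preserve 20% as recent context
  protect_last_n: 20     # step 4: always keep the last 20 messages intact

auxiliary:
  compression:
    model: google/gemini-3-flash-preview   # step 5 (optional): model used for compression
```

As a worked example, assuming a hypothetical 128,000-token context window (substitute your model's actual limit), a threshold of 0.50 would trigger compression at around 64,000 tokens.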

Pro Tips

  • 💡 Lower threshold values trigger compression earlier: use 0.50 for aggressive compression, 0.80 for late compression
  • 💡 The first 3 turns (system prompt, initial request, first response) are always protected
  • 💡 Compression uses a fast, cheap model (Gemini Flash) by default, so the cost impact is minimal
  • 💡 Use /compress <focus> to manually trigger compression with a specific focus topic

Troubleshooting

Agent forgets important earlier context

Increase protect_last_n so that more recent messages are kept intact instead of being summarized. Also make sure critical information is recorded in MEMORY.md for persistent recall.

Compression happens too often

Increase threshold from 0.50 to 0.70 or 0.80 to delay compression until more of the context window is used.

Compression summaries are low quality

Switch the compression model to a higher-quality option by setting model: anthropic/claude-3-haiku under auxiliary, then compression.

Context pressure warnings but no compression

Check that enabled: true is set under compression in config.yaml. Also verify that your model's context_length is detected correctly.
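
The troubleshooting adjustments above can be sketched as a single fragment; apply only the lines that match your symptom. The protect_last_n value of 40 is an illustrative assumption, not a documented recommendation. The other values come from the text above, and the nesting is inferred from the key paths it names.

```yaml
# config.yaml (troubleshooting variants; change only what your symptom calls for)
compression:
  enabled: true            # required for automatic compression to run at all
  threshold: 0.70          # raised from 0.50 so compression happens less often (0.80 delays it further)
  protect_last_n: 40       # illustrative value: raised from 20 to keep more recent messages intact

auxiliary:
  compression:
    model: anthropic/claude-3-haiku   # alternative compression model suggested for better summaries
```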
