
How Hermes Agent Gets Better Over Time

Tags: self-improving AI agent · self-improvement · skills · learning-loop

Hermes creates skills from experience — the agent that gets genuinely better at your work the longer you use it. Here's how the loop works.

Want to try Hermes Agent yourself?

Try Hermes Free → Deploy in 60 seconds

"Self-improving AI" sounds like hype. When Nous Research uses the term for Hermes Agent, they mean something specific and measurable: a closed learning loop that makes Hermes genuinely better at your specific workflows the longer you use it.

Install Hermes first.

The Hype vs. Reality

Let us be precise about what self-improvement is and is not.

What it IS:

  • Hermes creates skill documents from completed tasks — capturing procedures, pitfalls, and verified workflows
  • Skills self-improve over time as Hermes refines them through use
  • Memory accumulates facts about your environment, preferences, and corrections
  • After 20–30 complex tasks, you have an agent that handles your specific workflows measurably faster and with fewer errors

What it IS NOT:

  • Hermes does not modify its own weights or neural architecture
  • It does not autonomously decide to rewrite its core code
  • It does not improve at general tasks it has not worked on with you
  • The LLM underneath stays the same

The improvement is behavioral and procedural, not architectural. Think of it as a skilled contractor who gets better at your specific house with every project.

The Closed Learning Loop: 4 Stages

Every interaction with Hermes feeds a four-stage loop:

Stage 1: Task Execution — Hermes runs your task, using tools, writing code, browsing the web, spawning subagents as needed.

Stage 2: Self-Evaluation Checkpoint — Every 15 tool calls, Hermes pauses to evaluate: What did I do? What worked? What failed? Is this experience worth capturing?

Stage 3: Skill Creation or Update — If the experience is worth capturing, Hermes writes or patches a skill document.

Stage 4: Memory Update — Key facts, corrections, and conventions are written to MEMORY.md and USER.md, available to all future sessions.
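As a mental model, the four stages can be sketched as a single control loop. Everything in this sketch (the helper names, the capture heuristic, the simulated bookkeeping) is illustrative stand-in code, not actual Hermes internals; the only grounded detail is the 15-call checkpoint interval:

```python
# Toy simulation of the four-stage learning loop. All names and
# heuristics here are illustrative, not the real Hermes implementation.
CHECKPOINT_INTERVAL = 15  # self-evaluation fires every 15 tool calls

def run_task(n_tool_calls: int):
    skills, memory = [], []
    for call in range(1, n_tool_calls + 1):
        # Stage 1: task execution — one tool call happens here (simulated)
        if call % CHECKPOINT_INTERVAL == 0:
            # Stage 2: self-evaluation checkpoint — worth capturing?
            worth_capturing = True  # assume complex multi-step work
            if worth_capturing:
                # Stage 3: create or patch a skill document
                skills.append(f"skill-from-checkpoint-{call}")
            # Stage 4: write key facts to MEMORY.md / USER.md (simulated)
            memory.append(f"facts-through-call-{call}")
    return skills, memory

skills, memory = run_task(40)
print(skills)  # → ['skill-from-checkpoint-15', 'skill-from-checkpoint-30']
```

In the real loop, the checkpoint evaluates actual tool outcomes rather than a hardcoded flag; short tasks that never reach 15 calls simply produce no skill.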

The YouTube reviewer who tested this: "Every 15 tool calls, it hits what's called a self-evaluation checkpoint saying, 'Hey, what have I done? Did it work? What failed? What's worth remembering? Should I create a skill?' It proactively creates skills. I don't have to do it myself."

The Learning Loop Mechanism

Skills are markdown documents in ~/.hermes/skills/ that follow the agentskills.io standard. They are the agent's procedural memory.

Hermes creates skills via the skill_manage tool:

skill_manage(action="create",
    name="competitor-analysis-workflow",
    content="# Competitor Analysis Workflow\n\n## When to Use\nWhen researching competitor content strategy...\n\n## Procedure\n1. Use browser tool to capture screenshots\n2. Run vision analysis\n3. Store findings to ~/research/competitors/\n\n## Pitfalls\n- Do not analyze more than 5 competitors per session\n- Always timestamp findings\n\n## Verification\nCheck output file exists")

Patches (preferred for updates):

skill_manage(action="patch",
    name="competitor-analysis-workflow",  
    old_text="Do not analyze more than 5",
    new_text="Do not analyze more than 3 (5 causes context overflow)")
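Conceptually, a patch amounts to a verified exact-match replacement inside the skill's markdown file. Here is a minimal sketch of that idea; `patch_skill` is a hypothetical stand-in, not the actual implementation behind `skill_manage(action="patch")`:

```python
# Hypothetical sketch of a skill patch: replace old_text with new_text
# in the skill file, refusing ambiguous or missing matches.
from pathlib import Path

def patch_skill(skill_path: Path, old_text: str, new_text: str) -> None:
    doc = skill_path.read_text()
    if doc.count(old_text) != 1:
        # A patch should apply exactly once; anything else risks corruption
        raise ValueError(f"old_text matched {doc.count(old_text)} times, expected 1")
    skill_path.write_text(doc.replace(old_text, new_text))
```

The exact-once check captures why patches are preferred over full rewrites: small, verifiable edits accumulate without clobbering the rest of the document.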


What Happens After 20–30 Tasks

After a month of regular use, users report a qualitatively different experience. From the YouTube transcript: "The agent working with you in month three is meaningfully more capable on your specific work than the agent you started with in month one."

What changes:

  • Speed: Tasks that took 25 tool calls in week one take 8–10 in week six (skill reuse replaces rediscovery)
  • Accuracy: The agent stops making mistakes it has been corrected on
  • Anticipation: Hermes proactively applies brand guidelines and code conventions without prompting
  • Skill library: 10–40 agent-created skills specific to your work

From Discord, teknium confirmed TAU2 scores: GLM above 99 and Qwen3.5 27B at 79. These models are built for agentic, multi-step tool use, which is exactly what feeds the learning loop.


Atropos RL Integration

Hermes powers Nous Research's own RL training pipeline. Atropos is their reinforcement learning framework, and Hermes is integrated as a trajectory generator:

hermes batch --workers 4 --checkpoint ./training_data

Features: Batch trajectory generation with checkpointing, parallel workers, export to ShareGPT for fine-tuning, trajectory compression.
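To make the ShareGPT export concrete, here is a minimal sketch of converting a trajectory into the classic `conversations` / `from` / `value` schema. The input trajectory shape, the role mapping, and the function name are all assumptions; the actual Hermes exporter and checkpoint format may differ, especially in how tool turns are represented:

```python
# Sketch of exporting an agent trajectory to ShareGPT-style JSON for
# fine-tuning. The trajectory structure here is an assumption, not the
# actual Hermes checkpoint format.
import json

ROLE_MAP = {"user": "human", "assistant": "gpt"}

def to_sharegpt(trajectory):
    return {
        "conversations": [
            {"from": ROLE_MAP[turn["role"]], "value": turn["content"]}
            for turn in trajectory
        ]
    }

traj = [
    {"role": "user", "content": "Research competitor pricing pages"},
    {"role": "assistant", "content": "[tool: browser] captured 3 screenshots"},
    {"role": "user", "content": "Summarize the differences"},
    {"role": "assistant", "content": "Competitor A undercuts on the entry tier..."},
]
print(json.dumps(to_sharegpt(traj), indent=2))
```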

From the official tweet: "It also powers our agentic RL pipeline, expanding Atropos so you can run RL with Hermes Agent primitives, and it supports mass-scale data generation out of the box."

For researchers: Hermes is infrastructure for training the next generation of tool-calling models.

The Community Skill Marketplace

Skills are shareable. The agentskills.io standard means any skill can be packaged and distributed.

Installing community skills:

hermes skills browse                         
hermes skills install anthropics/skills/k8s  
hermes skills install openai/skills/codex    
hermes skills tap add myorg/team-skills      
hermes skills publish --to github            

Notable community skills from awesome-hermes-agent:

  • Anthropic-Cybersecurity-Skills (3.6k stars) — 734+ security skills mapped to MITRE ATT&CK
  • chainlink-agent-skills — Official Chainlink oracle integration
  • black-forest-labs/skills — Official FLUX image generation
  • wondelai/skills (250+ stars) — Cross-platform workflow library
  • hermes-skill-factory — Auto-generates reusable skills from workflows

Hub sources: Official (bundled), skills.sh (Vercel), GitHub, ClawHub, Claude marketplace.

Bundled Skills (Fresh Install)

Fresh install includes 40+ bundled skills:

  • MLOps: Axolotl fine-tuning, model evaluation, dataset prep
  • DevOps: GitHub PR workflows, deployment, k8s operations
  • Research: Deep web research, PDF analysis, citation management
  • Content: Image generation, TTS, social media workflows
  • Development: Code review, refactoring, testing patterns
  • System: Cron scheduling, backup, monitoring

All bundled skills use agentskills.io standard and are platform-aware.

Limitations: When Self-Improvement Does Not Help

New domains: If you ask Hermes to do something it has zero experience with, it starts from scratch. Self-improvement only helps in areas you have actually worked in.

Structural errors: If the underlying LLM makes consistent logical mistakes, skill creation just captures the workaround. Better model = better foundation.

Token budget problems: A growing skill library is great until you have 50+ skills and the system prompt is huge. Progressive disclosure helps, but monitor it.
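Progressive disclosure can be pictured as a relevance filter over the skill library: only the top-scoring skills for the current task are loaded into context, so the prompt stays bounded as the library grows. This toy sketch uses naive word overlap for scoring; the real selection mechanism is not documented here:

```python
# Toy sketch of progressive disclosure: load only skills whose
# description overlaps the current task. The matching heuristic is
# purely illustrative.
def select_skills(task: str, skills: dict, limit: int = 5) -> list:
    task_words = set(task.lower().split())
    scored = [
        (len(task_words & set(desc.lower().split())), name)
        for name, desc in skills.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:limit] if score > 0]

library = {
    "competitor-analysis-workflow": "research competitor content strategy",
    "image-generation-branded": "generate brand images with logo overlay",
    "k8s-deploy": "deploy services to kubernetes cluster",
}
print(select_skills("research our competitor strategy", library))
# → ['competitor-analysis-workflow']
```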

Across installations: Skills on your home server do not automatically appear on your work server. Sync them manually or use profile export/import.

One-shot tasks: If you only do something once, skill creation adds overhead without ROI. The learning loop shines on recurring workflows.

FAQ

Can I create skills manually? Yes. Drop a SKILL.md file in ~/.hermes/skills/category-name/SKILL.md following the agentskills.io format. It becomes available as a slash command immediately.
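For illustration, a hand-written skill can mirror the section layout Hermes generates itself via skill_manage. The frontmatter fields shown here are an assumption based on common agent-skill conventions; check the agentskills.io spec before relying on them:

```markdown
---
name: competitor-analysis-workflow
description: Research competitor content strategy with browser and vision tools
---

# Competitor Analysis Workflow

## When to Use
When researching competitor content strategy.

## Procedure
1. Use browser tool to capture screenshots
2. Run vision analysis
3. Store findings to ~/research/competitors/

## Pitfalls
- Do not analyze more than 3 competitors per session (5 causes context overflow)
- Always timestamp findings

## Verification
Check that the output file exists
```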

Do skills work across model changes? Yes. Skills are model-agnostic markdown documents.

Is the RL training pipeline open source? Yes. Atropos integration code is in the Hermes repository.

The Self-Evaluation Checkpoint in Detail

The mechanism is specific. Every 15 tool calls, Hermes pauses and runs an internal evaluation before continuing:

  • What did I accomplish in these 15 steps?
  • What approaches worked and which failed?
  • Is there a reusable procedure worth capturing as a skill?
  • Are there environment facts or user preferences to add to memory?

This checkpoint is why Hermes improves specifically on your work rather than generically. It is not fine-tuning the model — it is capturing the workflow logic, pitfalls, and verification steps that were discovered through trial and error on your actual tasks.

Skill Improvement Over Time

Skills do not just get created — they get refined. Each time Hermes uses an existing skill and encounters something new (a new edge case, a better approach, a correction from you), it patches the skill:

skill_manage(action="patch",
    name="image-generation-branded",
    old_text="Logo opacity should be 70%",
    new_text="Logo opacity: 70% for dark backgrounds, 50% for light backgrounds (learned 2026-03-15)")

Over 20–30 uses of the same workflow, the skill document becomes a sophisticated, experience-hardened playbook. This is procedural memory that gets better with use — analogous to how a skilled professional's mental model of a task sharpens over months.

Real Production Example: Content Agency Workflow

The YouTube reviewer who switched from OpenClaw described his content production pipeline after two weeks with Hermes:

"I was making images with mine for statics, right? For carousels, for statics, for ads, for whatever... It's doing a combination of using Nano Banana through fal.ai, which is just like an agentic API for generative tools... and it's also doing Python because I want my logo added onto these."

He described the key self-improvement moment: "And it was smart enough to tell me, 'Hey, we're going to use a hybrid approach. We're going to use Python to actually overlay your logo onto the images and I'll just make the images with Nano Banana.'"

Hermes created a skill for this hybrid approach. The next time he needed brand images, it was there — no re-discovery, no re-negotiation.

He also set up a multi-agent workflow: "The social media manager who is scheduling out and writing posts... and then this visual agent is giving that data to the ads creator agent. Again, without any setup, all I did was say, 'Hey, I want you to do research on these brands that are our competitors.'"

Research Pipeline Applications

For AI researchers, the self-improvement loop feeds directly into Nous Research's training infrastructure. Hermes generates:

  • Batch trajectories from real agentic tasks (tool calls, outcomes, corrections)
  • Training data exportable to ShareGPT format for fine-tuning
  • Atropos-compatible RL training data from the correction and skill-improvement loop

From the official positioning: Hermes is being used internally by Nous Research to improve future model generations. The self-improvement loop running on user machines generates the kind of tool-calling trajectory data that is hard to synthesize artificially.

What the Community Says After Extended Use

From r/hermesagent and Discord, power users after 1–3 months:

On the skill accumulation: Discord user stefan171 — "I think I have truly entered the multi terminal stage of my AI journey" — running orchestrator + worker patterns, with skills managing the coordination protocols.

On cost management after getting comfortable: Discord user 64sf — "I prepaid for zai-coding and kimi-coding maximum plans on black friday and new years deals, prepaid yearly." Budget planning becomes possible once you understand your usage patterns.

On the community ecosystem value: geezeruk (Discord) on the PLUR engram plugin — "Shared episodic memory across 6 agents is genuinely powerful and would be hard to replicate any other way."

The common thread: users who stick with Hermes past the first two weeks consistently report a qualitative shift in the agent's usefulness around weeks 4–6.

Frequently Asked Questions

How does Hermes actually create skills from experience?

Every 15 tool calls, Hermes hits a self-evaluation checkpoint where it asks whether the completed work is worth capturing as a reusable procedure. If yes, it writes a SKILL.md document to ~/.hermes/skills/ covering the workflow, pitfalls, and verification steps — automatically.

Can I prevent Hermes from creating skills automatically?

There's no single flag to disable automatic skill creation, but you can delete unwanted skills at any time. Simple tasks typically don't trigger skill creation — it only happens after complex, multi-step work that the agent judges worth remembering.

How is Hermes self-improvement different from fine-tuning a model?

Fine-tuning modifies the model's weights globally for all users. Hermes self-improvement captures workflows, pitfalls, and conventions as markdown documents that any model can use. The improvement is specific to your work, not distributed to everyone running Hermes.

How many skills should I accumulate before things get unwieldy?

Community consensus is that 30–50 custom skills is manageable before archiving low-use ones. The skills system uses progressive disclosure so only relevant skills load per task, keeping token costs bounded even with a large library.

What models work best with the Hermes learning loop?

Models with strong TAU2 agentic benchmark scores work best — Qwen 3.5 27B (79%), Gemma 4 31B (76.9%), and GLM (>99% but GPU-only) are community favorites. More capable models execute complex tasks more reliably, generating better skill data.

Ready to Run Your Own AI Agent?

Self-host Hermes in 60 seconds. No credit card, no cloud lock-in.

Deploy Hermes Free →
