Nous Research | Hermes Agent

Learning Loop — Self-Improving AI Skills

Key Points

  • Observes multi-step tasks
  • Creates skill after 3+ attempts
  • Refines skill from feedback
  • Improves on your specific workflow

How It Works

  1. Give Hermes a complex task
  2. After 3+ attempts, a skill is auto-created
  3. Edit the skill in ~/.hermes/skills/
  4. The skill improves with each use
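The trigger in the steps above can be sketched as a simple threshold check. Everything below (the `SkillStore` class, the `CREATION_THRESHOLD` constant, the naive distill step) is a hypothetical illustration of the idea, not Hermes internals:

```python
from collections import defaultdict

CREATION_THRESHOLD = 3  # hypothetical constant for the "3+ attempts" rule


class SkillStore:
    """Minimal sketch of the observe -> distill trigger (assumed design)."""

    def __init__(self):
        self.attempts = defaultdict(list)  # task pattern -> recorded episodes
        self.skills = {}                   # task pattern -> distilled skill text

    def record(self, pattern, steps):
        """Log a completed multi-step task under its pattern."""
        self.attempts[pattern].append(steps)
        if pattern not in self.skills and len(self.attempts[pattern]) >= CREATION_THRESHOLD:
            self.skills[pattern] = self.distill(pattern)

    def distill(self, pattern):
        """Collapse recorded episodes into a reusable procedure (naive: keep the latest run)."""
        steps = self.attempts[pattern][-1]
        return "\n".join(f"{i + 1}. {s}" for i, s in enumerate(steps))


store = SkillStore()
for _ in range(3):
    store.record("pr-review", ["check CI", "summarize diff", "post comments"])
print("pr-review" in store.skills)  # skill exists once the third attempt lands
```

A real implementation would also need to decide when two tasks count as "a similar task pattern"; the sketch sidesteps that by using an explicit pattern key.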

Real-World Use Cases

Automated Dev Workflow Skills

After you walk Hermes through your PR review process three times, it creates a skill: check CI status, summarize diff, flag style violations, post review comments. Next time, it runs the full workflow from a single command.

Domain-Specific Research Patterns

Teach Hermes your research methodology once — which sources to check, how to structure findings, what to cite. It creates a research skill and applies it consistently every time you ask for a deep dive.

Error Recovery Memory

When Hermes hits a dead end and finds a workaround, it patches its own skill to avoid that dead end next time. Failed approaches are remembered and pruned; successful paths are reinforced.

Community Skill Sharing

Skills you create locally can be published to the Skills Hub (agentskills.io, ClawHub, GitHub). Install community skills from OpenAI, Anthropic, or independent contributors with a single hermes skills install command.

Under the Hood

The learning loop follows an observe → distill → reuse → refine cycle. During the observe phase, Hermes tracks multi-step tasks in its episodic memory layer — every tool call, every decision branch, every correction you make. After 3+ successful completions of a similar task pattern, it enters the distill phase: generating a SKILL.md document capturing the procedure, pitfalls, and verification steps in a structured format compatible with the agentskills.io open standard.
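A distilled skill file might look like the following. Only the name/description frontmatter and markdown body reflect the source's description of the format; the section headings and procedure content are invented for illustration:

```markdown
---
name: pr-review
description: Run the team's PR review workflow end to end
---

# PR Review

## Procedure
1. Check CI status before reading any code.
2. Summarize the diff file by file.
3. Flag style violations against the team guide.
4. Post review comments.

## Pitfalls
- Do not approve while CI is still pending.

## Verification
- Every flagged violation links to a specific line in the diff.
```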

Skills live at ~/.hermes/skills/ and are immediately available as slash commands. The agent uses progressive disclosure when loading skills — it sees a lightweight description (name, description, category, ~3k tokens total for all skills) and only loads the full SKILL.md when the task actually matches. This keeps token overhead near zero for irrelevant skills while preserving full richness for active ones. Hermes can also patch its own skills mid-session using the skill_manage tool, updating procedures when it discovers a better approach without requiring a full rewrite.
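Progressive disclosure can be sketched as lazy loading keyed on a cheap metadata match. The `SkillIndex` class and its keyword heuristic below are illustrative assumptions, not the actual loader:

```python
from pathlib import Path


class SkillIndex:
    """Sketch: keep only lightweight per-skill metadata in context, and pay
    for a full SKILL.md body only when a task actually matches."""

    def __init__(self, root):
        self.root = Path(root)
        # Lightweight layer, always loaded: skill name -> first line of SKILL.md.
        self.metadata = {}
        for path in self.root.glob("*/SKILL.md"):
            self.metadata[path.parent.name] = path.read_text().splitlines()[0]

    def match(self, task):
        """Cheap keyword match against descriptions only (assumed heuristic)."""
        words = task.lower().split()
        return [name for name, desc in self.metadata.items()
                if any(word in desc.lower() for word in words)]

    def load(self, name):
        """Full richness, paid only for matched skills."""
        return (self.root / name / "SKILL.md").read_text()
```

The point of the split is the cost profile: unmatched skills contribute only their one-line descriptions to the context, while matched ones are read in full.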

For deeper self-improvement, Hermes integrates with the Atropos RL pipeline — a reinforcement learning framework that trains on interaction trajectories. You can rate responses, mark corrections, or let auto-evaluation run. The pipeline supports RLHF and DPO, and exports trajectories in ShareGPT format for fine-tuning custom models. This means the learning loop isn't just skill-level procedural memory — it can improve the underlying model's behavior on your specific use cases over time.
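ShareGPT-format trajectories are plain JSON conversations. A minimal export of a rated interaction might look like this; the ShareGPT `from`/`value` keys are standard, but the `(role, text)` trajectory shape and the sample content are assumptions:

```python
import json


def to_sharegpt(trajectory):
    """Convert a list of (role, text) turns into ShareGPT's conversations schema."""
    role_map = {"user": "human", "assistant": "gpt", "system": "system"}
    return {"conversations": [
        {"from": role_map[role], "value": text} for role, text in trajectory
    ]}


# Hypothetical interaction log, not real Hermes output.
trajectory = [
    ("user", "Review this pull request"),
    ("assistant", "CI is green; two style issues flagged."),
]
print(json.dumps(to_sharegpt(trajectory), indent=2))
```

Exports in this shape can be fed straight into common fine-tuning toolchains that accept ShareGPT-style conversation datasets.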

Related Features