Nous Research | Hermes Agent

Learning Loop — Self-Improving AI Skills

Key Points

  • Observes multi-step tasks
  • Creates skill after 3+ attempts
  • Refines skill from feedback
  • Improves on your specific workflow

How It Works

  1. Give Hermes a complex task
  2. After 3+ attempts, a skill is auto-created
  3. Edit the skill in ~/.hermes/skills/
  4. The skill improves with each use
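The trigger in the steps above can be sketched as a simple threshold check. Everything below (the `SkillStore` class, the `CREATION_THRESHOLD` constant, the naive distill step) is a hypothetical illustration of the idea, not Hermes internals:

```python
from collections import defaultdict

CREATION_THRESHOLD = 3  # hypothetical constant for the "3+ attempts" rule


class SkillStore:
    """Minimal sketch of the observe -> distill trigger (assumed design)."""

    def __init__(self):
        self.attempts = defaultdict(list)  # task pattern -> recorded episodes
        self.skills = {}                   # task pattern -> distilled skill text

    def record(self, pattern, steps):
        """Log a completed multi-step task under its pattern."""
        self.attempts[pattern].append(steps)
        if pattern not in self.skills and len(self.attempts[pattern]) >= CREATION_THRESHOLD:
            self.skills[pattern] = self.distill(pattern)

    def distill(self, pattern):
        """Collapse recorded episodes into a reusable procedure (naive: keep the latest run)."""
        steps = self.attempts[pattern][-1]
        return "\n".join(f"{i + 1}. {s}" for i, s in enumerate(steps))


store = SkillStore()
for _ in range(3):
    store.record("pr-review", ["check CI", "summarize diff", "post comments"])
print("pr-review" in store.skills)  # skill exists once the third attempt lands
```

A real implementation would also need to decide when two tasks count as "a similar task pattern"; the sketch sidesteps that by using an explicit pattern key.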

Real-World Use Cases

Automated Dev Workflow Skills

After you walk Hermes through your PR review process three times, it creates a skill: check CI status, summarize diff, flag style violations, post review comments. Next time, it runs the full workflow from a single command.

Domain-Specific Research Patterns

Teach Hermes your research methodology once — which sources to check, how to structure findings, what to cite. It creates a research skill and applies it consistently every time you ask for a deep dive.

Error Recovery Memory

When Hermes hits a dead end and finds a workaround, it patches its own skill to avoid that dead end next time. Failed approaches are remembered and pruned; successful paths are reinforced.

Community Skill Sharing

Skills you create locally can be published to the Skills Hub (agentskills.io, ClawHub, GitHub). Install community skills from OpenAI, Anthropic, or independent contributors with a single hermes skills install command.

Under the Hood

The learning loop follows an observe → distill → reuse → refine cycle. During the observe phase, Hermes tracks multi-step tasks in its episodic memory layer — every tool call, every decision branch, every correction you make. After 3+ successful completions of a similar task pattern, it enters the distill phase: generating a SKILL.md document capturing the procedure, pitfalls, and verification steps in a structured format compatible with the agentskills.io open standard.
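A distilled skill file might look like the following. Only the name/description frontmatter and markdown body reflect the source's description of the format; the section headings and procedure content are invented for illustration:

```markdown
---
name: pr-review
description: Run the team's PR review workflow end to end
---

# PR Review

## Procedure
1. Check CI status before reading any code.
2. Summarize the diff file by file.
3. Flag style violations against the team guide.
4. Post review comments.

## Pitfalls
- Do not approve while CI is still pending.

## Verification
- Every flagged violation links to a specific line in the diff.
```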

Skills live at ~/.hermes/skills/ and are immediately available as slash commands. The agent uses progressive disclosure when loading skills — it sees a lightweight description (name, description, category, ~3k tokens total for all skills) and only loads the full SKILL.md when the task actually matches. This keeps token overhead near zero for irrelevant skills while preserving full richness for active ones. Hermes can also patch its own skills mid-session using the skill_manage tool, updating procedures when it discovers a better approach without requiring a full rewrite.
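Progressive disclosure can be sketched as lazy loading keyed on a cheap metadata match. The `SkillIndex` class and its keyword heuristic below are illustrative assumptions, not the actual loader:

```python
from pathlib import Path


class SkillIndex:
    """Sketch: keep only lightweight per-skill metadata in context, and pay
    for a full SKILL.md body only when a task actually matches."""

    def __init__(self, root):
        self.root = Path(root)
        # Lightweight layer, always loaded: skill name -> first line of SKILL.md.
        self.metadata = {}
        for path in self.root.glob("*/SKILL.md"):
            self.metadata[path.parent.name] = path.read_text().splitlines()[0]

    def match(self, task):
        """Cheap keyword match against descriptions only (assumed heuristic)."""
        words = task.lower().split()
        return [name for name, desc in self.metadata.items()
                if any(word in desc.lower() for word in words)]

    def load(self, name):
        """Full richness, paid only for matched skills."""
        return (self.root / name / "SKILL.md").read_text()
```

The point of the split is the cost profile: unmatched skills contribute only their one-line descriptions to the context, while matched ones are read in full.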

For deeper self-improvement, Hermes integrates with the Atropos RL pipeline — a reinforcement learning framework that trains on interaction trajectories. You can rate responses, mark corrections, or let auto-evaluation run. The pipeline supports RLHF and DPO, and exports trajectories in ShareGPT format for fine-tuning custom models. This means the learning loop isn't just skill-level procedural memory — it can improve the underlying model's behavior on your specific use cases over time.
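ShareGPT-format trajectories are plain JSON conversations. A minimal export of a rated interaction might look like this; the ShareGPT `from`/`value` keys are standard, but the `(role, text)` trajectory shape and the sample content are assumptions:

```python
import json


def to_sharegpt(trajectory):
    """Convert a list of (role, text) turns into ShareGPT's conversations schema."""
    role_map = {"user": "human", "assistant": "gpt", "system": "system"}
    return {"conversations": [
        {"from": role_map[role], "value": text} for role, text in trajectory
    ]}


# Hypothetical interaction log, not real Hermes output.
trajectory = [
    ("user", "Review this pull request"),
    ("assistant", "CI is green; two style issues flagged."),
]
print(json.dumps(to_sharegpt(trajectory), indent=2))
```

Exports in this shape can be fed straight into common fine-tuning toolchains that accept ShareGPT-style conversation datasets.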

Related Features