Nous Research | Hermes Agent

Inside Hermes' Three-Layer Memory

Tags: hermes agent memory system · memory · technical · architecture

How Hermes Agent's three-layer memory system actually works — MEMORY.md, skills, and session DB explained without the fluff.

Want to try Hermes Agent yourself?

Try Hermes Free → Deploy in 60 seconds

Hermes Agent's memory system is its most technically sophisticated feature. This deep dive covers exactly how each layer works: the file structures, the retrieval mechanisms, and what happens under the hood.


Architecture Overview

Three memory subsystems operate at different timescales:

  • Layer 1: Persistent Memory — MEMORY.md and USER.md, frozen into the system prompt and injected into every session
  • Layer 2: Episodic Memory (Skills) — markdown skill documents created from experience
  • Layer 3: Session Search — SQLite FTS5 full-text index of all conversations

Plus optional extensions: Honcho (cross-session user modeling) and PLUR (community engram plugin).


Layer 1: MEMORY.md + USER.md

File Locations

~/.hermes/memories/MEMORY.md   # Agent's personal notes (~800 tokens)
~/.hermes/memories/USER.md     # User profile (~500 tokens)

What Goes in MEMORY.md

MEMORY.md holds facts about the environment, conventions, and project-specific knowledge:

User's project is a Rust web service at ~/code/myapi using Axum + SQLx
§
This machine runs Ubuntu 22.04, has Docker and Podman installed
§
Don't use sudo for Docker — user is in the docker group
§
Project uses tabs, 120-char lines, Google docstrings
§
Migrated database from MySQL to PostgreSQL on 2026-01-15
§
Preferred image generation: fal.ai Nano Banana + Python overlay for logo watermark

The § separator is Hermes convention for separating memory entries.
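
The separator makes the file trivially machine-parseable. A minimal sketch (not the actual Hermes parser) of splitting a memory file into entries:

```python
def parse_entries(text: str) -> list[str]:
    """Split MEMORY.md-style content on the § separator, trimming blanks."""
    return [e.strip() for e in text.split("§") if e.strip()]

raw = """User's project is a Rust web service at ~/code/myapi
§
This machine runs Ubuntu 22.04, has Docker and Podman installed
§
Don't use sudo for Docker — user is in the docker group
"""
entries = parse_entries(raw)  # three entries, separator lines dropped
```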

What Goes in USER.md

USER.md holds communication style and working preferences:

Prefers concise responses — no unnecessary preamble
Works in TypeScript/React for frontend, Rust/Python for backend
Timezone: UTC+2
Ping on Discord for urgent tasks, Telegram for daily updates
Prefers Notion for project tracking

Capacity and Limits

File        Char limit    Token equiv   Typical entries
MEMORY.md   2,200 chars   ~800 tokens   8–15 entries
USER.md     1,375 chars   ~500 tokens   5–10 entries

Configure in ~/.hermes/config.yaml:

memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200
  user_char_limit: 1375

The Frozen Snapshot Pattern

Memory is injected once at session start as part of the system prompt — then frozen. It does not change mid-session.

This preserves the LLM prefix cache (a significant cost saving with Claude/Anthropic) and prevents mid-session rewrites from destabilizing the agent's working context.

Changes made during a session (via the memory tool) are written to disk immediately and become visible in the next session.

Memory Tool Actions

# Add a new entry
memory(action="add", content="User prefers DeepSeek V4 for quick tasks")

# Replace an existing entry
memory(action="replace", 
       old_text="Don't use sudo for Docker",
       new_text="Don't use sudo for Docker — user is in docker group")

# Remove an entry
memory(action="remove", content="Migrated database from MySQL")

There is no read action — memory is auto-injected. Tool responses show live state after any change.

Security

Memory writes are scanned before persistence:

  • Prompt injection patterns blocked
  • Credential exfiltration patterns blocked
  • SSH backdoor patterns blocked
  • Invisible Unicode characters stripped
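
The last check is simple to illustrate. The character set below is an assumption (Hermes does not document its exact list); it covers zero-width and bidi-control characters commonly used to smuggle hidden instructions:

```python
import re

# Assumed set: zero-width joiners/spaces, word joiner, BOM, bidi controls.
INVISIBLE = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff\u202a-\u202e]")

def strip_invisible(entry: str) -> str:
    """Remove invisible Unicode before an entry is persisted."""
    return INVISIBLE.sub("", entry)
```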

Layer 2: Episodic Memory (Skills System)

What Skills Are

Skills are on-demand knowledge documents in ~/.hermes/skills/ that the agent creates from experience and loads when needed.

This is procedural memory: not just what happened, but how to do it well, what to avoid, and how to verify success.

When Hermes Creates a Skill

The agent creates skills proactively when:

  • It completed a complex task (5+ tool calls)
  • It hit errors and found a working path
  • The user corrected its approach
  • It discovered a non-trivial workflow worth repeating

Every 15 tool calls, Hermes runs a self-evaluation checkpoint. If the work involved a reusable procedure, it creates or patches a skill.

SKILL.md Structure

---
name: image-generation-branded
description: Generate brand-consistent images with logo overlay
version: 1.0.0
platforms: [macos, linux]
metadata:
  hermes:
    tags: [image-generation, brand, fal-ai]
    category: content
    requires_toolsets: [terminal, code]
---
# Branded Image Generation

## When to Use
When generating images that need consistent brand identity with logo overlay.

## Procedure
1. Generate base image via fal.ai Nano Banana API
2. Save to /tmp/generated_TIMESTAMP.png
3. Apply Python overlay script for logo watermark
4. Verify: check output dimensions match spec (1200x628 for social)

## Pitfalls
- Nano Banana won't apply exact logo — always use Python overlay
- Logo opacity should be 70% to avoid overwhelming the image
- Save originals before overlay in case of rework

## Verification
Check output file exists and is >100KB

Progressive Disclosure

Skills use a three-level disclosure pattern:

Level 0: skills_list() → [{name, description, category}] (~3k tokens)
Level 1: skill_view(name) → Full SKILL.md content
Level 2: skill_view(name, path) → Specific reference file

The agent only loads full skill content when it actually needs it.
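
Level 0 can be approximated by reading just the frontmatter and never the body. A naive flat-key sketch (real frontmatter is YAML, and this is not the Hermes implementation):

```python
def skill_summary(skill_md: str) -> dict:
    """Level-0 view: pull only top-level frontmatter keys, skip the body."""
    lines = skill_md.splitlines()
    meta = {}
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break  # end of frontmatter; the body is never touched
            if ":" in line and not line.startswith((" ", "\t")):
                key, _, val = line.partition(":")
                meta[key.strip()] = val.strip()
    return {"name": meta.get("name"), "description": meta.get("description")}
```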

Community Skills

Installing skills from external sources:

hermes skills browse                      # Browse the skills hub
hermes skills search kubernetes          # Search skills by keyword
hermes skills install openai/skills/k8s  # Install from a hub source
hermes skills check                      # Check installed skills
hermes skills publish --to github        # Publish a skill of your own

Hub sources: Official (Nous), skills.sh, GitHub repos, ClawHub, Claude marketplace.

Layer 3: Session Search (SQLite FTS5)

How It Works

Every conversation is stored in ~/.hermes/state.db — SQLite with FTS5 indexing. There is no capacity limit.

When Hermes needs past context:

  1. Runs FTS5 search against session database
  2. Retrieves relevant conversation fragments
  3. Uses Gemini Flash to summarize relevant content
  4. Injects summary into current context

Browsing Sessions

hermes sessions list    # Browse all past sessions

Sessions are searchable by content.

Memory vs Session Search

               Persistent Memory             Session Search
Speed          Instant (always in prompt)    Requires search + LLM
Capacity       ~1,300 tokens                 Unlimited
Content type   Curated key facts             Complete history
Access         Every session, automatically  On-demand
Token cost     Fixed per session             On-demand

Honcho: Cross-Session User Modeling

Honcho is an optional AI-powered layer that builds a persistent model of who you are:

hermes honcho setup

In hybrid mode, MEMORY.md and USER.md remain intact — Honcho adds deeper modeling.

Community Extension: PLUR Engrams

PLUR (by plur9 on Discord) adds brain-inspired engram memory:

pip install plur-hermes

Features: corrections become permanent knowledge; episodic memory is shared across multiple agents; teams can share engrams on projects.

Memory Compression and Aging

When MEMORY.md approaches capacity, Hermes automatically:

  1. Identifies redundant entries
  2. Removes superseded entries
  3. Consolidates related entries into denser entries

The compression is agent-driven — the LLM decides what is important enough to keep.
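
Only the trigger and the cheapest pass are mechanical; the consolidation itself is LLM work. A sketch, where the 90% threshold is an assumption rather than a documented Hermes constant:

```python
def needs_compression(memory_text: str, char_limit: int = 2200,
                      threshold: float = 0.9) -> bool:
    """Trigger check only -- actual consolidation is decided by the LLM."""
    return len(memory_text) >= char_limit * threshold

def drop_exact_duplicates(entries: list[str]) -> list[str]:
    """Cheapest pass: remove verbatim repeats before asking the LLM to merge."""
    seen, out = set(), []
    for e in entries:
        if e not in seen:
            seen.add(e)
            out.append(e)
    return out
```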

FAQ

Can I manually edit memory files? Yes. They are plain text. Changes take effect on next session start.

Does memory sync across installations? Not by default. Use profiles export/import or sync manually.

How many sessions can the database hold? No hard limit. Users running for months report no performance issues with thousands of sessions.

Skill Discovery and Slash Commands

Every installed skill automatically becomes a slash command in the Hermes CLI and gateway:

/gif-search funny cats
/axolotl help me fine-tune Llama 3
/github-pr-workflow create a PR for the auth refactor
/plan design a rollout for migrating auth provider
/competitor-analysis-workflow

The agent can also discover and load skills mid-task without a slash command — it reads the skills list and pulls in relevant ones based on the current task context.

Platform-Specific Skills and Conditional Activation

Skills support platform restriction and conditional loading:

platforms: [macos, linux]  # Auto-hidden on Windows

Conditional activation based on available toolsets:

fallback_for_toolsets: [web]   # Only show when web toolset is unavailable
requires_toolsets: [terminal]  # Only show when terminal is available

Example: the DuckDuckGo search skill only activates when the FIRECRAWL_API_KEY is missing — automatic graceful fallback without manual configuration.
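
The activation check reduces to an environment probe. A pattern sketch (the function name is an illustration, not Hermes' API):

```python
import os

def fallback_active(env_key: str, env=None) -> bool:
    """A fallback skill activates only when the primary integration's key is missing."""
    env = os.environ if env is None else env
    return not env.get(env_key)

# DuckDuckGo-style fallback: active only while FIRECRAWL_API_KEY is unset
ddg_enabled = fallback_active("FIRECRAWL_API_KEY", env={})
```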

External Skill Directories

Teams can share skills via external directories without merging them into ~/.hermes/skills/:

skills:
  external_dirs:
    - ~/.agents/skills
    - /home/shared/team-skills

External dirs are scanned read-only, and local skills take precedence on name collisions. Integration is otherwise full: external skills get slash commands and system prompt entries just like native skills.

Token Cost Implications of Skills

Community analysis (u/Witty_Ticket_4101, r/hermesagent) found the skills catalog is one of the largest token consumers in the system prompt — approximately 2.2K tokens for a typical skills list.

This is why progressive disclosure exists: only skill names and descriptions (~3K tokens total for 40+ skills) load into the system prompt. Full skill content loads only on demand.

For power users with 40+ custom skills: consider organizing into categories and periodically archiving low-use skills. The token overhead at scale becomes meaningful.

Secure Skill Setup

Skills can declare required environment variables that Hermes will prompt for securely on first use:

required_environment_variables:
  - name: TENOR_API_KEY
    prompt: Tenor API key for GIF search
    help: Get a key from developers.google.com/tenor

Hermes prompts for this in CLI only — messaging surfaces (Telegram, Discord) never ask for secrets in chat. The key is stored in your local config, not in the skill file.

The Skill Audit and Security Scan

When installing community skills, Hermes runs a security scan:

hermes skills install openai/skills/k8s   # Scans before install
hermes skills audit                        # Re-scan installed skills

Quarantined skills go to .hub/quarantine/ with a reason, and an audit log is maintained at .hub/audit.log.

Cross-Agent Skill Sharing with PLUR

The community PLUR plugin extends skills to support shared episodic memory across multiple Hermes instances:

pip install plur-hermes

When multiple agents share engrams, corrections made by one agent propagate as knowledge to others on the same project. From the Discord showcase: "Shared episodic memory across 6 agents is genuinely powerful and would be hard to replicate any other way."

This makes Hermes viable for small teams where multiple people run separate instances but want shared institutional knowledge.

Frequently Asked Questions

What is the difference between MEMORY.md and USER.md?

MEMORY.md holds facts about your environment and projects — your stack, conventions, server setup, what failed before. USER.md captures your communication style and preferences — how you like to be addressed, your response length preference, which tools you prefer. They're separate by design.

How does Hermes decide what to remember?

Hermes evaluates every session for facts worth persisting — environment configurations, user corrections, successful workflows, and conventions that produced good results. Entries are marked with the section separator and consolidated automatically when the file approaches its 2,200-character limit.

Can I search through my conversation history with Hermes?

Yes. All conversations are stored in ~/.hermes/state.db with SQLite FTS5 full-text search. When Hermes needs context from a past session, it searches the database and uses Gemini Flash to summarize relevant conversations before injecting them into the current context.

What is progressive disclosure in the skills system?

Skills use three loading levels: Level 0 loads only skill names and descriptions (~3K tokens for 40+ skills), Level 1 loads full SKILL.md content on demand, and Level 2 loads specific reference files. The agent only pulls what it needs, when it needs it.

How does the PLUR engram system differ from Hermes' built-in memory?

Built-in memory saves discrete facts and procedures. PLUR captures the weight and context of experiences — corrections become permanent engrams that propagate across multiple agents on the same project. Teams sharing engrams can teach one agent and have others learn automatically.

Ready to Run Your Own AI Agent?

Self-host Hermes in 60 seconds. No credit card, no cloud lock-in.

Deploy Hermes Free →
