Hermes Agent's memory system is its most technically sophisticated feature. This deep dive covers exactly how each layer works, the file structures, the retrieval mechanisms, and what is happening under the hood.
Architecture Overview
Three memory subsystems at different timescales:
Persistent Memory Layer 1: Frozen System Prompt Memory — always injected into every session
Layer 2: Episodic Memory (Skills) — markdown skill documents created from experience
Layer 3: Session Search — SQLite FTS5 full-text index of all conversations
Plus optional extensions: Honcho (cross-session user modeling) and PLUR (community engram plugin).
Layer 1: MEMORY.md + USER.md
File Locations
~/.hermes/memories/MEMORY.md # Agent's personal notes (~800 tokens)
~/.hermes/memories/USER.md # User profile (~500 tokens)
What Goes in MEMORY.md
MEMORY.md holds facts about the environment, conventions, and project-specific knowledge:
User's project is a Rust web service at ~/code/myapi using Axum + SQLx
§
This machine runs Ubuntu 22.04, has Docker and Podman installed
§
Don't use sudo for Docker — user is in the docker group
§
Project uses tabs, 120-char lines, Google docstrings
§
Migrated database from MySQL to PostgreSQL on 2026-01-15
§
Preferred image generation: fal.ai Nano Banana + Python overlay for logo watermark
The § separator is Hermes convention for separating memory entries.
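The §-separated layout above can be parsed with a few lines of code. This is a minimal sketch (the function name is hypothetical, not part of Hermes) showing how a MEMORY.md body splits into discrete entries:

```python
def parse_memory_entries(text: str) -> list[str]:
    """Split a MEMORY.md body into entries on the § separator line."""
    # Entries are separated by a line containing only "§"
    entries = [e.strip() for e in text.split("\n§\n")]
    return [e for e in entries if e]

memory = """User's project is a Rust web service at ~/code/myapi using Axum + SQLx
§
This machine runs Ubuntu 22.04, has Docker and Podman installed
§
Don't use sudo for Docker — user is in the docker group"""

print(len(parse_memory_entries(memory)))  # → 3
```

Keeping the separator on its own line makes entries trivially machine-splittable while staying readable when the file is opened by hand.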
What Goes in USER.md
USER.md holds communication style and working preferences:
Prefers concise responses — no unnecessary preamble
Works in TypeScript/React for frontend, Rust/Python for backend
Timezone: UTC+2
Ping on Discord for urgent tasks, Telegram for daily updates
Prefers Notion for project tracking
Capacity and Limits
| File | Char limit | Token equiv | Typical entries |
|---|---|---|---|
| MEMORY.md | 2,200 chars | ~800 tokens | 8–15 entries |
| USER.md | 1,375 chars | ~500 tokens | 5–10 entries |
Configure in ~/.hermes/config.yaml:
memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200
  user_char_limit: 1375
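A sketch of how the char limits from the config might be enforced before a write is accepted. The function name and separator accounting are assumptions; the limit values come from the table above (2,200 chars ≈ 800 tokens at roughly 2.75 chars/token):

```python
MEMORY_CHAR_LIMIT = 2200   # ~800 tokens
USER_CHAR_LIMIT = 1375     # ~500 tokens

def check_capacity(entries: list[str], limit: int) -> tuple[int, bool]:
    """Return (chars used, whether the entries still fit the limit)."""
    # Each entry costs its own length plus the 3-char "\n§\n" separator
    used = sum(len(e) for e in entries) + 3 * max(len(entries) - 1, 0)
    return used, used <= limit

entries = ["Project uses tabs, 120-char lines, Google docstrings",
           "Timezone: UTC+2"]
used, ok = check_capacity(entries, MEMORY_CHAR_LIMIT)
```

When `ok` comes back false, the agent has to compress or drop entries before the write goes through (see the compression section below).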
The Frozen Snapshot Pattern
Memory is injected once at session start as part of the system prompt — then frozen. It does not change mid-session.
This preserves the LLM prefix cache (a significant cost saving with Claude/Anthropic models), and it prevents mid-session rewrites from destabilizing the agent's working context.
Changes made during a session (via the memory tool) are written to disk immediately and become visible in the next session.
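The frozen snapshot pattern can be sketched in a few lines. This is a hypothetical illustration, not Hermes source: the file is read exactly once at construction, and writes go straight to disk without touching the in-context snapshot:

```python
import tempfile
from pathlib import Path

class FrozenMemory:
    """Snapshot taken once at session start (illustrative sketch)."""
    def __init__(self, path: Path):
        self.path = path
        # Read exactly once; keeps the LLM prefix cache stable all session
        self.snapshot = path.read_text() if path.exists() else ""

    def write(self, new_content: str) -> None:
        # Disk is updated immediately, but the frozen snapshot is untouched;
        # the next session picks up the new content
        self.path.write_text(new_content)

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "MEMORY.md"
    p.write_text("old entry")
    mem = FrozenMemory(p)
    mem.write("new entry")      # persisted now
    frozen = mem.snapshot       # still "old entry" for this session
    on_disk = p.read_text()     # "new entry" for the next session
```

The asymmetry is deliberate: reads are cached for the prompt, writes are durable for the future.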
Memory Tool Actions
# Add a new entry
memory(action="add", content="User prefers [DeepSeek](/vs/gpt4) V4 for quick tasks")
# Replace an existing entry
memory(action="replace",
old_text="Don't use sudo for Docker",
new_text="Don't use sudo for Docker — user is in docker group")
# Remove an entry
memory(action="remove", content="Migrated database from MySQL")
There is no read action — memory is auto-injected. Tool responses show live state after any change.
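The semantics of the three actions can be modeled over a plain entry list. A minimal sketch, assuming (as the `replace` example above suggests) that matching is done by substring; the function itself is hypothetical:

```python
def memory_tool(entries: list[str], action: str, *, content: str = "",
                old_text: str = "", new_text: str = "") -> list[str]:
    """Illustrative add/replace/remove semantics over memory entries."""
    if action == "add":
        return entries + [content]
    if action == "replace":
        # Replace any entry containing old_text (substring match assumed)
        return [new_text if old_text in e else e for e in entries]
    if action == "remove":
        return [e for e in entries if content not in e]
    raise ValueError(f"unknown action: {action}")

state = ["Don't use sudo for Docker"]
state = memory_tool(state, "add", content="Timezone: UTC+2")
state = memory_tool(state, "replace",
                    old_text="Don't use sudo for Docker",
                    new_text="Don't use sudo for Docker — user is in docker group")
state = memory_tool(state, "remove", content="Timezone")
```

After this sequence, one entry remains: the expanded Docker note.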
Security
Memory writes are scanned before persistence:
- Prompt injection patterns blocked
- Credential exfiltration patterns blocked
- SSH backdoor patterns blocked
- Invisible Unicode characters stripped
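The scan above could be implemented along these lines. The patterns here are illustrative placeholders — Hermes does not publish its actual rule set — but the invisible-Unicode strip uses the standard `Cf` (format) category, which covers zero-width characters and BOMs:

```python
import re
import unicodedata

# Hypothetical patterns; the real scanner's rules are not published
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"curl .*\|\s*(ba)?sh", re.I),        # pipe-to-shell exfil
    re.compile(r"authorized_keys|\.ssh/", re.I),     # SSH backdoor surface
]

def sanitize_memory_write(text: str) -> str:
    # Strip invisible Unicode (zero-width chars, BOM, other format controls)
    cleaned = "".join(c for c in text if unicodedata.category(c) != "Cf")
    for pat in INJECTION_PATTERNS:
        if pat.search(cleaned):
            raise ValueError(f"blocked pattern: {pat.pattern}")
    return cleaned

safe = sanitize_memory_write("Prefers\u200b concise responses")
```

Scanning at write time matters because memory is injected into every future system prompt — a poisoned entry would persist indefinitely.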
Layer 2: Episodic Memory (Skills System)
What Skills Are
Skills are on-demand knowledge documents in ~/.hermes/skills/ that the agent creates from experience and loads when needed.
This is procedural memory: not just what happened, but how to do it well, what to avoid, and how to verify success.
When Hermes Creates a Skill
The agent creates skills proactively when:
- It completed a complex task (5+ tool calls)
- It hit errors and found a working path
- The user corrected its approach
- It discovered a non-trivial workflow worth repeating
Every 15 tool calls, Hermes runs a self-evaluation checkpoint. If the work involved a reusable procedure, it creates or patches a skill.
SKILL.md Structure
---
name: image-generation-branded
description: Generate brand-consistent images with logo overlay
version: 1.0.0
platforms: [macos, linux]
metadata:
  hermes:
    tags: [image-generation, brand, fal-ai]
    category: content
requires_toolsets: [terminal, code]
---
# Branded Image Generation
## When to Use
When generating images that need consistent brand identity with logo overlay.
## Procedure
1. Generate base image via fal.ai Nano Banana API
2. Save to /tmp/generated_TIMESTAMP.png
3. Apply Python overlay script for logo watermark
4. Verify: check output dimensions match spec (1200x628 for social)
## Pitfalls
- Nano Banana won't apply exact logo — always use Python overlay
- Logo opacity should be 70% to avoid overwhelming the image
- Save originals before overlay in case of rework
## Verification
Check output file exists and is >100KB
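A SKILL.md file is frontmatter plus a markdown body, split on `---` delimiters. A minimal sketch of the split (a real loader would parse the frontmatter as YAML; this helper is hypothetical):

```python
def split_frontmatter(skill_md: str) -> tuple[str, str]:
    """Split a SKILL.md into (frontmatter, body) on --- delimiters."""
    lines = skill_md.splitlines()
    if lines and lines[0] == "---":
        end = lines.index("---", 1)            # closing delimiter
        return "\n".join(lines[1:end]), "\n".join(lines[end + 1:])
    return "", skill_md                        # no frontmatter present

doc = """---
name: image-generation-branded
description: Generate brand-consistent images with logo overlay
---
# Branded Image Generation
"""
meta, body = split_frontmatter(doc)
```

The frontmatter drives indexing (names, descriptions, platform gating); only the body carries the procedure itself.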
Progressive Disclosure
Skills use a three-level disclosure pattern:
Level 0: skills_list() → [{name, description, category}] (~3k tokens)
Level 1: skill_view(name) → Full SKILL.md content
Level 2: skill_view(name, path) → Specific reference file
The agent only loads full skill content when it actually needs it.
Community Skills
Installing skills from external sources:
hermes skills browse
hermes skills search kubernetes
hermes skills install openai/skills/k8s
hermes skills check
hermes skills publish --to github
Hub sources: Official (Nous), skills.sh, GitHub repos, ClawHub, Claude marketplace.
Layer 3: Session Search (SQLite FTS5)
How It Works
Every conversation is stored in ~/.hermes/state.db — SQLite with FTS5 indexing. There is no capacity limit.
When Hermes needs past context:
- Runs FTS5 search against session database
- Retrieves relevant conversation fragments
- Uses Gemini Flash to summarize relevant content
- Injects summary into current context
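The retrieval step above maps directly onto SQLite's FTS5 virtual tables. A minimal sketch with assumed table and column names (the summarization step via Gemini Flash is out of scope here):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# FTS5 virtual table: every column is full-text indexed
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(session_id, content)")
db.executemany("INSERT INTO sessions VALUES (?, ?)", [
    ("s1", "Migrated database from MySQL to PostgreSQL"),
    ("s2", "Set up Axum web service with SQLx"),
])
# MATCH runs a full-text query; rank orders by BM25 relevance
rows = db.execute(
    "SELECT session_id, content FROM sessions WHERE sessions MATCH ? "
    "ORDER BY rank", ("postgresql",)
).fetchall()
```

Because FTS5 case-folds during tokenization, a lowercase query still hits "PostgreSQL", and the index grows incrementally with each stored conversation — hence no practical capacity limit.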
Browsing Sessions
hermes sessions list # Browse all past sessions
Sessions are searchable by content.
Memory vs Session Search
| | Persistent Memory | Session Search |
|---|---|---|
| Speed | Instant (always in prompt) | Requires search + LLM |
| Capacity | ~1,300 tokens | Unlimited |
| Content type | Curated key facts | Complete history |
| Access | Every session automatically | On-demand |
| Token cost | Fixed, every session | Variable, per query |
Honcho: Cross-Session User Modeling
Honcho is an optional AI-powered layer that builds a persistent model of who you are:
hermes honcho setup
In hybrid mode, MEMORY.md and USER.md remain intact — Honcho adds deeper modeling.
Community Extension: PLUR Engrams
PLUR (by plur9 on Discord) adds brain-inspired engram memory:
pip install plur-hermes
Features: corrections become permanent knowledge, episodic memory is shared across multiple agents, and teams can share engrams on projects.
Memory Compression and Aging
When MEMORY.md approaches capacity, Hermes automatically:
- Identifies redundant entries
- Removes superseded entries
- Consolidates related entries into denser entries
The compression is agent-driven — the LLM decides what is important enough to keep.
FAQ
Can I manually edit memory files? Yes. They are plain text. Changes take effect on next session start.
Does memory sync across installations? Not by default. Use profiles export/import or sync manually.
How many sessions can the database hold? No hard limit. Users running for months report no performance issues with thousands of sessions.
Skill Discovery and Slash Commands
Every installed skill automatically becomes a slash command in the Hermes CLI and gateway:
/gif-search funny cats
/axolotl help me fine-tune Llama 3
/github-pr-workflow create a PR for the auth refactor
/plan design a rollout for migrating auth provider
/competitor-analysis-workflow
The agent can also discover and load skills mid-task without a slash command — it reads the skills list and pulls in relevant ones based on the current task context.
Platform-Specific Skills and Conditional Activation
Skills support platform restriction and conditional loading:
platforms: [macos, linux] # Auto-hidden on Windows
Conditional activation based on available toolsets:
fallback_for_toolsets: [web] # Only show when web toolset is unavailable
requires_toolsets: [terminal] # Only show when terminal is available
Example: the DuckDuckGo search skill only activates when the FIRECRAWL_API_KEY is missing — automatic graceful fallback without manual configuration.
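The gating logic for the frontmatter fields above can be sketched as a single predicate. The function is hypothetical, but the field semantics follow the examples: `requires_toolsets` demands availability, `fallback_for_toolsets` demands absence:

```python
import sys

def skill_is_active(meta: dict, available_toolsets: set[str],
                    platform: str = sys.platform) -> bool:
    """Illustrative platform + toolset gating for one skill's frontmatter."""
    platforms = meta.get("platforms")
    if platforms:
        plat = {"darwin": "macos", "win32": "windows"}.get(platform, platform)
        if plat not in platforms:
            return False                       # auto-hidden on this OS
    # requires_toolsets: every listed toolset must be available
    if any(t not in available_toolsets for t in meta.get("requires_toolsets", [])):
        return False
    # fallback_for_toolsets: only active when the listed toolset is MISSING
    if any(t in available_toolsets for t in meta.get("fallback_for_toolsets", [])):
        return False
    return True

ddg = {"fallback_for_toolsets": ["web"]}       # DuckDuckGo-style fallback
```

With the `web` toolset present the fallback skill stays hidden; when it is missing, the skill activates automatically.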
External Skill Directories
Teams can share skills via external directories without merging them into ~/.hermes/skills/:
skills:
  external_dirs:
    - ~/.agents/skills
    - /home/shared/team-skills
External dirs are scanned read-only. Local skills take precedence on name conflicts. Integration is full — external skills get slash commands and system prompt entries just like native skills.
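The scan-with-precedence behavior can be sketched with `pathlib`. Directory layout and function name are assumptions; the key trick is scanning external dirs first so local entries overwrite duplicates:

```python
import tempfile
from pathlib import Path

def discover_skills(local_dir: Path, external_dirs: list[Path]) -> dict[str, Path]:
    """Map skill name -> SKILL.md path; local wins on name clashes."""
    found: dict[str, Path] = {}
    # External dirs first, so local entries overwrite duplicates below
    for d in external_dirs:
        for skill in (sorted(d.glob("*/SKILL.md")) if d.exists() else []):
            found[skill.parent.name] = skill
    for skill in (sorted(local_dir.glob("*/SKILL.md")) if local_dir.exists() else []):
        found[skill.parent.name] = skill       # local takes precedence
    return found

with tempfile.TemporaryDirectory() as t:
    root = Path(t)
    local, ext = root / "local", root / "ext"
    for d, names in [(local, ["deploy"]), (ext, ["deploy", "gif-search"])]:
        for n in names:
            (d / n).mkdir(parents=True)
            (d / n / "SKILL.md").write_text(f"# {n}")
    skills = discover_skills(local, [ext])
```

Here `deploy` exists in both trees, so the local copy wins, while `gif-search` comes from the external dir.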
Token Cost Implications of Skills
Community analysis (u/Witty_Ticket_4101, r/hermesagent) found the skills catalog is one of the largest token consumers in the system prompt — approximately 2.2K tokens for a typical skills list.
This is why progressive disclosure exists: only skill names and descriptions (~3K tokens total for 40+ skills) load into system prompt. Full skill content only loads on demand.
For power users with 40+ custom skills: consider organizing into categories and periodically archiving low-use skills. The token overhead at scale becomes meaningful.
Secure Skill Setup
Skills can declare required environment variables that Hermes will prompt for securely on first use:
required_environment_variables:
  - name: TENOR_API_KEY
    prompt: Tenor API key for GIF search
    help: Get a key from developers.google.com/tenor
Hermes prompts for this in CLI only — messaging surfaces (Telegram, Discord) never ask for secrets in chat. The key is stored in your local config, not in the skill file.
The Skill Audit and Security Scan
When installing community skills, Hermes runs a security scan:
hermes skills install openai/skills/k8s # Scans before install
hermes skills audit # Re-scan installed skills
Quarantined skills go to .hub/quarantine/ with a reason. Audit log maintained at .hub/audit.log.
Cross-Agent Skill Sharing with PLUR
The community PLUR plugin extends skills to support shared episodic memory across multiple Hermes instances:
pip install plur-hermes
When multiple agents share engrams, corrections made by one agent propagate as knowledge to others on the same project. From the Discord showcase: "Shared episodic memory across 6 agents is genuinely powerful and would be hard to replicate any other way."
This makes Hermes viable for small teams where multiple people run separate instances but want shared institutional knowledge.