Nous Research Hermes Agent

Subagents — Parallel AI Workstreams

Key Points

  • Isolated context
  • Parallel execution
  • Shared skill library
  • Resource management

How It Works

  1. Spawn: '/spawn research-bot'
  2. Each subagent is independent
  3. Share skills via common library
  4. Close when done
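The lifecycle above can be sketched with Python's standard library. Hermes's actual spawn API is not documented on this page, so the `spawn` helper here is a hypothetical stand-in that models the pattern: isolated per-subagent state, a shared pool, and collection of results on close.

```python
from concurrent.futures import ThreadPoolExecutor

def spawn(name, task):
    # Hypothetical stand-in for '/spawn <name>': each subagent runs in
    # its own worker with independent local state.
    state = {"name": name}
    return f"{state['name']} finished: {task}"

# The shared pool loosely models the common skill library; workers are
# closed automatically when the block exits ("close when done").
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(spawn, "research-bot", "survey topic")]
    results = [f.result() for f in futures]
```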

Real-World Use Cases

Parallel Research and Writing

Spawn three subagents: one researches the topic, one drafts the outline, and one reviews existing content for gaps. The orchestrator collects all three outputs and synthesizes the final document, roughly three times faster than doing the same work sequentially.
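A minimal sketch of this fan-out/fan-in shape, using `concurrent.futures` rather than the Hermes API; the three worker functions are placeholders for isolated subagent sessions:

```python
from concurrent.futures import ThreadPoolExecutor

def research(topic):      return f"notes on {topic}"
def draft_outline(topic): return f"outline for {topic}"
def review_gaps(topic):   return f"gaps in existing {topic} content"

def orchestrate(topic):
    # Run all three workstreams concurrently, one worker per subagent.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(fn, topic)
                   for fn in (research, draft_outline, review_gaps)]
        outputs = [f.result() for f in futures]
    # The orchestrator synthesizes the final document from all outputs.
    return "\n".join(outputs)
```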

Concurrent API Integration Testing

Test multiple API endpoints simultaneously. Each subagent handles a different endpoint with isolated state, preventing test interference. Results aggregate back to a single summary report.
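The key property here is state isolation per worker. A sketch under the assumption that each subagent reduces to a function holding its own state dict; real subagents would issue HTTP requests where the placeholder check sits:

```python
from concurrent.futures import ThreadPoolExecutor

def check_endpoint(endpoint):
    # Each worker owns its own state, so concurrent runs can't interfere.
    state = {"endpoint": endpoint, "passed": 0, "failed": 0}
    # Placeholder check: a real subagent would call the endpoint here.
    state["passed"] += 1
    return state

def run_suite(endpoints):
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(check_endpoint, endpoints))
    # Aggregate per-endpoint results into a single summary report.
    return {r["endpoint"]: (r["passed"], r["failed"]) for r in results}
```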

Multi-Region Deployment Verification

Deploy to multiple regions in parallel. Each subagent SSHs into a different server, runs the deployment, verifies it, and reports status. The orchestrator waits for all confirmations before marking the deployment complete.
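The barrier semantics ("wait for all confirmations") can be sketched like this; `deploy_region` is a stub standing in for a subagent that would actually SSH in and run the rollout:

```python
from concurrent.futures import ThreadPoolExecutor

def deploy_region(region):
    # A real subagent would SSH into the region's server, deploy, and
    # verify; this stub just reports a verified status.
    return {"region": region, "status": "verified"}

def deploy_all(regions):
    with ThreadPoolExecutor() as pool:
        reports = list(pool.map(deploy_region, regions))
    # Mark the deployment complete only once every region confirms.
    if all(r["status"] == "verified" for r in reports):
        return "complete"
    return "failed"
```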

Data Pipeline Parallelization

Split large datasets across subagents for parallel processing. Because pipelines run through execute_code at zero context cost, subagents can crunch data without the context-window overhead of a single large session.
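A sketch of the split-process-merge pattern, with a trivial `sum` standing in for whatever each subagent would compute via execute_code:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(data, n):
    # Split data into n roughly equal slices, one per subagent.
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def process(part):
    # Stand-in for a subagent crunching its slice in isolation.
    return sum(part)

def pipeline(data, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process, chunk(data, workers)))
    return sum(partials)  # orchestrator merges the partial results
```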

Under the Hood

Subagents in Hermes are not threads or coroutines — they are fully isolated Hermes sessions with their own context windows, tool access, terminal backends, and Python RPC namespaces. This isolation prevents context leakage: a subagent working on database migrations can't accidentally see or modify the context of a subagent doing security audits. The orchestrator spawns them with a task specification and optional shared context, then receives structured results when they complete.

The spawn model supports both fire-and-forget (spawn and receive results asynchronously) and blocking (wait for all subagents before continuing). Programmatic Tool Calling collapses multi-step tool chains into single inference calls within a subagent, dramatically reducing latency for compute-intensive pipelines. Subagents share the parent's skill library by default, so skills created in the main session are immediately available in spawned workers.
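The two spawn modes map naturally onto asyncio primitives. This is an illustration of the pattern, not the Hermes interface: `gather` models blocking mode, while `create_task` models fire-and-forget, where the orchestrator continues working and collects results later.

```python
import asyncio

async def subagent(task):
    await asyncio.sleep(0)  # yield control, as a real session would
    return f"done: {task}"

async def blocking(tasks):
    # Blocking mode: wait for every subagent before continuing.
    return await asyncio.gather(*(subagent(t) for t in tasks))

async def fire_and_forget(tasks):
    # Fire-and-forget: spawn now, collect results asynchronously later.
    futures = [asyncio.create_task(subagent(t)) for t in tasks]
    # ... the orchestrator can do other work here ...
    return [await f for f in futures]
```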

Resource management is automatic: Hermes tracks active subagent count, enforces configurable concurrency limits, and handles graceful shutdown on timeout or error. Each subagent can use a different terminal backend — for example, the orchestrator runs locally while CPU-intensive subagents run on Modal (serverless GPU) or Daytona (serverless persistence). This heterogeneous compute model lets you right-size each workstream without manual infrastructure management.
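A concurrency limit of the kind described can be modeled with a semaphore; the limit value and `run_subagent` wrapper are illustrative, not Hermes configuration:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 2                      # configurable concurrency limit
_slots = threading.Semaphore(MAX_CONCURRENT)
_lock = threading.Lock()
_active = 0
peak = 0                                # highest observed concurrency

def run_subagent(task):
    global _active, peak
    with _slots:                        # block until a slot is free
        with _lock:
            _active += 1
            peak = max(peak, _active)
        # ... subagent work would happen here ...
        with _lock:
            _active -= 1
    return f"done: {task}"

# Even with 8 pool workers, at most MAX_CONCURRENT run at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_subagent, range(6)))
```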

Related Features