Subagents — Parallel AI Workstreams
Key Points
- ✓ Isolated context
- ✓ Parallel execution
- ✓ Shared skill library
- ✓ Resource management
How It Works
1. Spawn: '/spawn research-bot'
2. Each subagent is independent
3. Share skills via a common library
4. Close subagents when done
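The lifecycle above can be sketched in plain Python. This is a hypothetical stand-in, not the Hermes API: the `Subagent` class and its methods are illustrative placeholders for spawning an isolated session, running it, and closing it.

```python
from concurrent.futures import ThreadPoolExecutor

class Subagent:
    """Hypothetical stand-in for an isolated Hermes session."""
    def __init__(self, name):
        self.name = name
        self.context = {}          # isolated: not shared with sibling agents

    def run(self, task):
        # A real subagent would do inference and tool calls here.
        self.context["task"] = task
        return f"{self.name}: done ({task})"

    def close(self):
        self.context.clear()       # step 4: close when done

# Steps 1-2: spawn independent subagents; step 3 would share a skill library.
agents = [Subagent("research-bot"), Subagent("draft-bot")]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda a: a.run("write report"), agents))
for a in agents:
    a.close()
print(results)
```

Each agent carries its own `context` dict, mirroring the isolation described below: nothing one subagent writes is visible to another.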
Real-World Use Cases
Parallel Research and Writing
Spawn three subagents: one researches the topic, one drafts the outline, and one reviews existing content for gaps. The orchestrator collects all three outputs and synthesizes the final document, roughly three times faster than working sequentially.
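A minimal sketch of this fan-out/fan-in pattern, with three hypothetical helper functions standing in for the research, outline, and gap-review subagents:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for three spawned subagents.
def research(topic):
    return f"sources on {topic}"

def outline(topic):
    return f"outline for {topic}"

def gap_review(topic):
    return f"gaps in existing {topic} content"

def orchestrate(topic):
    # Fan out: each role runs in parallel with isolated state.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(f, topic) for f in (research, outline, gap_review)]
        parts = [f.result() for f in futures]
    # Fan in: the orchestrator synthesizes the collected outputs.
    return " | ".join(parts)

print(orchestrate("subagents"))
```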
Concurrent API Integration Testing
Test multiple API endpoints simultaneously. Each subagent handles a different endpoint with isolated state, preventing test interference. Results aggregate back to a single summary report.
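One way to picture the isolated-state property: each check gets a fresh state dict, so no test can interfere with another. The endpoints and `check_endpoint` helper below are hypothetical, simulated rather than hitting a real API.

```python
import asyncio

# Hypothetical stand-in: each "subagent" checks one endpoint with its own
# isolated state dict, so concurrent tests cannot interfere.
async def check_endpoint(name, state):
    state["calls"] = state.get("calls", 0) + 1   # private per subagent
    await asyncio.sleep(0)                        # yield, as real I/O would
    return {"endpoint": name, "ok": True}

async def run_suite(endpoints):
    tasks = [check_endpoint(e, {}) for e in endpoints]  # fresh state each
    results = await asyncio.gather(*tasks)              # run concurrently
    # Aggregate back into a single summary report.
    return {r["endpoint"]: r["ok"] for r in results}

report = asyncio.run(run_suite(["/users", "/orders", "/health"]))
print(report)
```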
Multi-Region Deployment Verification
Deploy to multiple regions in parallel. Each subagent SSHes into a different server, runs the deployment, verifies it, and reports status. The orchestrator waits for all confirmations before marking the deployment complete.
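The wait-for-all-confirmations step can be sketched as below. The `deploy` function and region names are illustrative placeholders; a real subagent would SSH in and run the actual deployment.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical deploy step; a real subagent would SSH in and verify here.
def deploy(region):
    verified = True  # stand-in for post-deploy verification
    return {"region": region, "status": "ok" if verified else "failed"}

regions = ["us-east-1", "eu-west-1", "ap-south-1"]
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(deploy, r) for r in regions]
    statuses = [f.result() for f in as_completed(futures)]

# Mark the deployment complete only when every region has confirmed.
complete = all(s["status"] == "ok" for s in statuses)
print(complete)
```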
Data Pipeline Parallelization
Split large datasets across subagents for parallel processing. Zero-context-cost pipelines via execute_code mean subagents can crunch data without the context window overhead of a single large session.
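A sketch of the split-process-merge pattern, assuming the work is a simple per-chunk computation. A real Hermes pipeline would hand each chunk to a subagent via execute_code; here threads stand in for the workers.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Stand-in for a subagent crunching its slice of the dataset.
    return sum(x * x for x in chunk)

def parallel_pipeline(data, workers=4):
    # Split the dataset into one chunk per worker...
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # ...fan out the chunks, then merge the partial results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunks))

total = parallel_pipeline(list(range(1_000)))
print(total)
```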
Under the Hood
Subagents in Hermes are not threads or coroutines — they are fully isolated Hermes sessions with their own context windows, tool access, terminal backends, and Python RPC namespaces. This isolation prevents context leakage: a subagent working on database migrations can't accidentally see or modify the context of a subagent doing security audits. The orchestrator spawns them with a task specification and optional shared context, then receives structured results when they complete.
The spawn model supports both fire-and-forget (spawn and receive results asynchronously) and blocking (wait for all subagents before continuing). Programmatic Tool Calling collapses multi-step tool chains into single inference calls within a subagent, dramatically reducing latency for compute-intensive pipelines. Subagents share the parent's skill library by default, so skills created in the main session are immediately available in spawned workers.
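The two spawn modes map onto a familiar futures pattern. This is a generic sketch, not Hermes code: `worker` is a hypothetical placeholder for a spawned subagent.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def worker(task):
    return f"done: {task}"        # stand-in for a subagent's result

pool = ThreadPoolExecutor(max_workers=2)

# Fire-and-forget: submit, keep working, pick up the result later.
fut = pool.submit(worker, "audit logs")
# ... the orchestrator continues with other work here ...
async_result = fut.result()       # collected asynchronously

# Blocking: wait for every subagent before continuing.
futures = [pool.submit(worker, t) for t in ("migrate db", "run tests")]
done, pending = wait(futures)
blocking_results = sorted(f.result() for f in done)
pool.shutdown()
print(async_result, blocking_results)
```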
Resource management is automatic: Hermes tracks active subagent count, enforces configurable concurrency limits, and handles graceful shutdown on timeout or error. Each subagent can use a different terminal backend — for example, the orchestrator runs locally while CPU-intensive subagents run on Modal (serverless GPU) or Daytona (serverless persistence). This heterogeneous compute model lets you right-size each workstream without manual infrastructure management.
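A configurable concurrency limit like the one described above can be enforced with a bounded semaphore. This is a generic sketch of the mechanism, the `MAX_SUBAGENTS` constant and `spawn` helper are assumptions, not Hermes internals:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_SUBAGENTS = 2                      # configurable concurrency limit
limit = threading.BoundedSemaphore(MAX_SUBAGENTS)
peak = {"active": 0, "max": 0}
lock = threading.Lock()

def spawn(task):
    with limit:                        # block until a slot frees up
        with lock:
            peak["active"] += 1
            peak["max"] = max(peak["max"], peak["active"])
        try:
            return f"done: {task}"     # stand-in for subagent work
        finally:
            with lock:
                peak["active"] -= 1

# Eight threads try to spawn six tasks, but at most two run at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(spawn, [f"task-{i}" for i in range(6)]))
print(peak["max"], results)
```

The semaphore caps how many subagents are active at once regardless of how many spawn requests arrive, which is the essence of the automatic resource tracking described above.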