How-To Guide

Run Hermes on Modal — Serverless Cloud Compute

Configure Hermes Agent to execute commands on Modal's serverless cloud infrastructure — GPU access, auto-scaling, pay only when active.

Quick answer

The Modal terminal backend runs Hermes commands on serverless cloud VMs with GPU access, auto-scaling, and hibernation between sessions, so costs fall to near zero when idle. It suits AI workloads, batch processing, and any job that needs more compute than your laptop provides, without keeping a server running. To enable it, authenticate the Modal CLI and set the terminal backend to modal in config.yaml.

Before you start:

☑ Hermes Agent installed
☑ Modal account (free tier available)
☑ Modal CLI authenticated

Steps

1. Install Modal
   Run: pip install modal
2. Authenticate with Modal
   Run: modal setup
   This opens a browser window where you authenticate your Modal account.
3. Configure the backend
   In config.yaml, set terminal: backend: modal
4. Choose a container image
   In config.yaml, set terminal: modal_image: nikolaik/python-nodejs:python3.11-nodejs20
5. Set resource limits
   Configure container_cpu, container_memory, and container_disk in config.yaml.
6. Start using Hermes
   Hermes now executes terminal commands on Modal's cloud infrastructure.
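
Steps 3 to 5 can be combined into one config.yaml fragment. The following is a minimal sketch, not a confirmed default configuration: the backend and modal_image keys come from the steps above, while placing the three container_* keys under terminal, their units, and the values shown are assumptions made for illustration. Check the Hermes configuration reference before relying on them.

```yaml
# config.yaml (illustrative sketch; see the assumptions noted below)
terminal:
  backend: modal                                           # step 3
  modal_image: nikolaik/python-nodejs:python3.11-nodejs20  # step 4

  # Step 5: resource limits.
  # ASSUMPTION: these keys sit under terminal, CPU is counted in cores,
  # and memory and disk are given in MB. The values are placeholders.
  container_cpu: 1
  container_memory: 4096
  container_disk: 20480
```

Starting with small limits and raising them only when a workload needs more keeps active compute costs low.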

Pro Tips

💡 Modal hibernates containers between sessions, so you pay only for active compute time
💡 Request GPU access on Modal's dashboard for CUDA-accelerated workloads
💡 Container images are cached: the first run is slow, subsequent runs are fast
💡 Modal's free tier includes $30/month of compute credits

Troubleshooting

❌ Modal not authenticated

✅ Run 'modal setup' to authenticate via browser. The Modal CLI stores the resulting token in ~/.modal.toml

❌ Container fails to start

✅ Check Modal's dashboard for error logs. Common issue: the Docker image isn't compatible with Modal's runtime.

❌ High latency on first command

✅ First command provisions a new container (cold start). Use container_persistent: true to keep the container warm between sessions.
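
As a sketch of that setting, assuming container_persistent sits under terminal next to the backend key (the nesting is an assumption, not confirmed by this guide):

```yaml
# config.yaml (illustrative sketch)
terminal:
  backend: modal
  # ASSUMPTION: nested under terminal alongside backend
  container_persistent: true
```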

❌ GPU not available

✅ GPU access requires Modal approval. Request it from your Modal dashboard settings.

FAQ

What does the Modal backend give me?

Serverless cloud execution: GPU access, auto-scaling, and hibernation between sessions, so you pay near-zero when idle and scale up only when the agent runs work.

When should I use Modal over a fixed server?

For bursty AI workloads and batch jobs where an always-on VPS would be wasteful. Modal spins compute up on demand and hibernates it afterwards.

Does Modal replace the model provider?

No. Modal runs the commands/compute; you still configure a model provider for the agent's reasoning. They're separate layers.
