How-To Guide

Configure Hugging Face Inference for Hermes

Configure Hermes Agent to use 20+ open models via Hugging Face's unified inference endpoint.

Quick answer

Configure Hugging Face Inference to reach 20+ open models through one unified endpoint with no markup on provider rates and a free tier ($0.10/month). Add your HF token in config — it's the easiest way to try several open models without juggling separate provider accounts.

Hugging Face Inference provides access to 20+ open models through a unified endpoint. Free tier included ($0.10/month), no markup on provider rates. Great for trying different open models without multiple accounts.

Deploy Hermes faster with FlyHermes

Managed cloud · API costs included · Skill library · Cancel anytime

Before you start:

☑Hermes Agent installed
☑Hugging Face account
☑HF token with Inference Providers permission

Steps

1
Get your token
Go to huggingface.co/settings/tokens and create a token with 'Make calls to Inference Providers' permission
2
Add to .env
Add HF_TOKEN=hf_xxx to ~/.hermes/.env
3
Set the provider
In config.yaml, set model: provider: huggingface
4
Choose a model
Pick from available open models like Llama, Mistral, Qwen

Pro Tips

💡Free tier includes basic usage — enough for experimentation
💡No markup on underlying provider rates
💡Access to the latest open models as they're released
💡Good alternative to running local models without dedicated hardware

Troubleshooting

❌ Permission denied

✅ Ensure your token has 'Make calls to Inference Providers' permission. Regenerate if needed.

❌ Model not available

✅ Not all HF models are available via inference. Check the model card for inference availability.

FAQ

Why use Hugging Face Inference with Hermes?

One endpoint reaches 20+ open models with no markup on provider rates, plus a free tier ($0.10/month). It's the simplest way to test many open models without separate accounts.

How do I configure it?

Add your Hugging Face token in Hermes config and select a model from the available open models. The unified endpoint handles routing.

Is the free tier enough to evaluate?

It's enough to try models and compare them. For sustained agent workloads you'll move to paid usage, but there's no per-model markup on top of provider rates.

Configure Hugging Face Inference for Hermes

Before you start:

Steps

Get your token

Add to .env

Set the provider

Choose a model

Pro Tips

Troubleshooting

FAQ

Why use Hugging Face Inference with Hermes?

How do I configure it?

Is the free tier enough to evaluate?

Related Guides