OpenClaw AaaS
Managed service for running template-based OpenClaw agents with:
- tenant onboarding
- queued run orchestration
- BYOK model execution via Cloudflare AI Gateway
- subscription-gated access using Stripe webhooks
This repository is a monorepo with a web control plane and a Cloudflare Worker API.
Product Overview
OpenClaw AaaS is designed for teams that want to ship agent workflows without building platform infrastructure first.
Core idea:
- Customer creates a tenant.
- Customer selects an agent template.
- Customer launches a run (optionally with BYOK model key).
- Backend validates subscription, queues execution, updates run lifecycle, and coordinates agent execution.
Repository Layout
apps/web: Next.js UI (landing, signup, dashboard, API proxy routes)apps/worker-api: Cloudflare Worker backend (tenants, templates, runs, subscriptions, queue consumer)docs/security: BYOK and baseline security policiesdocs/runbooks: operational runbooksdocs/plans: implementation plans and architecture notes
Architecture
Frontend (apps/web):
- Serves UI pages.
- Proxies backend calls through
pages/api/control/*. - Uses
WORKER_API_BASE_URLserver-side to reach Worker API.
Backend (apps/worker-api):
fetchroutes:/healthz/api/tenants/api/templates/api/runs/api/auth/*/api/telegram/*/api/webhooks/stripe/api/subscriptions/api/runtime/skillsqueueconsumer processes run messages and updates D1 run status.- Durable Object
AgentSessioncoordinates per-agent execution sessions. - Storage/bindings: D1 (
DB), R2 (ARTIFACTS), Queue (RUN_QUEUE), Durable Object (AGENT_SESSION).
Cloudflare Mapping
This is how OpenClaw Autopilot components map to Cloudflare infrastructure:
| Product Component | Cloudflare Primitive | Purpose |
|---|---|---|
API layer (apps/worker-api) | Workers | Handles auth, tenant config, Telegram pairing, runs, and webhook endpoints |
| Persistent app data | D1 (DB) | Stores tenants, accounts/sessions, pairing state, runs, and subscription state |
| Async run execution | Queues (RUN_QUEUE) | Decouples request/response from run processing and retries |
| Agent runtime coordination | Durable Objects (AGENT_SESSION) | Keeps per-agent execution/session state consistent |
| Artifacts/output storage | R2 (ARTIFACTS) | Stores run artifacts and generated output blobs |
| Model provider abstraction | AI Gateway | Routes BYOK model calls with provider-agnostic observability |
| Billing events ingress | Worker webhook (/api/webhooks/stripe) | Receives Stripe subscription lifecycle updates |
| Telegram ingress | Worker webhook (/api/telegram/webhook) | Receives chat updates and forwards to tenant-scoped execution |
High-level flow:
- User uses
apps/webUI atautoclaw.sh(or your own deployment). - Next.js API routes proxy requests to Worker API (
WORKER_API_BASE_URL). - Worker validates auth/session and writes state to D1.
- Long-running work is enqueued to
RUN_QUEUE. - Queue consumer invokes
AgentSessionand model calls (AI Gateway/BYOK), then persists run status/results.
Shared Telegram + Tenant Runtime Architecture
The product uses one shared Telegram bot. Tenant isolation is handled by pairing each Telegram user to a tenant, then routing requests into that tenant’s OpenClaw runtime workspace.
flowchart LR
U[User in Telegram] --> TG[Telegram Platform]
TG --> WB["Worker API: /api/webhooks/telegram"]
WB --> PR[(D1: telegram_pairings)]
PR -->|"telegram_user_id -> tenant_id"| TR[Tenant Router]
TR --> TT[(D1: tenants)]
TR --> RT[(D1: tenant_openclaw_runtime)]
TR --> WS[(R2: tenant/<id>/openclaw/workspace/*)]
TR --> SB["Cloudflare Sandbox DO: tenant:<tenant_id>"]
SB --> CT["Cloudflare Container\nOpenClaw Gateway + Workspace (/root/clawd)"]
CT --> WB
WB --> TG
TG --> U
Runtime lifecycle per tenant:
- Telegram message hits
/api/webhooks/telegram. - Pairing lookup resolves
telegram_user_idtotenant_id. - Runtime state is checked in
tenant_openclaw_runtime. - If needed, Worker bootstraps workspace files from R2 and starts tenant sandbox/container.
- Request is routed to tenant OpenClaw gateway process.
- Reply is sent back through Telegram.
Runtime Skills Inventory + Policy
The backend now exposes tenant-scoped runtime skills from the actual OpenClaw sandbox:
GET /api/runtime/skills?tenantId=<id>- optional
includeHidden=true GET /api/runtime/skills/packsfor curated pack definitionsPOST /api/runtime/skills/packsto apply a pack for a tenantGET /api/runtime/skills/diagnostics?tenantId=<id>for requirement diagnosticsPOST /api/runtime/skills/remediatefor one-click policy remediation
This endpoint:
- ensures tenant runtime bootstrap/startup,
- syncs tenant-enabled custom skills from D1/R2 into runtime skill folders (
/root/.openclaw/skills/tenant-<id>-*), - runs
openclaw skills list --jsoninside tenant runtime, - merges per-tenant policy from D1 (
tenant_runtime_skill_policy), - returns
effectiveReadyfor each skill.
Policy updates are supported with:
PATCH /api/runtime/skills
Body:
{
"tenantId": "t_123",
"policies": [
{ "name": "weather", "allowed": true, "enabled": true, "hidden": false }
]
}
Policy semantics:
allowed=false: explicitly disallow skill for tenant policy.enabled=false: keep skill installed/visible in policy but disabled for tenant usage.hidden=true: hide skill from default skills list responses unlessincludeHidden=true.
Diagnostics/remediation:
GET /api/runtime/skills/diagnosticssummarizes unavailable skills and missing requirements (bins,env,config,os).POST /api/runtime/skills/remediatesupports: :strategy=hide_unavailable(default) -> disable/hide unavailable skills in policy. :strategy=enable_ready-> explicitly enable/unhide ready skills.
Telegram command:
- Send
/skillsto the shared Telegram bot to see tenant-enabled skills with readiness markers (✅ready,⚠️unavailable + short reason).
Pack semantics:
basic: starter setup focused on safe day-to-day capability.creator: research/content oriented setup.ops: technical operations oriented setup.- New tenants get
basicautomatically on first runtime bootstrap (only when no policy exists).
Request Flow (Happy Path)
- UI posts
POST /api/control/runs. - Next.js API route proxies to Worker
POST /api/runs. - Worker checks tenant subscription status.
- Worker writes queued run in D1 and enqueues run payload.
- Queue consumer marks run running, optionally calls AI Gateway using BYOK key, invokes
AgentSession, then marks run succeeded/failed.
Prerequisites
- Node.js 22.x (tested with v22.22.0)
- npm 9+
- Cloudflare account for Worker development/deploy
- Wrangler CLI (installed via workspace dependencies)
Getting Started (Developers)
1. Install dependencies
npm install
2. Run web UI locally
npm run dev:web
Open http://localhost:3000.
Optional web env vars:
WORKER_API_BASE_URL: Worker API base URL used by server-side proxy routes.NEXT_PUBLIC_STRIPE_STARTER_URL: Starter checkout URL used by signup flow.
3. Run Worker API locally
npm run dev:worker
Wrangler config lives at apps/worker-api/wrangler.jsonc.
4. Run backend checks
npm run test:worker
npm run typecheck:worker
5. Build web app
npm run build:web
If Backend Is Already Deployed
You can develop only the UI and point it to deployed backend:
- Set
WORKER_API_BASE_URLto deployed Worker URL. - Run
npm run dev:web. - Use UI pages (
/signup,/dashboard,/pricing) against live backend.
Deployment Notes
Frontend:
- Deploy
apps/webas a Next.js server app (for API routes), e.g. Vercel. - Configure
WORKER_API_BASE_URLin deployment env.
Backend:
- Deploy Worker from
apps/worker-apiwith Wrangler:
npm run deploy -w apps/worker-api
- Ensure Cloudflare bindings and secrets are configured:
- D1 database
- R2 bucket
- Queue producer/consumer
- Durable Object migration
STRIPE_WEBHOOK_SECRETCF_ACCOUNT_IDAI_GATEWAY_ID
Security and BYOK
Policy docs:
docs/security/byok-policy.mddocs/security/cloudflare-baseline.md
Current policy intent:
- BYOK keys are customer-supplied and not persisted in D1.
- Keys are used for outbound model calls only.
- Redaction and operational hardening are required before GA.