Table of Contents
- Introduction
- The Problem With AI Agents Today
- What Is Web Agent
- Architecture: Under the Hood
- Core Features
- Getting Started in 60 Seconds
- Self-Learning: The Agent That Gets Smarter Every Conversation
- Comparison: Web Agent vs The Alternatives
- Real-World Usage: How We Use It
- Open Source & Community
- Roadmap: What's Next
- Conclusion
Introduction
[Figure: Web Agent in-browser architecture: self-learning loop with DOM interaction and tool execution]
If you've ever tried to use an AI agent, you're probably familiar with the friction: install this Python stack, configure that API key, set up a virtual environment, debug the Docker container, remember to restart the server, and then pray your environment variables don't get lost between reboots.
What if you could use a full-featured AI agent — skills, memory, tools, automation — without installing anything at all?
Today, we're open-sourcing Web Agent: a production-ready AI agent that runs entirely in your browser, powered by WebContainers, with zero local setup required. No Python. No Node.js installation. No server. No command line. Just open the browser, create a profile, set your API key, and start working.
Built on the same architecture as Hermes Agent (our desktop AI assistant), Web Agent brings the full power of autonomous AI workflows to any modern browser — isolated profiles, persistent memory, a knowledge vault, slash commands, cron automation, and multi-platform gateways — while keeping all your data locally encrypted and never sending it to our servers.
This is the AI agent I've been using internally at aratech to automate our Directus blog workflows, research tasks, and knowledge management. Now it's yours, under the MIT license, at github.com/nikola66/web-agent and live at webagent.aratech.ae.
The Problem With AI Agents Today
Let's be honest: AI agents today are powerful, but they're also painful to use.
The standard agent setup is something like this:
- Install a local runtime — Python virtual environment, node modules, Docker images, Ollama pulls
- Configure your environment — API keys, proxy settings, SSL certificates, system variables
- Build your pipeline - glue scripts, framework setup, vector database configuration
- Hope it still works tomorrow — your OS update breaks the Python binary, a dependency changes, your local LLM crashes
This friction is why agents haven't reached mainstream adoption. The technology is ready, but the delivery mechanism is stuck in the same complexity trap that early web development was in before cloud platforms abstracted it away.
There's also the state problem. Most agents conflate your data, your conversations, your tasks, and your credentials into a single blob — or worse, require you to trust a third party with all of it. If their server goes down, your agent goes down. If they change their API, your workflow breaks. If they decide to stop offering a free tier, you're locked out.
Finally, there's the specialization gap. The average AI is trained on the entire internet — that's like having a thousand employees, none of whom know anything about your business. You spend half an hour re-explaining your context, your rules, and your goals every time you start a new conversation. That's not a knowledge worker; that's a repetitive onboarding process.
We built Web Agent to solve all three problems at once.
What Is Web Agent?
Web Agent is a full-featured AI agent that runs natively in your browser using WebContainers, the in-browser Node.js runtime developed by StackBlitz.
Think of it as Hermes Agent, but ported to the browser. Same skills system, same multi-layer memory, same ~40 built-in tools, same self-learning loop. The difference is: no installation, no server, no environment variables, no Docker.
| Typical AI Agent Setup | Web Agent |
|-------------------------------|------------------------------|
| Install Python/Node/Docker | Open browser |
| Configure .env file | Set API key (encrypted local)|
| Choose vector DB | Zero config |
| Maintain server uptime | Works immediately |
| Data leaves your machine | Everything stays in browser |
| Single agent per install | 4 isolated profiles at once |
Every profile in Web Agent gets its own:
- Isolated workspace — files, shells, and project state sandboxed from other profiles
- Separate memory — fact store, session memory, reflections, and learnings scoped per profile
- Encrypted credentials — API keys stored locally in the browser, never transmitted to servers
- Skill overrides — per-profile skill definitions that inherit from a shared base
If you set up one profile for personal use, one for client work, one for open-source contribution, and one for experiments, each lives in its own world, completely isolated.
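To make the isolation idea concrete, here is a minimal TypeScript sketch of per-profile scoping over a shared key-value backend. It is not Web Agent's actual code: `ProfileStore`, `scopedStore`, and the `profile:` key prefix are hypothetical names, and a plain `Map` stands in for browser storage.

```typescript
// Hypothetical sketch: these names are illustrative, not taken from the Web Agent codebase.
interface ProfileStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
  keys(): string[];
}

// Wrap one shared key-value backend so each profile only sees keys under its own prefix.
function scopedStore(backend: Map<string, string>, profileId: string): ProfileStore {
  if (profileId.length === 0 || profileId.indexOf(":") !== -1) {
    // A colon inside the id would let one profile's prefix overlap another's.
    throw new Error(`Invalid profile id: ${profileId}`);
  }
  const prefix = `profile:${profileId}:`;
  return {
    get: (key) => backend.get(prefix + key),
    set: (key, value) => {
      backend.set(prefix + key, value);
    },
    keys: () =>
      Array.from(backend.keys())
        .filter((k) => k.indexOf(prefix) === 0)
        .map((k) => k.slice(prefix.length)),
  };
}

const backend = new Map<string, string>();
const work = scopedStore(backend, "work");
const personal = scopedStore(backend, "personal");

// Same logical key, two independent values.
work.set("memory/facts.json", '["prefers TypeScript"]');
personal.set("memory/facts.json", "[]");

console.log(work.get("memory/facts.json")); // the work profile's own copy
console.log(personal.keys());               // only keys written through "personal"
```

The point of the design is that code running for one profile never receives a handle that can name another profile's keys.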
Architecture: Under the Hood
Web Agent's architecture is deliberately layered to keep execution, persistence, and infrastructure as separate concerns:
┌─────────────────────────────────────────────────────┐
│ Browser: React 19 + Vite + TypeScript + xterm.js │
├─────────────────────────────────────────────────────┤
│ Sidebar │ Terminal (xterm) │ Chat Input │
│ Profiles│ Transcript │ Natural Language │
├─────────────────────────────────────────────────────┤
│ Core Orchestrator │
│ • Profile lifecycle management │
│ • WebContainer boot/shutdown │
│ • Credential vault (encrypted) │
├─────────────────────────────────────────────────────┤
│ Embedded Agent Runtime (Node.js in WebContainers) │
│ ┌──────────────────────────────────────────────┐ │
│ │ LLM Loop (OpenRouter / Ollama / Custom) │ │
│ │ Tool Registry (~40 built-in tools) │ │
│ │ Skill Manager (SKILL.md loader) │ │
│ │ Memory Layers (fact_store, session, reflect)│ │
│ │ Cron Scheduler (heartbeat + jobs) │ │
│ │ Channel Gateway (Telegram, Email) │ │
│ └──────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────┤
│ Persistence: IndexedDB + OPFS (browser-local) │
│ No server-side user state │
└─────────────────────────────────────────────────────┘
That last line is worth emphasizing: your data never leaves your browser unless you explicitly configure an external LLM provider. The hosted demo at webagent.aratech.ae only serves the static application; every file, memory, and credential stays in your own browser's IndexedDB or OPFS storage. Even if the demo goes offline, your local data remains accessible through export/import.
This isn't a cloud product with a free tier — it's a tool that runs on your computer, delivered through the browser.
Technology Stack
Core Features
Let's walk through what actually makes this agent useful on a daily basis.
Isolated Profiles — Multiple Agents, One Browser
Think of a profile as a dedicated workspace for your agent. Each profile has its own:
- WebContainer filesystem (virtualized Node.js sandbox)
- Memory layers (facts, sessions, reflections, learnings)
- Credential vault
- Skill overrides
- Export/import snapshot
You can run up to 4 agents concurrently, each in its own profile. One profile for work, one for personal, one for a client project, one for experimentation: they never cross-contaminate.
Knowledge Vault (PARA + Wiki)
Inspired by Karpathy's viral "AI Second Brain" concept, Web Agent has a first-class knowledge vault built in.
You can:
- /wiki-setup to initialize a PARA-structured markdown vault
- /wiki-sync to ingest all your memory, accumulated facts, and skill learnings into the vault
- /wiki-search to query your vault when the agent needs to surface context
The vault grows over time as you use the agent. Your sessions, facts, and learnings get synthesized into structured knowledge — not just a flat transcript log. This is the compounding knowledge loop in action.
Multi-Layer Memory
Web Agent stores four distinct types of memory, each with a different purpose:
Using the /memory-layers skill, you can consciously choose what to store where and avoid context duplication. This is the same memory architecture that powers Hermes Agent's ability to "remember everything that matters and forget everything that doesn't."
Self-Learning Loop
This is the piece that turns a chatbot into an agent that actually gets better over time.
Every time the agent completes a task, it can generate:
- Reflections — what worked, what didn't, what was missing
- Learnings — procedural patterns that generalize across tasks
- Facts — durable nuggets about your domain, preferences, and environment
These flow back into its memory and optionally into the knowledge vault. Over time, the agent doesn't just accumulate data — it assembles expertise.
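A small sketch can show what "structured" memory might look like compared with a flat transcript. The record shape below is an assumption: the layer names (fact, reflection, learning) come from this post, while `MemoryRecord`, `MemoryStore`, and the field names are hypothetical.

```typescript
// Hypothetical record shapes; layer names follow the article, field names are assumed.
type MemoryKind = "fact" | "reflection" | "learning";

interface MemoryRecord {
  kind: MemoryKind;
  text: string;
  tags: string[];
  createdAt: number; // epoch milliseconds
}

class MemoryStore {
  private records: MemoryRecord[] = [];

  save(kind: MemoryKind, text: string, tags: string[] = []): MemoryRecord {
    const record: MemoryRecord = { kind, text, tags, createdAt: Date.now() };
    this.records.push(record);
    return record;
  }

  // Recall one layer, optionally narrowed to records carrying a given tag.
  recall(kind: MemoryKind, tag?: string): MemoryRecord[] {
    return this.records.filter(
      (r) => r.kind === kind && (tag === undefined || r.tags.indexOf(tag) !== -1)
    );
  }

  // Case-insensitive substring search across all layers.
  search(query: string): MemoryRecord[] {
    const needle = query.toLowerCase();
    return this.records.filter((r) => r.text.toLowerCase().indexOf(needle) !== -1);
  }
}

const store = new MemoryStore();
store.save("fact", "The user prefers TypeScript over JavaScript", ["preferences"]);
store.save("reflection", "The outline was approved before drafting, which helped", ["writing"]);
store.save(
  "learning",
  "When working with the Directus API, fetch the post ID before assigning tags",
  ["directus"]
);

console.log(store.recall("learning", "directus").map((r) => r.text));
```

Keeping the layer as an explicit field is what lets the agent ask for "learnings about Directus" instead of replaying old conversations.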
Use skill_save to turn a successful, well-structured workflow into a reusable SKILL.md that the agent pulls in for related tasks in the future. Your agent's expertise grows alongside your project.
Knowledge Vault (PARA + wiki) - Expanded
Let's be concrete about how the knowledge vault works in practice:
- Initialize with /wiki-setup: creates a PARA structure in your workspace under knowledge-vault/
- Feed it: drop any business data (transcripts, PDFs, goals, competitor notes, voice transcripts) into the workspace
- Sync with /wiki-sync: the agent compiles all that raw material into a structured, AI-native knowledge base with an index, log, and cross-linked concepts
- Query with /wiki-search: the agent searches your vault before falling back on the LLM's general knowledge, producing outputs that are uniquely yours
This is how you turn generic AI slop into something that actually understands your business, your voice, your goals, and your past decisions. One query on your YouTube strategy vault produces video ideas that sound like you built them, not something that could have been generated for any channel.
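The vault workflow above can be sketched in a few lines of TypeScript. This is an illustration under assumptions, not the real /wiki-setup or /wiki-search implementation: the folder names follow the PARA method (Projects, Areas, Resources, Archives), `knowledge-vault` is the directory named in this post, and `Note`, `notePath`, `buildIndex`, and `searchVault` are hypothetical helpers.

```typescript
// Illustrative only: the layout the real /wiki-setup command creates may differ.
type ParaCategory = "projects" | "areas" | "resources" | "archives";

interface Note {
  title: string;
  category: ParaCategory;
  body: string;
}

const VAULT_ROOT = "knowledge-vault"; // directory name taken from the article

// Turn a title into a filesystem-friendly slug.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function notePath(note: Note): string {
  return `${VAULT_ROOT}/${note.category}/${slugify(note.title)}.md`;
}

// Build a markdown index grouped by PARA category.
function buildIndex(notes: Note[]): string {
  const order: ParaCategory[] = ["projects", "areas", "resources", "archives"];
  const lines: string[] = ["# Index"];
  for (const category of order) {
    const inCategory = notes.filter((n) => n.category === category);
    if (inCategory.length === 0) continue;
    lines.push(`## ${category}`);
    for (const note of inCategory) {
      lines.push(`- [${note.title}](${notePath(note)})`);
    }
  }
  return lines.join("\n");
}

// Naive keyword search: paths of notes that contain every query term.
function searchVault(notes: Note[], query: string): string[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  return notes
    .filter((n) => {
      const haystack = `${n.title}\n${n.body}`.toLowerCase();
      return terms.every((t) => haystack.indexOf(t) !== -1);
    })
    .map(notePath);
}

const strategy: Note = {
  title: "YouTube Strategy",
  category: "projects",
  body: "Weekly video ideas and hooks.",
};
const voice: Note = {
  title: "Brand Voice",
  category: "areas",
  body: "Tone: direct, practical, no hype.",
};
const notes: Note[] = [strategy, voice];

console.log(buildIndex(notes));
console.log(searchVault(notes, "video ideas"));
```

A real vault would likely use something smarter than substring matching, but the shape is the same: search local notes first, then hand the hits to the model as context.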
~40 Built-in Tools
Web Agent ships with a comprehensive toolset out of the box:
Filesystem: read_file, write_file, edit_file, multi_edit, delete_file, move_file, make_dir, tree, find_files, grep, file_diff, file_stat
Memory: memory_save, memory_recall, memory_search, session_memory_append, session_memory_list, session_search
Skills: skill_list, skill_view, skill_save, skill_manage, skill_bulk_save, skill_delete, skill_recall
Automation: cron_register, cron_list, todo_write
Web & Vision: web_search, web_fetch, vision_analyze, youtube_transcribe, email
System: run_shell, system_info, artifact_present, apply_patch
All of these are available inside the WebContainer sandbox. They operate on your profile's isolated workspace, so you can experiment, break things, and recover without fear of losing the rest of your system.
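For readers curious how a tool registry like this might be wired, here is a hedged sketch. The tool names `read_file` and `delete_file` come from the list above; the `Tool` interface, the `ToolRegistry` class, the `destructive` flag, and the in-memory `files` map are assumptions made for illustration and are not the project's real API.

```typescript
// Hypothetical registry shape; the real Web Agent tool API is not shown in the article.
type ToolArgs = Record<string, unknown>;
type ToolResult = { ok: true; output: string } | { ok: false; error: string };

interface Tool {
  name: string;
  description: string;
  destructive: boolean; // destructive tools need explicit confirmation
  run(args: ToolArgs): ToolResult;
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  list(): string[] {
    return Array.from(this.tools.keys()).sort();
  }

  // Dispatch by name; unknown tools and unconfirmed destructive calls return an error result.
  call(toolName: string, args: ToolArgs, confirmed = false): ToolResult {
    const tool = this.tools.get(toolName);
    if (tool === undefined) return { ok: false, error: `Unknown tool: ${toolName}` };
    if (tool.destructive && !confirmed) {
      return { ok: false, error: `Confirmation required for ${toolName}` };
    }
    try {
      return tool.run(args);
    } catch (err) {
      return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  }
}

// In-memory stand-in for a profile's sandboxed filesystem.
const files = new Map<string, string>([["notes.md", "hello"]]);

const registry = new ToolRegistry();
registry.register({
  name: "read_file",
  description: "Read a file from the workspace",
  destructive: false,
  run: (args): ToolResult => {
    const content = files.get(String(args["path"]));
    if (content === undefined) return { ok: false, error: "File not found" };
    return { ok: true, output: content };
  },
});
registry.register({
  name: "delete_file",
  description: "Delete a file from the workspace",
  destructive: true,
  run: (args): ToolResult => {
    const removed = files.delete(String(args["path"]));
    if (!removed) return { ok: false, error: "File not found" };
    return { ok: true, output: "deleted" };
  },
});

console.log(registry.list());
console.log(registry.call("delete_file", { path: "notes.md" })); // blocked until confirmed
```

Returning a result object instead of throwing keeps a failed tool call from crashing the agent loop; the model can read the error and try something else.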
Slash Commands & Planning Mode
Web Agent uses a slash command system borrowed from the best terminal UX patterns (Hermes Agent, Claude Code, OpenCode):
/help - show all available tools and commands
/clear - restart with a fresh conversation (keeps profile data)
/compact - compress older context, keep current conversation going
/checkpoint - save a named snapshot of the current session
/rollback - load a checkpoint
/skills - list/search installed skills
/plan [goal] - enter specification-first planning mode
/stop - interrupt current tool run
/exit - terminate the terminal session
Planning mode (/plan) is especially powerful. When you want to tackle a complex task:
- Type /plan build a landing page for our new product
- Web Agent reads your workspace (read-only, no modifications yet)
- It writes a full specification markdown file to .webagent/plans/ and presents it for your approval
- You review, revise, or accept, then say "execute the plan" in your next message
- It executes the plan step by step, with full transparency
This is how you get rigorous execution and human oversight — the plan is reviewed before any code is written.
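The review-before-execute flow can be modelled as a tiny state machine. The sketch below is an assumption about how such a gate could work, not the project's implementation: the command names come from the list above, while `parseInput`, `PlanSession`, and the state names are hypothetical.

```typescript
// Hypothetical sketch: command names come from the article, the rest is assumed.
type ParsedInput =
  | { kind: "command"; command: string; arg: string }
  | { kind: "chat"; text: string };

const KNOWN_COMMANDS = [
  "help", "clear", "compact", "checkpoint", "rollback", "skills", "plan", "stop", "exit",
];

// Split a terminal line into a known slash command or plain chat text.
function parseInput(line: string): ParsedInput {
  const trimmed = line.trim();
  const match = /^\/([a-z-]+)(?:\s+([\s\S]*))?$/.exec(trimmed);
  if (match !== null && KNOWN_COMMANDS.indexOf(match[1]) !== -1) {
    return { kind: "command", command: match[1], arg: match[2] || "" };
  }
  return { kind: "chat", text: trimmed };
}

type PlanState = "idle" | "awaiting_approval" | "executing";

// Tracks whether the agent may modify the workspace yet.
class PlanSession {
  state: PlanState = "idle";
  goal = "";

  // "/plan <goal>": the spec gets drafted, but workspace edits stay blocked.
  start(goal: string): void {
    if (goal.length === 0) throw new Error("A goal is required");
    this.goal = goal;
    this.state = "awaiting_approval";
  }

  // Called when the user accepts the spec, e.g. by saying "execute the plan".
  approve(): void {
    if (this.state !== "awaiting_approval") {
      throw new Error("No plan is awaiting approval");
    }
    this.state = "executing";
  }

  canModifyWorkspace(): boolean {
    return this.state === "executing";
  }
}

const session = new PlanSession();
const parsed = parseInput("/plan build a landing page for our new product");
if (parsed.kind === "command" && parsed.command === "plan") {
  session.start(parsed.arg);
}
console.log(session.state);                // "awaiting_approval"
console.log(session.canModifyWorkspace()); // false until approve() is called
```

Every write-capable tool would check `canModifyWorkspace()` first, so an unapproved plan cannot touch your files even if the model tries.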
Multi-Platform Gateway
Web Agent isn't confined to the browser window. It includes a channel gateway architecture that can connect the agent to:
- Telegram: polling channel, long-running sessions in chat
- Email: via Resend provider, send and receive email from the agent
- Extensible: add new channels by dropping a capability module under src/capabilities/channels/ and rebuilding
On our Directus blog management workflow, we've wired Web Agent to manage scheduled posts, pull analytics, and respond to editorial queries — all through a Telegram chat interface. The agent runs in the browser (hosted demo), but the conversations happen in Telegram. That's the versatility you get from a proper channel abstraction layer.
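To illustrate what a channel abstraction buys you, here is a minimal sketch of a gateway that polls channels and replies on whichever one a message arrived from. The contract a real module under src/capabilities/channels/ must satisfy is not documented in this post, so `Channel`, `Gateway`, `IncomingMessage`, and `FakeChannel` are all hypothetical, and an in-memory channel stands in for Telegram.

```typescript
// Hypothetical interfaces; the real channel module contract may differ.
interface IncomingMessage {
  channel: string;
  sender: string;
  text: string;
}

interface Channel {
  channelName: string;
  poll(): IncomingMessage[]; // messages received since the last poll
  send(recipient: string, text: string): void;
}

type AgentHandler = (message: IncomingMessage) => string;

class Gateway {
  private channels: Channel[] = [];
  private handler: AgentHandler;

  constructor(handler: AgentHandler) {
    this.handler = handler;
  }

  add(channel: Channel): void {
    this.channels.push(channel);
  }

  // One polling cycle: drain every channel and reply where the message came from.
  tick(): number {
    let handled = 0;
    for (const channel of this.channels) {
      for (const message of channel.poll()) {
        channel.send(message.sender, this.handler(message));
        handled += 1;
      }
    }
    return handled;
  }
}

// In-memory channel used here in place of a real Telegram or email integration.
class FakeChannel implements Channel {
  channelName: string;
  inbox: IncomingMessage[] = [];
  outbox: { recipient: string; text: string }[] = [];

  constructor(channelName: string) {
    this.channelName = channelName;
  }

  poll(): IncomingMessage[] {
    const batch = this.inbox;
    this.inbox = [];
    return batch;
  }

  send(recipient: string, text: string): void {
    this.outbox.push({ recipient, text });
  }
}

const telegram = new FakeChannel("telegram");
const gateway = new Gateway((m) => `echo: ${m.text}`);
gateway.add(telegram);

telegram.inbox.push({ channel: "telegram", sender: "editor", text: "any drafts ready?" });
const handledCount = gateway.tick();
console.log(handledCount, telegram.outbox);
```

Because the agent only sees `IncomingMessage` and returns a string, adding a new platform means writing one more `Channel`, not touching the agent loop.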
Security & Privacy
There's a difference between "we claim we don't use your data" and "your data physically cannot leave your browser."
Web Agent does the latter. The local architecture guarantees:
- Encrypted per-profile API keys: stored in browser storage, never transmitted in the clear
- Workspace isolation: one profile can't access another's files or memory
- No server-side user state: the hosted demo is transit-only; closing your browser discards your session from the server
- Stateless CORS proxy: the fetch sidecar does not log or store traffic
- Secret redaction: API keys and credentials are redacted before any log output
- Tool guardrails: confirmation prompts for destructive operations, loop timeout protection
You can run Web Agent entirely offline for all local work; only LLM calls and web fetch operations require network access — and you control both credentials.
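Secret redaction is the easiest of these guarantees to show in code. The sketch below is one plausible approach, not the project's actual redactor: the `redact` function, the `[REDACTED]` placeholder, and the two patterns (keys with an `sk-` prefix and HTTP bearer tokens) are assumptions chosen for illustration.

```typescript
// Assumed key formats for illustration; adjust the patterns to the providers you use.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9_-]{16,}/g,          // keys that start with an "sk-" prefix
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, // HTTP bearer tokens
];

// Replace known secrets and secret-looking substrings before text reaches a log.
function redact(text: string, knownSecrets: string[] = []): string {
  let out = text;
  // Exact values first: these are secrets the vault already knows about.
  for (const secret of knownSecrets) {
    if (secret.length > 0) {
      out = out.split(secret).join("[REDACTED]");
    }
  }
  // Then pattern-based matches for anything that merely looks like a credential.
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  return out;
}

console.log(redact("calling provider with key sk-abcdefghijklmnop1234"));
console.log(redact("Authorization: Bearer abc.def"));
console.log(redact("token=hunter2-value", ["hunter2-value"]));
```

Matching exact known values catches credentials that do not follow any recognizable format, which pattern matching alone would miss.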
Getting Started in 60 Seconds
Here's the whole setup:
1. Open the demo: https://webagent.aratech.ae
2. Create a new profile (click "New Profile")
3. Set your LLM provider and API key (encrypted locally)
4. Start chatting; zero configuration required
That's it. No environment variables, no terminal, no build step. The agent boots its WebContainer runtime in ~5 seconds and you're in.
If you want to customize or contribute:
git clone https://github.com/nikola66/web-agent.git
cd web-agent && npm install
npm run dev # local development with hot-reload
npm run build # production static build
Deploy anywhere static files are served: Vercel, Netlify, Cloudflare Pages, a Caddy server, or a simple npx serve dist. No database, no server-side API required.
Self-Learning: The Agent That Gets Smarter Every Conversation
Let me highlight the self-learning loop one more time because it's the feature that will change how you think about AI agents.
Every interaction produces three things the system can store:
- Facts — "The user prefers TypeScript over JavaScript", "Our Directus blog uses English, Arabic, Spanish, German, and French"
- Reflections — "The video script task went well this time because the outline was approved before drafting", "I should check for typos when writing code examples"
- Learnings — "When working with the Directus API, always fetch the post ID before attempting to assign tags"
These are not transcript logs. They are structured, retrievable, intent-bearing pieces of knowledge that the agent can recall, apply, and reflect on. Over time, the agent doesn't just "remember" your recent conversation — it understands your project's trajectory and can fill in context gaps without explicit prompting.
Use skill_save to promote a particularly good workflow (like "cross-post to 5 languages with consistent formatting") into a reusable skill. Next time you say "cross-post my article," the agent pulls in that skill, checks your Directus translations, formats everything consistently, and returns the job done — without re-learning the process from scratch.
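As a rough illustration of promoting a workflow into a skill and recalling it later, consider the sketch below. The actual SKILL.md format and the matching logic behind skill_save and skill_recall are not described in this post, so the `Skill` shape, the markdown layout, and the trigger-phrase matching are all assumptions.

```typescript
// Hypothetical skill shape; the real SKILL.md layout may look different.
interface Skill {
  skillName: string;
  triggers: string[]; // phrases that suggest this skill applies
  steps: string[];    // the workflow, in order
}

// Render a skill as a small markdown document.
function renderSkillMarkdown(skill: Skill): string {
  return [
    `# ${skill.skillName}`,
    "",
    "## Triggers",
    ...skill.triggers.map((t) => `- ${t}`),
    "",
    "## Steps",
    ...skill.steps.map((s, i) => `${i + 1}. ${s}`),
  ].join("\n");
}

// Return the first skill whose trigger phrase appears in the request.
function recallSkill(skills: Skill[], request: string): Skill | undefined {
  const text = request.toLowerCase();
  for (const skill of skills) {
    if (skill.triggers.some((t) => text.indexOf(t.toLowerCase()) !== -1)) {
      return skill;
    }
  }
  return undefined;
}

const crossPost: Skill = {
  skillName: "cross-post-article",
  triggers: ["cross-post"],
  steps: [
    "Fetch the post ID from Directus",
    "Check existing translations",
    "Apply consistent formatting to each language version",
  ],
};

console.log(renderSkillMarkdown(crossPost));
console.log(recallSkill([crossPost], "Please cross-post my article")?.skillName);
```

A production agent would more likely let the model pick a skill from names and descriptions, but the payoff is the same: the steps are written down once and reused.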
Comparison: Web Agent vs The Alternatives
How does this compare to what's out there today?
The honest difference: Web Agent is unusual. Most AI agent tools are built either as a locally installed terminal or editor tool (Claude Code, Cursor) or as a hosted cloud service (Bolt, v0). Web Agent rethinks where the agent lives: in the browser, under your control, with zero installation prerequisites. That matters.
Real-World Usage: How We Use It
Here's a representative sample of how we use Web Agent internally:
Daily Blog Management
We route our editorial workflow through a Telegram channel connected to Web Agent. The agent reads our Directus blog, identifies drafts ready for review, formats them for publication, schedules cross-posts in 5 languages, and flags anything that needs human attention.
Research & Knowledge Compilation
We drop raw materials (videos, PDFs, competitor notes) into the agent's workspace, then run /wiki-sync to have the agent synthesize them into a structured knowledge vault — the same Karpathy second brain pattern we've discussed publicly. The difference: it happens automatically in the browser, not through manual prompt engineering in Claude Code.
Scheduled Automation
Cron jobs run the agent in the background against its embedded Node.js runtime. One of ours runs nightly: "scan this folder for new designs, generate alt text using vision, and append to a changelog." All within the browser tab, no external server required.
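A heartbeat-driven scheduler of the kind the architecture diagram mentions can be sketched briefly. This is a simplified assumption, not what cron_register actually does: `HeartbeatScheduler` and its method names are hypothetical, jobs use fixed intervals instead of cron expressions, and the clock is passed in explicitly so the behaviour is easy to follow.

```typescript
// Hypothetical scheduler: interval-based jobs checked on every heartbeat tick.
interface Job {
  jobName: string;
  everyMs: number;
  nextRunAt: number;
  run(): void;
}

class HeartbeatScheduler {
  private jobs: Job[] = [];

  register(jobName: string, everyMs: number, run: () => void, now: number): void {
    if (everyMs <= 0) throw new Error("everyMs must be positive");
    this.jobs.push({ jobName, everyMs, nextRunAt: now + everyMs, run });
  }

  // Called on every heartbeat: run each job that is due, then reschedule it.
  tick(now: number): string[] {
    const ran: string[] = [];
    for (const job of this.jobs) {
      if (now >= job.nextRunAt) {
        job.run();
        job.nextRunAt = now + job.everyMs;
        ran.push(job.jobName);
      }
    }
    return ran;
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;
const scheduler = new HeartbeatScheduler();
let runs = 0;

scheduler.register("nightly-alt-text", DAY_MS, () => {
  runs += 1; // a real job would scan the folder and append to the changelog
}, 0);

console.log(scheduler.tick(DAY_MS - 1)); // nothing is due yet
console.log(scheduler.tick(DAY_MS));     // the nightly job runs and is rescheduled
```

One practical caveat: a scheduler living in a browser tab can presumably only fire while that tab is open, and browsers throttle timers in background tabs, so treat in-tab schedules as best-effort.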
Experimentation Sandbox
Each profile is a disposable workspace. Trying a new git repo, running an experiment with a new API, building a quick prototype: spin up a fresh profile, do the work, export or discard. Nothing persists unless you want it to.
Open Source & Community
Web Agent is MIT-licensed. We built it to be as hackable as possible:
- Drop-in capability extensions: put a folder under src/capabilities/{tools,providers,channels,skills}/ and rebuild; the system auto-discovers and loads it
- Full access to agent internals: the embedded runtime is plain TypeScript compiled to ESM; browse it, modify it, rebuild it
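Folder-based auto-discovery usually boils down to classifying file paths. The sketch below shows that step only and is an assumption about the mechanism: the four folder names come from the bullet above, while `discover`, the `Capability` shape, and the example module names (`slack`, `pdf_extract`) are hypothetical. In a Vite project the path list could plausibly come from a build-time glob such as `import.meta.glob`; whether Web Agent does it that way is not stated here.

```typescript
// Hypothetical discovery step: classify module paths by their capability folder.
const KINDS = ["tools", "providers", "channels", "skills"] as const;
type CapabilityKind = (typeof KINDS)[number];

interface Capability {
  kind: CapabilityKind;
  moduleName: string;
  path: string;
}

function discover(paths: string[]): Capability[] {
  const pattern = /^src\/capabilities\/([a-z]+)\/([^/]+)\//;
  const found: Capability[] = [];
  for (const path of paths) {
    const match = pattern.exec(path);
    if (match === null) continue; // not under src/capabilities/
    const folder = match[1];
    const kind = KINDS.find((k) => k === folder);
    if (kind === undefined) continue; // unknown capability folder: ignore it
    found.push({ kind, moduleName: match[2], path });
  }
  return found;
}

// Example paths; "slack" and "pdf_extract" are made-up modules for illustration.
const discovered = discover([
  "src/capabilities/channels/slack/index.ts",
  "src/capabilities/tools/pdf_extract/index.ts",
  "src/capabilities/unknown/thing/index.ts",
  "src/other/file.ts",
]);

console.log(discovered.map((c) => `${c.kind}/${c.moduleName}`));
```

Ignoring unknown folders instead of throwing means a stray directory cannot break the build for everyone else's capabilities.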
- No gated features: everything in the repo is available in the live demo, no credit card, no invite
We'd love your contributions. If you've built an interesting skill, a new tool provider, or a creative workflow, please open a PR or open an issue to tell us about it.
Repository: https://github.com/nikola66/web-agent
Live Demo: https://webagent.aratech.ae
Support (if you want to buy a coffee): http://ko-fi.com/nikola66
Roadmap: What's Next
We're actively developing on the main branch. The v0.0.6 release (May 16, 2026) added the PARA knowledge vault builtins (/wiki-setup, /wiki-sync, /wiki-search), safer memory projection, and a set of Open Web Research capabilities for deep discovery tasks.
Short-term roadmap (next few weeks):
- More built-in skill templates (Directus management, blog cross-posting, podcast production)
- Expanded provider list (OpenAI, DeepSeek, and others as OpenAI-compatible)
- Larger concurrent profile support
- Test suite for tool smoke tests (in progress)
- Public skill registry — share and discover community skills
Medium-term:
- Plugin system for workspace-level extensions
- Media-heavy workflows (audio transcription, video analysis, image generation pipelines)
- Deeper insight dashboard: "what has this agent learned about my project?"
- Team/shared profile modes for small teams
Conclusion
The promise of AI agents has always been: autonomous workflows that know your context, learn from your feedback, and get smarter over time. The problem has been friction — installation, maintenance, isolation, trust.
Web Agent eliminates friction. It runs in the browser, never sends your data to our servers, keeps your profiles isolated, builds a growing knowledge base about your work, and gives you the full power of autonomous AI — no Docker, no Python, no server.
It's not a toy. It's the same system we built for ourselves, now open-sourced for anyone who wants to use it, study it, customize it, or apply it to something we haven't thought of yet.
We'd love to hear how you use it.
Try it now: https://webagent.aratech.ae
See the code: https://github.com/nikola66/web-agent
Star the repo: ⭐ https://github.com/nikola66/web-agent
What's next for you? Join our community, build a skill, share your setup. We're building something different — with you, not just for you.