 PRIVATE BETA  ·  v0.2.4  ·  70+ PHASES SHIPPED

One app. Every AI tool you need.

Local LLMs. Full IDE. Autonomous agents. Image generation. Model training. Cloud deploy. All offline. All on your machine. Nothing sent anywhere.

10–12 t/s — 30B on 4 GB VRAM
1,733 backend commands
0 cloud deps required
256K context window
workbench — mbsd daemon — AI Runtime
The Problem

The AI toolchain is
broken by design.

Inference client. Code editor. Image tool. Deploy pipeline. Training platform. Five apps, five subscriptions, five surfaces for your IP to leak. Nobody built them to work together — because nobody had to, until now.

01 / 03

Your workflow is 4 apps.
It should be 1.

Context switch between inference, IDE, image generation, and deployment. Every handoff is friction, lost state, and engineer time. The tools were never designed to share context with each other.

Workbench unifies all six categories in one offline-first window. No integrations to wire up. No context lost between steps.
02 / 03

With most AI tools, your code
isn’t actually private.

AI editors route completions through remote APIs. Image tools process your work on third-party servers. Your source code, prompts, and IP flow through infrastructure you didn't choose.

Every inference call runs on your hardware. No remote processing by default. No telemetry on work product.
03 / 03

Local dev and production
are different worlds.

You build locally, then hand off to entirely separate tooling for containers and cloud. The AI work you did has no direct path to what ships. You rebuild the pipeline every project.

Docker, Kubernetes, and 10 cloud providers in the same window where the agent wrote the code.

Workbench doesn’t fill a gap. It closes all three. One app, running offline, combining six tool categories the industry kept separate for years — because combining them required building all six from scratch.

What’s Inside

Six pillars.
One app.
No compromises.

Each of these would ship as a standalone product. In Workbench, they share state, context, and models, making each one far more useful than it would be alone.

inference-logs — CUDA — 10.4 t/s — Qwen2.5-32B
Local LLM Inference

Local AI Engine — CUDA-accelerated. Zero cloud.

GGUF inference with Flash Attention v3 and speculative decoding. Any model, any size. Auto-configured on first launch — no CUDA setup, no quantization guesswork. Mix local and cloud in the same session.

10–12 t/s on 30B models, 4 GB VRAM minimum
Flash Attention v3 + spec decoding + 256K ctx (8-bit KV)
Auto-quantization Q2_K → Q8_0 based on available VRAM
Multi-modal vision + real embeddings + cosine similarity
10 cloud providers (Anthropic, OpenAI, Groq, Mistral…)
mbsd daemon (JSON-RPC 2.0, TCP:3031) + TS & Python SDKs
GGUF / CUDA / FLASH ATTN v3 / 256K CTX
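The daemon's JSON-RPC 2.0 surface on TCP:3031 can be exercised without either SDK. A minimal Python sketch, assuming newline-delimited framing; the `model.generate` method name is a placeholder, not a documented method — consult the SDK docs for the real surface:

```python
import json
import socket

def make_request(method: str, params: dict, req_id: int = 1) -> bytes:
    """Build a JSON-RPC 2.0 request envelope, newline-terminated."""
    envelope = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    return (json.dumps(envelope) + "\n").encode("utf-8")

def call_mbsd(method: str, params: dict,
              host: str = "127.0.0.1", port: int = 3031) -> dict:
    """Send one request to the mbsd daemon and read one response line."""
    with socket.create_connection((host, port), timeout=30) as sock:
        sock.sendall(make_request(method, params))
        raw = sock.makefile("r", encoding="utf-8").readline()
    return json.loads(raw)

# Usage (requires a running daemon; method name is hypothetical):
#   call_mbsd("model.generate", {"prompt": "Hello"})
```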
dev-panel — Monaco LSP — Python (pylance) — DAP active
Code Editor

Professional Code Editor — real IDE, not a text field.

Monaco editor with full LSP, real PTY terminal, DAP debugger, and native Git. Every AI completion uses the local model in Pillar 1 — no additional API call, no latency, no context switch.

LSP: Python, TypeScript, Rust, Go, Lua, C/C++ — auto-installs servers
Real ConPTY/openpty terminal with OSC 633 shell integration
DAP debugger: 9 adapters (Node, Python, Rust, Go, C++…)
Native Git: blame, partial stage, interactive rebase, reflog, graph
150+ settings, tab groups, keybinding editor, extension API
Node.js extension runtime + plugin marketplace + snippet library
MONACO / LSP / DAP / NATIVE GIT
mcp-workflow — agent — step 4/11 — GSD-2 active
Agent Workflow

Autonomous Agents — goal to production, unattended.

Describe a goal. The agent decomposes it, edits files across your codebase, runs terminal commands, calls APIs, tests results, iterates — live audit trail every step. Safety gates prevent anything destructive without your approval.

23 built-in tools: code edits, files, terminal, Git, web search
24 MCP servers: PostgreSQL, Docker, browsers, blockchains, mobile
GSD-2: milestone/slice/task hierarchy, crash recovery, cost ledger
Three-tier safety gates — full action audit log per step
BSON episodic memory + MemoryPalace cross-session context
Stuck detection, complexity scoring, automatic spec replay
REACT / GSD-2 / 24 MCP SERVERS / BSON MEMORY
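The loop underneath is the ReAct pattern: the model proposes an action, a safety gate vets it, the tool runs, and the observation feeds the next step. A toy sketch of that shape — the tool names, gate list, and `propose_action` signature are illustrative, not Workbench internals:

```python
from typing import Callable

# Illustrative: actions that must clear the safety gate before running
DESTRUCTIVE = {"terminal.rm", "git.force_push"}

def react_loop(goal: str,
               propose_action: Callable,
               tools: dict,
               approve: Callable[[str], bool],
               max_steps: int = 10) -> list:
    """Minimal ReAct loop: propose -> gate -> act -> observe, with an audit trail."""
    trail = []  # every step is recorded, including blocked ones
    for _ in range(max_steps):
        name, args = propose_action(goal, trail)
        if name == "done":
            break
        if name in DESTRUCTIVE and not approve(name):
            trail.append((name, args, "BLOCKED by safety gate"))
            continue
        observation = tools[name](*args)  # run the tool, capture the result
        trail.append((name, args, observation))
    return trail
```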
image-gen — SD 1.5 — 512×512 — step 18/30
Image Generation

Creative Studio — images and voice, fully offline.

Complete Stable Diffusion pipeline and a professional Voice Studio — on your GPU, with no per-image cost, no API keys, no prompts shared anywhere. Generate until you’re satisfied.

Text-to-image, img2img, inpainting, ControlNet, LoRA stacking
Batch generation with real-time preview + hardware auto-config
Voice Studio TTS: 22 voices (Kokoro ONNX + SAPI fallback)
4 STT modes incl. whisper.cpp — voice-to-code + dictation
P2P network for session sharing across devices
Theme Editor — full UI customization, 10+ built-in themes
STABLE DIFFUSION / 22 TTS VOICES / 4 STT MODES
training-dashboard — LoRA — step 240/500 — loss 0.423
Training Dashboard

Model Training — fine-tune and use, without leaving the IDE.

Fine-tune any model on your own data. Watch it train. Export and load it directly into the inference engine. Cloud GPU rental is one click away when local hardware is not enough.

LoRA & QLoRA fine-tuning from 6 GB VRAM
GRPO training via real trl/transformers pipeline
Live loss curves, gradient norm & step timing dashboard
Cloud GPU Hub: Vast.ai, RunPod, Lambda Labs (real APIs)
7 export formats: GGUF, ONNX, CoreML, TensorRT, AWQ, GPTQ, TFLite
Hardware-aware preset wizard — auto-tunes batch size & LR
LORA / QLORA / GRPO / CLOUD GPU HUB
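The preset wizard's core decision, choosing a quantization level and batch size from free VRAM, reduces to a lookup table. A sketch with illustrative thresholds (not the wizard's actual tuning values):

```python
def auto_preset(free_vram_gb: float) -> dict:
    """Pick a quant level and batch size from available VRAM (illustrative thresholds)."""
    # (min VRAM in GB, quant level, training batch size), checked highest-first
    table = [
        (24.0, "Q8_0",   8),
        (12.0, "Q6_K",   4),
        (8.0,  "Q4_K_M", 2),
        (4.0,  "Q3_K_M", 1),
        (0.0,  "Q2_K",   1),
    ]
    for min_vram, quant, batch in table:
        if free_vram_gb >= min_vram:
            return {"quant": quant, "batch_size": batch}
```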
deployment-center — Vercel — prod — deploying…
Deployment Center

Deploy Anywhere — from the same window where you built it.

Every project built in Workbench ships to production without switching tools. Docker, Kubernetes, and 10 cloud providers in one panel. Cost estimates shown before you commit.

Docker: build, tag, push, run, logs, Compose up/down lifecycle
Kubernetes: pod/svc/deploy explorer, port-forward, namespaces
Azure, GCP, AWS, Vercel, Netlify + more — real APIs, cost estimates
Remote SSH + Dev Containers (docker build/run, SCP, Remote LSP)
mbsd + mbs CLI + TS & Python SDKs for pipeline automation
Encrypted secrets vault + environment profile management
DOCKER / K8S / 10 CLOUD PROVIDERS / SSH
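Under any UI, the Docker release path reduces to a fixed command sequence. A Python sketch of build-tag-push using the standard `docker` CLI (image and registry names are placeholders):

```python
import subprocess

def docker_release(image: str, tag: str, registry: str, context: str = ".") -> list:
    """Return the build/tag/push command sequence for one image."""
    local = f"{image}:{tag}"
    remote = f"{registry}/{local}"
    return [
        ["docker", "build", "-t", local, context],
        ["docker", "tag", local, remote],
        ["docker", "push", remote],
    ]

def run_release(image: str, tag: str, registry: str, context: str = ".") -> None:
    """Execute the sequence, stopping on the first failing step."""
    for cmd in docker_release(image, tag, registry, context):
        subprocess.run(cmd, check=True)
```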
Use Cases

Build anything.
Run everything.

Because all six pillars share state, the outcomes aren’t just the sum of the parts. The agent knows your codebase. Your fine-tuned model answers the agent’s questions. Deployment follows directly from the build.

full-stack development

Describe an app. Ship it to production.

Give the agent a goal. It scaffolds the project, writes the code, runs tests, fixes errors, and deploys to the cloud — while you watch every step in the audit log. No context switches. No different tools.

“Build a task manager with React, FastAPI, SQLite. Add auth, write tests, deploy to Vercel.”
AI Engine · Code Editor · Agent · Deploy
ai-augmented development

A coding assistant that knows your entire codebase.

Local inference means completions in milliseconds. The model sees your whole codebase through semantic injection — not just the open file. Episodic memory carries project history across every session.

“Review this PR for security issues, suggest fixes, and add missing tests. Match the patterns in the rest of the codebase.”
AI Engine · Code Editor · Agent
custom model training

Fine-tune on your data. Use it immediately.

Upload a dataset, pick a base model, run LoRA fine-tuning on your GPU. Watch loss curves live. When training finishes, load it directly into the inference engine — no export pipeline, no cloud bill.

“Fine-tune Llama 3 on my company’s API docs so it answers internal developer questions accurately.”
AI Engine · Training · Agent
background automation

Schedule agents to run while you sleep.

GSD-2 runs multi-phase agentic tasks autonomously. Checkpoint and resume on crash. Safety gates prevent destructive actions without your approval. Define the workflow once — Workbench runs it on schedule.

“Every night, check my repos for failing tests, open a PR with fixes, and Slack me a summary.”
Agent · AI Engine · Code Editor
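Checkpoint-and-resume, the pattern behind crash recovery in scheduled runs, fits in a few lines. A minimal sketch (the JSON checkpoint file and named-step list are illustrative, not GSD-2's actual format):

```python
import json
from pathlib import Path

def run_with_checkpoints(steps, checkpoint: Path) -> list:
    """Run (name, fn) steps in order, skipping ones already recorded as complete."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    for name, fn in steps:
        if name in done:
            continue  # completed in a previous (possibly crashed) run
        fn()
        done.append(name)
        checkpoint.write_text(json.dumps(done))  # persist after every step
    return done
```

A rerun after a crash picks up at the first unfinished step instead of starting over.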
creative production

Generate images and voice, completely offline.

Stable Diffusion on your GPU with no per-image cost. Combine ControlNet conditioning with LoRA style adapters. Voice Studio’s 22-voice TTS for narrated content — zero data leaves your machine.

“Generate a hero banner and feature illustrations, then produce voiceover audio for my product launch.”
Creative Studio · Agent · Deploy
multi-model research

Run multiple models against the same problem.

Multi-model chat mixes local and cloud providers in one session. Run adversarial debates, chain outputs between models, or benchmark your fine-tuned model against a frontier model — without swapping tools.

“Have local Qwen-32B and GPT-4o debate the best architecture for a real-time multiplayer backend, then scaffold the winner.”
AI Engine · Cloud APIs · Agent
Technical Depth

70+ phases.
Real. Shipped.

Not a concept. Not a prototype. Built phase by phase — each shipped, audited, and functional. Here are the numbers that matter.

70+ Phases shipped & functional
1,733 Backend Tauri commands
0+ Rust modules
3,600+ Community members in beta

AI Runtime

  • GGUF + Flash Attn v3 + speculative decoding + auto-quant
  • Real embeddings, cosine similarity, semantic codebase search
  • Multi-modal vision (LLaVA, Qwen-VL) — routes through loaded model
  • 10 cloud providers, SSE streaming, BYOK cost tracking per call
  • mbsd daemon (JSON-RPC 2.0, TCP:3031) + TS & Python SDKs
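Semantic codebase search is nearest-neighbor lookup in embedding space, and the cosine-similarity core is a few lines. A sketch with toy vectors standing in for real model embeddings:

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec: list, index: list) -> list:
    """Rank (path, vector) pairs by similarity to the query embedding."""
    return sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
```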

Agent Intelligence

  • ReAct loop: 23 tools + 24 MCP server integrations
  • GSD-2: spec-driven workflow engine, milestone hierarchy
  • Crash recovery, cost ledger, stuck detection & auto-spec replay
  • BSON episodic memory + MemoryPalace cross-session context
  • Three-tier safety + complete action audit log every step

IDE Core

  • Monaco + LSP (7 langs), call hierarchy, semantic tokens
  • Real PTY + OSC 633 shell integration, buffer search
  • DAP: 9 adapters, inline values, conditional breakpoints, source maps
  • Native Git: blame, partial stage, interactive rebase, reflog, graph
  • Node.js extension runtime, marketplace, snippet library

Build & Deploy

  • Docker lifecycle, Compose, K8s pod/svc/deploy explorer
  • Azure, GCP, AWS, Vercel, Netlify — real APIs, cost estimates
  • LoRA/QLoRA/GRPO + Cloud GPU Hub (Vast.ai, RunPod, Lambda)
  • 7 model export formats incl. TensorRT & CoreML
  • Remote SSH + Dev Containers with docker build/SCP/Remote LSP
[ ✓ ] STATUS
70+ phases shipped

Every feature on this page is implemented and functional. No vaporware, no stubbed commands.

[ ✓ ] PRIVACY
0 cloud deps required

Every inference, image gen, and agent action runs on your hardware by default. Nothing leaves unless you choose.

[ ✓ ] QUALITY
0 critical stubs remain

cargo clippy: 0 warnings, 0 errors. All former stubs replaced with real implementations.

[ ✓ ] COMMUNITY
3,600+ beta developers

Active community since early alpha. Every release shaped by real-world developer feedback.

Get Started

Start
building
today.

Private beta. All six pillars included, all features unlocked. Request access and we’ll send download instructions as soon as your request is reviewed.

MBS Workbench v0.2.4
workbench-setup-0.2.4.exe  ·  ~583 MB  ·  Windows 10/11
Signed installer · VirusTotal verified · GPU auto-detected on launch
Request Access →
system_requirements
OS       Windows 10/11 · Linux (Ubuntu 20.04+)
RAM      8 GB min · 16 GB recommended
GPU      NVIDIA 4 GB+ VRAM · CPU fallback available
Storage  10 GB free (models stored separately)
CUDA     12.x recommended · 11.x supported

./join_waitlist

Invite-only. Fill in your details — we’ll send access instructions as soon as your request is reviewed.

// no spam  ·  no data sold  ·  access instructions only

Free during private beta. All six pillars with no feature gates. Premium tier ($10/mo) planned for cloud API credit bundles and team features — local inference, agents, training, and image generation will remain free forever.