Build with AI.
Locally. Privately. Yours.
MBS Workbench is a GPU-accelerated AI development environment that runs entirely on your machine. Local inference, image generation, model training, 10 cloud API providers, and a full code editor — no subscriptions, no cloud, no data leaves your device.
Download & Install #
System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 (64-bit) | Windows 11 |
| RAM | 8 GB | 16 GB+ |
| GPU | CPU-only mode available | NVIDIA GPU with 4 GB+ VRAM |
| Storage | 500 MB (app) | 10 GB+ (app + models) |
| CUDA | — | CUDA 12.x for GPU acceleration |
Installation Methods
NSIS Installer
One-click Windows installer with start menu shortcuts and uninstaller. Recommended for most users.
MBS Workbench_x64-setup.exe
MSI Installer
Enterprise-grade MSI package for Group Policy and SCCM deployment across organizations.
MBS Workbench_x64_en-US.msi
Portable EXE
Standalone executable — no installation required. Run from USB drives or restricted environments.
MBS Workbench.exe (134 MB)
Quick Start #
Get productive in under 5 minutes. Here's everything you need to go from install to your first AI-assisted coding session.
Download & Launch
Download the installer from the releases page and run it. MBS Workbench launches instantly — no account creation, no sign-in, no telemetry.
Download an AI Model
Open the AI & Models panel from the Activity Bar (or press Ctrl+Shift+A). Click HuggingFace to browse 800,000+ models. We recommend starting with Qwen 2.5 Coder 7B Q4_K_M — it fits in 4 GB VRAM and delivers excellent code generation.
Load the Model
Go to Load Model and select your downloaded GGUF file. MBS automatically detects your GPU and optimizes layer offloading, context window, and batch size.
Open a Project
Press Ctrl+Shift+O or click File → Open Folder to open your workspace. The file explorer appears in the sidebar — click any file to start editing.
Start Coding with AI
The AI chat panel appears on the right. Type a prompt like "Create a REST API with Express.js and TypeScript" — the agent will create files, install dependencies, and set everything up. You can also use @file, @folder, or @codebase to inject context.
Code Editor #
MBS Workbench includes a full Monaco-powered code editor — the same engine that powers VS Code. You get an enterprise-grade editing experience without any external dependencies.
50+ Languages
Syntax highlighting for TypeScript, Python, Rust, Go, Java, C++, Solidity, and dozens more.
Multi-Tab Editor
Open multiple files in tabs. Drag to reorder. Modified files show a dot indicator.
Minimap & Breadcrumbs
Minimap overview of your file. Breadcrumb navigation shows your location in the code hierarchy.
Find & Replace
Powerful in-file search with regex support. Global workspace search with file filtering.
Split Editor
Split your editor into multiple panes. Compare files side by side or reference code while editing.
Code Folding
Collapse code blocks, bracket pair colorization, indentation guides, and sticky scroll headers.
AI Chat & Autonomous Agent #
MBS Workbench includes a 23-tool autonomous agent powered by a structured ReAct (Reason → Act → Observe) state machine — not brittle prompt chains. Give it a task in natural language and it will plan, execute, and iterate until it's done.
Agent Capabilities
| Category | Tools | What It Can Do |
|---|---|---|
| Filesystem | 7 | Create, read, edit, delete, search files, batch-read directories |
| Terminal | 3 | Run shell commands, execute Python & Node.js scripts |
| Web | 3 | Web search, fetch URLs, scrape webpages |
| Git | 5 | Status, commit, push, pull, branch management |
| Analysis | 3 | Static analysis, code explanation, refactoring suggestions |
| Database | 2 | SQLite queries, CSV parsing |
Safety System
A three-tier permission system protects you from unintended side effects:
Safe (Tier 1)
Read-only operations and file creation. read_file, create_file, list_directory — always allowed.
Elevated (Tier 2)
Execution and analysis tools. execute_command, git_commit, analyze_code — allowed after model qualification.
Dangerous (Tier 3)
Network and destructive operations. web_search, git_push, delete_file — restricted to verified models.
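A minimal sketch of how such a tier gate could work — the function, tier map, and clearance levels here are illustrative, not MBS Workbench's actual internals:

```javascript
// Hypothetical three-tier permission gate. Tool names match the tiers listed
// above; the clearance model (1 = unqualified, 2 = qualified, 3 = verified)
// is an assumption for this sketch.
const TOOL_TIERS = {
  read_file: 1, create_file: 1, list_directory: 1,    // Safe
  execute_command: 2, git_commit: 2, analyze_code: 2, // Elevated
  web_search: 3, git_push: 3, delete_file: 3,         // Dangerous
};

function isToolAllowed(toolName, modelClearance) {
  const tier = TOOL_TIERS[toolName];
  if (tier === undefined) return false; // unknown tools are denied by default
  return modelClearance >= tier;
}

console.log(isToolAllowed('read_file', 1)); // Safe tools: always allowed
console.log(isToolAllowed('git_push', 2));  // Dangerous tool, unverified model: denied
```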
@-Context Injection
Inject precise context into any AI conversation by typing @ followed by a context source:
| Context | Description |
|---|---|
| @file | Inject the contents of a specific file into the prompt |
| @folder | Inject directory tree and file summaries |
| @codebase | Inject indexed codebase context (symbols, definitions) |
| @selection | Inject current editor selection |
| @terminal | Inject last terminal output |
Multi-File AI Edits
The AI agent can propose changes across multiple files simultaneously. You get a unified diff viewer with:
- Side-by-side diff comparison
- Accept / Reject controls per file
- Accept All / Reject All bulk actions
- File tree showing all changed files
Model Management #
Loading Models
MBS Workbench supports any GGUF-format model — the industry standard for local LLM inference. Models are loaded directly into process memory with native CUDA GPU offloading.
Recommended Models
| Model | Size | Best For | Min. VRAM |
|---|---|---|---|
| Qwen 2.5 Coder 7B Q4_K_M | 4.4 GB | Code generation, best quality/size ratio | 4 GB |
| DeepSeek-R1 7B Q4_K_M | 4.5 GB | Reasoning & chain-of-thought | 4 GB |
| Llama 3.3 8B Q4_K_M | 5.0 GB | General purpose, chat | 6 GB |
| Phi-3 Mini Q4_K_M | 2.0 GB | Small, fast, low-resource machines | 2 GB |
| Mistral 7B Q4_K_M | 4.1 GB | Versatile, strong instruction following | 4 GB |
| CodeLlama 13B Q3_K_M | 5.5 GB | Complex code tasks | 8 GB |
GPU Acceleration (CUDA)
MBS Workbench embeds llama.cpp with native CUDA bindings directly in the Tauri binary. Unlike Ollama or LM Studio, there's no separate inference server — the model runs in the same process as the editor.
Partial GPU Offloading
If your GPU has limited VRAM, MBS automatically splits model layers between GPU and CPU. A 7B model on a 4 GB GPU offloads ~28 of 32 layers to GPU, keeping 4 on CPU.
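The split can be estimated with simple arithmetic, assuming roughly equal layer sizes (a simplification — the real heuristic also budgets for KV cache and working buffers):

```javascript
// Back-of-the-envelope GPU layer split. Illustrative only — not MBS
// Workbench's actual offloading heuristic.
function estimateGpuLayers(modelSizeGb, totalLayers, freeVramGb) {
  const perLayerGb = modelSizeGb / totalLayers;   // assume equal layer sizes
  const layers = Math.floor(freeVramGb / perLayerGb);
  return Math.min(layers, totalLayers);           // cap at the model's layer count
}

// 4.4 GB model, 32 layers, ~3.9 GB of free VRAM on a 4 GB card:
console.log(estimateGpuLayers(4.4, 32, 3.9)); // → 28
```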
Full GPU Offloading
With sufficient VRAM, all layers run on GPU for maximum speed. A 7B Q4 model fully offloaded to an RTX 4060 delivers 40+ tokens/sec.
Model Parameters
Fine-tune inference behavior from the Parameters panel:
| Parameter | Range | Description |
|---|---|---|
| Temperature | 0.0 – 2.0 | Controls randomness. Lower = more deterministic, higher = more creative. |
| Top-P | 0.0 – 1.0 | Nucleus sampling — limits token pool to cumulative probability threshold. |
| Top-K | 1 – 200 | Limits sampling to the top K most likely tokens. |
| Repeat Penalty | 1.0 – 2.0 | Penalizes repeated tokens to reduce repetitive output. |
| Max Tokens | 64 – 32768 | Maximum number of tokens to generate per response. |
| Context Window | 512 – 131072 | Total tokens the model can see (prompt + response). Auto-sized to 75% GPU capacity. |
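To make the sampling parameters concrete, here is a toy sketch of temperature scaling and top-p filtering over a made-up four-token distribution (the logits and the 0.8 / 0.9 values are illustrative, not MBS defaults):

```javascript
// Temperature: divide logits before softmax. Lower temperature sharpens the
// distribution; higher temperature flattens it.
function softmax(logits, temperature) {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);                 // subtract max for stability
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Top-P: keep the smallest set of tokens whose cumulative probability
// reaches the threshold.
function topPFilter(probs, topP) {
  const order = probs.map((p, i) => [p, i]).sort((a, b) => b[0] - a[0]);
  const kept = [];
  let cum = 0;
  for (const [p, i] of order) {
    kept.push(i);
    cum += p;
    if (cum >= topP) break;
  }
  return kept.sort((a, b) => a - b); // indices of surviving tokens
}

const probs = softmax([2.0, 1.0, 0.1, -1.0], 0.8);
console.log(topPFilter(probs, 0.9)); // indices forming the sampling nucleus
```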
Cloud Providers (10 APIs)
While MBS Workbench is built for local inference, you can connect to 10 cloud LLM providers for hybrid workflows. Mix local and cloud models in the same session with real-time cost tracking.
Configure API keys in Settings → Cloud Providers. The unified provider selector in chat lets you switch between local and cloud models per-message. An OpenAI-compatible fallback supports any additional provider.
Speculative Decoding
Accelerate inference by pairing a large model with a smaller draft model. The draft model generates candidate tokens that the main model verifies in parallel — delivering 2-3x speedup on compatible hardware.
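A toy model of the accept loop — real speculative decoding verifies the draft tokens in a single batched forward pass; here the "models" are just arrays of pre-decided tokens, which is enough to show why agreement yields a speedup:

```javascript
// One speculative step: accept the draft's tokens while the main model
// agrees, then take the main model's token at the first mismatch.
// `verifyToken(i)` stands in for the main model's prediction at position i.
function speculativeStep(draftTokens, verifyToken) {
  const accepted = [];
  for (const t of draftTokens) {
    if (verifyToken(accepted.length) === t) accepted.push(t);
    else break; // first disagreement ends the accepted run
  }
  if (accepted.length < draftTokens.length) {
    accepted.push(verifyToken(accepted.length)); // main model supplies the fix
  }
  return accepted;
}

const mainOutput = ['const', 'x', '=', '42', ';'];
const draft = ['const', 'x', '=', '7']; // draft disagrees at position 3
console.log(speculativeStep(draft, (i) => mainOutput[i]));
// Three draft tokens verified in one pass, plus the correction
```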
Embeddings & Vision
Load embedding models for local RAG (Retrieval-Augmented Generation) and semantic search. Vision model support (LLaVA, Qwen-VL) lets you paste images directly in chat for screenshot-to-code workflows and image analysis.
HuggingFace Explorer #
Browse and download from HuggingFace's 800,000+ model library without leaving the app. MBS Workbench scores every model against your hardware for instant compatibility assessment.
12 Task Categories
Chat, Code, Image Gen, WebDev, Game Dev, Agent, Reasoning, Vision, Embedding, Translation, Summarization, Math
Smart Scoring
Models ranked by downloads × likes × quantization quality × hardware fit. Badges: Top 5, Popular, Trusted.
One-Click Download
Streaming download with real-time bytes/sec, ETA, and progress. Auto-detect GGUF variants and quantization levels.
Hardware Matching
Each model scored: Perfect, Good, Possible, or Too Large for your system.
Inline AI Completions #
Get real-time AI-powered code suggestions as you type — similar to GitHub Copilot, but running entirely on your local GPU, with no network round-trips and complete privacy.
- Ghost text appears as you type (400ms debounce)
- Press Tab to accept, Esc to dismiss
- Context-aware: reads surrounding code for accurate suggestions
- Toggle on/off from the editor toolbar
- Works with any loaded model — no separate "completion model" needed
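The debounce behavior can be modeled as a pure function over keystroke timestamps — a simplified sketch, not the editor's actual timer-based implementation:

```javascript
// Trailing-edge debounce model: given keystroke times (ms), return the times
// at which a completion request would fire, using the 400 ms window above.
function debounceFires(keystrokes, delayMs = 400) {
  const fires = [];
  for (let i = 0; i < keystrokes.length; i++) {
    const next = keystrokes[i + 1];
    // A request fires only if no further keystroke lands within the window.
    if (next === undefined || next - keystrokes[i] >= delayMs) {
      fires.push(keystrokes[i] + delayMs);
    }
  }
  return fires;
}

// Rapid typing suppresses requests; each pause triggers exactly one:
console.log(debounceFires([0, 100, 200, 1000])); // → [600, 1400]
```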
Integrated Terminal #
A full terminal emulator built into the bottom panel — no need to switch between windows.
- xterm.js-powered terminal with PTY backend
- Auto-syncs working directory with your workspace
- Toggle with Ctrl+`
- Run commands, install dependencies, start servers
- Agent can execute commands through the terminal
MCP Servers #
MBS Workbench ships with full Model Context Protocol (MCP) infrastructure and 28 pre-configured server definitions spanning 8 domains. MCP enables your AI model to interact with external tools, APIs, and services through a standardized JSON-RPC 2.0 protocol.
| Domain | Servers | Capabilities |
|---|---|---|
| Dev Tools | Rust Analyzer, Pyright, TypeScript, Clangd, Go, TexLab | Language intelligence, diagnostics, completions |
| Blockchain | Solana, Ethereum, CoinMarketCap, DEX Screener | Smart contract interaction, price data, DEX analytics |
| Game Engines | Godot, Unity, Unreal | Scene management, asset creation, scripting |
| Web Dev | Puppeteer, Vercel, Docker | Browser automation, deployment, containerization |
| Databases | PostgreSQL, SQLite, Redis | Query execution, schema inspection, caching |
| Media | ComfyUI, FFmpeg, ImageMagick | Image generation, video processing, media conversion |
| Mobile | Flutter, Android | App building, emulator control, hot reload |
| Utilities | Filesystem, Terminal, Git, HTTP | File ops, command execution, version control, API calls |
Language Server Protocol (LSP) #
Built-in LSP client for professional-grade language intelligence:
- Real-time diagnostics (errors & warnings) in the editor
- Symbol extraction and navigation
- Document synchronization (open / change / save)
- Start/stop language servers from the UI
- Automatic server restart on crash
Extensions #
MBS Workbench ships with a built-in extension system:
Git
Python
Markdown
AI Completions
Live Preview
Mermaid
RAG
Image Gen
Search & Replace #
Powerful workspace-wide search with regex, case sensitivity, include/exclude glob filters, and bulk replace. Results are grouped by file with match highlighting. Access via Ctrl+Shift+F or the Search panel in the Activity Bar.
Live Preview #
Preview HTML, CSS, and JavaScript projects in a split pane without leaving the editor. The embedded Warp HTTP server provides instant hot-reload as you type. Toggle with View → Toggle Preview.
Debug & Test #
Full DAP (Debug Adapter Protocol) integration with 9 debug adapters — Node.js, Python, Rust, Go, C/C++, Java, .NET, PHP, and Ruby. Set breakpoints, step through code, inspect variables, evaluate expressions, and view call stacks — all inside MBS Workbench.
Breakpoints & Stepping
Line breakpoints, conditional breakpoints, step over/into/out, continue, restart, and stop. Inline variable values shown directly in the editor.
Compound Configs
Launch multiple debug targets simultaneously. Source map VLQ decode for debugging transpiled code. Launch.json editor with template generation.
Integrated test runner with support for npm test, cargo test, pytest, and jest. Task auto-detection discovers package.json scripts, Cargo targets, and Makefiles automatically. Test results display inline with pass/fail status.
Source Control (Git) #
Complete Git integration powered by git2-rs (native Rust bindings) and porcelain v2 commands. Full SCM sidebar with visual staging, inline blame, and merge conflict resolution.
SCM Sidebar
Stage/unstage files and individual hunks. Commit with message. Visual diff editor. Branch management (create, switch, delete, rename). Tag and stash support.
Advanced Git
Inline blame annotations. Git gutter decorations. Cherry-pick, rebase, and reflog. Merge conflict resolution with accept current/incoming/both. Partial (line-level) staging.
Push, pull, and fetch operations run asynchronously with progress notifications. SCM toolbar provides one-click access to commit, stage all, unstage all, pull, push, and branch switching.
AI Refactoring #
AI-powered code refactoring with real safety checks — the refactoring engine actually runs your test suite (npm test, cargo test, pytest) before and after each transformation to verify correctness.
Extract & Move
Extract method, extract component (React/Vue), move symbol to another file. AI suggests optimal extraction boundaries based on code analysis.
AI Rename & Patterns
Context-aware renaming across files. Regex-based pattern transforms. Full refactoring undo with one-click revert. Safety score shown before applying.
Multi-Model Chat #
Chat with multiple AI models simultaneously in a side-by-side view. Compare responses, run model tournaments, and track cost per query across providers.
- Simultaneous chat — send one prompt, get responses from 2-4 models at once
- Tournament mode — blind comparison where you pick the best response
- Cost analysis — real-time token cost tracking per provider (OpenAI, Anthropic, Google, local)
- Templates — save and reuse multi-model comparison setups
- Local + Cloud — mix local GGUF models with cloud APIs in the same session
Voice Input (STT) #
Live microphone speech-to-text powered by the browser's built-in Web Speech Recognition API — no external binary, no cloud. Continuous recording mode streams interim and final transcripts in real time. Use it to dictate code, chat messages, or search queries entirely on-device.
- One-click Start / Stop recording with visual indicator
- Language selection (auto-detect or manual BCP-47 code)
- Interim transcript streamed as you speak
- Full error messaging — microphone denied, no speech, audio capture failures
- No external binary required — works via WebView2 on all supported Windows versions
Image Generation #
Generate images from text prompts using Stable Diffusion running entirely on your GPU. Supports SD 1.5, SDXL, and FLUX architectures with automatic model detection and configuration. No cloud fees, no API limits — unlimited generation on your own hardware.
Text-to-Image
Enter a prompt, generate an image. Token weighting syntax for precise control. 20+ built-in prompt templates. Negative prompt presets. Multiple samplers (Euler, DPM++, LCM) with auto-selection.
img2img & Inpainting
Transform existing images with guided generation. Mask-based selective regeneration. ControlNet integration with 8 modes (Canny, Depth, Pose, Scribble, and more). Adjustable denoising strength.
Live Preview & History
Watch images evolve in real-time as denoising progresses. Full generation history with metadata. Re-generate any previous image with one click. Search by prompt, model, or date.
LoRA & Batch Generation
Stack multiple LoRA weights simultaneously with per-LoRA strength sliders. Smart batch mode with variation strategies (seed walk, CFG sweep, sampler comparison). Export batches as ZIP.
Model Training #
Fine-tune LoRA and QLoRA adapters on your own codebase or domain-specific data. Train specialized models that outperform generic models on your specific tasks — from 4 GB laptops to multi-GPU workstations.
Visual Training Dashboard
Real-time loss curves with moving averages. GPU/CPU memory gauges. Sample generation during training. Estimated time remaining. Pause/resume with one click. Auto-save checkpoints.
Hardware-Aware Presets
Smart hardware detection recommends optimal batch sizes, LoRA rank, and gradient accumulation. Four tiers: Quick (15 min), Balanced (1-2 hr), Quality (3-6 hr), and Professional (multi-GPU).
Advanced Training
Distributed training with auto-generated configs. CPU offloading for 30B+ models on consumer GPUs. Context extension (32K→256K+). Reinforcement learning support.
Export & Deploy
Export trained adapters as GGUF for immediate inference. Multiple quantization options. One-click upload to HuggingFace Hub. Agent trajectory dataset builder for creating tool-using coding agents.
- 6 use-case presets: Code Completion, Bug Fixer, Refactoring, Test Generator, Documentation, Full Agent
- Pre-flight checks validate CUDA, VRAM, disk space, and dataset before training starts
- Automatic OOM recovery — reduces batch size and enables gradient checkpointing on the fly
- Cloud GPU rental integration (one-click provisioning with cost monitoring and auto-shutoff)
Ollama-Compatible API Server #
MBS Workbench includes a built-in HTTP server with OpenAI-compatible and Ollama-compatible API endpoints. Point any tool that supports the OpenAI API format at your local MBS instance and get responses from your locally loaded model.
- /v1/chat/completions — OpenAI-compatible chat endpoint
- /api/generate — Ollama-compatible generation endpoint
- API key management for access control
- Server start/stop from the UI or command palette
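Any OpenAI-compatible client can talk to the server. A sketch of the request body it sends — the port and model name below are placeholders, so use the values shown in MBS Workbench's server panel:

```javascript
// Build an OpenAI-compatible chat request. The model name defaults to a
// placeholder; local servers typically accept whatever model is loaded.
function buildChatRequest(prompt, model = 'local-model') {
  return {
    model,
    messages: [{ role: 'user', content: prompt }],
    stream: false,
  };
}

const body = buildChatRequest('Explain closures in one sentence.');
// Example usage against a local instance (port is an assumption):
// fetch('http://localhost:11434/v1/chat/completions', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(body),
// }).then((r) => r.json()).then(console.log);
console.log(JSON.stringify(body));
```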
Remote Development #
Develop on remote machines via real SSH connections (OpenSSH), run code inside Dev Containers (real docker build and docker run), and sync files via SCP.
SSH Remote
Save and manage SSH configurations. Connect to remote hosts. Browse and edit remote files. Open remote terminals. Run models on remote GPUs.
Dev Containers
Build and run Docker dev containers directly from the UI. Port forwarding. File sync between local and container. Remote LSP support via SSH command spawn.
Docker & Kubernetes #
Full container management without leaving your IDE. MBS Workbench integrates directly with Docker Engine and Kubernetes clusters.
Docker Dashboard
Container Management
List, start, stop, restart, remove containers. View logs and open exec shells. Docker Compose up/down/status.
Image & Build
Pull, build, remove images. Volume and network management. System prune and disk usage monitoring.
Kubernetes Explorer
Visual cluster browser with context/namespace management, pod inspection (logs, describe, YAML), service & deployment management, port-forwarding, and resource YAML apply/delete.
Compose → K8s Converter
Convert Docker Compose files into Kubernetes manifests with one click. Generates Deployments, Services, Ingress, PVCs, HPAs, and health checks. Includes a Dockerfile generator for 6 frameworks (Node, Python, Rust, Go, Java, Static).
Local Cluster Provisioning
Create and manage local Kubernetes clusters through Minikube, Kind, or k3d — directly from the UI.
Cloud GPU Hub #
Need more GPU power than your local hardware provides? The Cloud GPU Hub lets you rent cloud GPUs from leading providers — directly from within MBS Workbench.
Browse & Rent
Compare GPU instances across cloud providers with real-time pricing, availability, and specs. Launch instances with one click and connect them to your workspace.
Budget Controls
Set monthly spending limits, configure cost alerts, and enable auto-terminate to prevent cost overruns. Track spending across all providers in a unified dashboard.
Instance Management
Monitor running instances, view resource utilization, and manage lifecycle (start, stop, terminate) from the Activity Bar. No separate provider dashboards needed.
Integrated Workflows
Use rented GPUs for model training, inference, and image generation — the same features you use locally, running on cloud hardware when you need extra power.
Cloud Deployment #
Deploy to any major cloud provider with a unified interface:
| Provider | Services |
|---|---|
| Azure | Container Apps, Azure Functions |
| Google Cloud | Cloud Run, Cloud Functions |
| AWS | Lambda (create + update) |
| Vercel | Frontend & serverless deploy |
| Netlify | Static site & serverless deploy |
Each provider includes CLI detection, login management, deployment history, logs, and config template generation.
Model Export & Conversion #
Convert models to production formats for edge deployment. The exporter includes device-specific optimization, size estimation, cloud storage upload (S3/GCS/Azure), and inference code generation for Python, Swift, and Kotlin.
Cost Analytics #
Track and optimize your AI spending across all providers. Five-tab dashboard covering overview, cost breakdown by provider/model, pricing reference, budget alerts with thresholds, and local-vs-cloud ROI analysis. See exactly how much you save by running locally.
Keyboard Shortcuts #
MBS Workbench uses familiar VS Code shortcuts. Here are the most important ones:
General
Activity Bar Panels
Editor
Themes & Appearance #
MBS Workbench ships with three theme modes and a built-in theme editor:
Dark (Default)
Deep indigo-blue palette with glassmorphism effects. Easy on the eyes for long sessions.
Light
Clean white background with soft shadows. High readability in bright environments.
High Contrast
Maximum contrast for accessibility. Meets WCAG AAA standards.
The Theme Editor lets you customize colors, radii, shadows, and spacing through a visual JSON-based editor. Over 100 CSS custom properties (design tokens) control every visual element.
Command Palette #
Press Ctrl+Shift+P to open the Command Palette — a fuzzy-search overlay that indexes all 52+ commands across 7 categories (AI & Models, Connections, Debug & Test, Deploy & Cloud, Extensions, Navigation, Editor). Type to filter, arrow keys to navigate, Enter to execute.
Settings #
All configuration is managed through the Settings panel. Inference parameters, theme preferences, keyboard shortcuts, cloud API keys, and extension state are all accessible from one place. Settings persist across sessions via local storage.
Guides #
Step-by-step tutorials to get the most out of MBS Workbench. Each guide walks through a real-world workflow from start to finish.
Your First AI Project #
Build a complete web app from scratch using the AI agent — no prior experience needed.
Create a New Workspace
Press Ctrl+Shift+O and select an empty folder (e.g., my-first-app). This becomes your project root.
Load a Model
Open the AI & Models panel (Ctrl+Shift+A). If you haven't downloaded a model yet, click HuggingFace, search for Qwen 2.5 Coder 7B Q4_K_M, and download it. Then switch to Load Model and select the GGUF file.
Describe Your App
In the AI chat panel, type a prompt like:
"Create a React + TypeScript todo app with Tailwind CSS. Include add, delete, toggle complete, and filter functionality. Use localStorage for persistence."
Review & Accept
The agent will create multiple files (index.html, App.tsx, package.json, etc.). Review each in the diff viewer, then click Accept All to apply.
Run & Iterate
Open the terminal (Ctrl+`), run npm install && npm run dev. Use Live Preview to see your app. Ask the AI to refine: "Add dark mode support" or "Make it responsive".
Agent Workflows #
The autonomous agent uses a ReAct (Reason → Act → Observe) loop. Here's how to leverage it effectively.
Prompt Engineering for the Agent
| Pattern | Example | Why It Works |
|---|---|---|
| Be Specific | "Create an Express.js REST API with /users and /posts endpoints, using TypeScript and Zod validation" | Reduces ambiguity, fewer iterations |
| Use Context | "@file src/App.tsx — refactor this component to use React Query instead of useEffect" | Agent sees exact code to modify |
| Chain Tasks | "First create the database schema, then build the API, then write tests" | Agent plans a multi-step sequence |
| Constrain Output | "Fix the bug in @file utils.ts — only modify that file, don't create new files" | Prevents unwanted side effects |
Multi-File Editing
When the agent modifies multiple files, you get a unified diff panel showing all changes. Best practices:
- Review the file tree on the left — click any file to jump to its diff
- Accept/Reject per-file or use bulk actions
- The agent tracks its own progress in a todo list visible in the UI
- If something looks wrong, type "undo that last change" in chat
Choosing the Right Model #
Different tasks call for different models. Here's a decision matrix based on your hardware and use case:
| Scenario | Recommended Model | VRAM Needed | Speed |
|---|---|---|---|
| General coding (auto-complete, chat) | Qwen 2.5 Coder 7B Q4_K_M | 4 GB | Fast |
| Complex reasoning / planning | DeepSeek-R1 7B Q4_K_M | 4 GB | Fast |
| Low-resource machine (laptop) | Phi-3 Mini 3.8B Q4_K_M | 2 GB | Very fast |
| Maximum quality (8+ GB VRAM) | CodeLlama 13B Q4_K_M | 8 GB | Moderate |
| Long-context projects | Llama 3.3 8B Q4_K_M | 6 GB | Fast |
| Embedding / RAG | nomic-embed-text-v1.5 | 1 GB | N/A |
Quantization Guide
GGUF models come in various quantization levels. Here's what they mean:
| Quantization | Quality | Size vs Full | Best For |
|---|---|---|---|
| Q8_0 | Near-lossless | ~50% | Maximum quality, plenty of VRAM |
| Q6_K | Excellent | ~42% | Best quality/size tradeoff |
| Q5_K_M | Very good | ~35% | Good quality, moderate VRAM |
| Q4_K_M | Good | ~28% | Recommended default — sweet spot |
| Q3_K_M | Acceptable | ~22% | Low VRAM, still usable quality |
| Q2_K | Degraded | ~15% | Absolute minimum — last resort |
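These ratios let you estimate file sizes from parameter count, assuming an FP16 baseline of roughly 2 bytes per parameter (approximate — actual GGUF sizes vary by architecture and tokenizer):

```javascript
// Rough quantized-size estimate from the "Size vs Full" ratios above.
const QUANT_RATIO = {
  Q8_0: 0.50, Q6_K: 0.42, Q5_K_M: 0.35,
  Q4_K_M: 0.28, Q3_K_M: 0.22, Q2_K: 0.15,
};

function estimateQuantSizeGb(paramsBillions, quant) {
  const fp16Gb = paramsBillions * 2; // ~2 bytes per parameter at FP16
  return +(fp16Gb * QUANT_RATIO[quant]).toFixed(1);
}

console.log(estimateQuantSizeGb(7, 'Q4_K_M')); // → 3.9 (GB, in the ballpark of the 4.4 GB Qwen file)
```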
GPU Optimization Guide #
Get the best inference performance from your NVIDIA GPU.
Prerequisites
- NVIDIA GPU with CUDA Compute Capability 5.0+ (GTX 900 series or newer)
- Latest NVIDIA drivers (Game Ready or Studio)
- MBS Workbench detects CUDA automatically — no toolkit installation needed
VRAM Budget Planning
Your GPU VRAM determines what models you can run and how fast. Here's how VRAM is allocated:
| Component | VRAM Usage | Notes |
|---|---|---|
| Model Weights | Model file size | A 4.4 GB Q4_K_M uses ~4.4 GB VRAM when fully offloaded |
| KV Cache | ~200 MB per 4K context | Scales with context window size |
| OS / Desktop | ~500 MB – 1 GB | Windows reserves VRAM for the display server |
| Buffer | ~200 MB | Working memory for matrix operations |
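Putting the table together as a quick budget check — the numbers below are the table's rough estimates, not measurements:

```javascript
// VRAM budget: model weights + KV cache + OS reserve + working buffer.
function vramBudgetGb(modelSizeGb, contextTokens) {
  const kvCacheGb = (contextTokens / 4096) * 0.2; // ~200 MB per 4K context
  const osReserveGb = 0.75;                       // midpoint of the 500 MB – 1 GB range
  const bufferGb = 0.2;                           // matrix-op working memory
  return +(modelSizeGb + kvCacheGb + osReserveGb + bufferGb).toFixed(2);
}

// A 4.4 GB Q4_K_M model with an 8K context window:
console.log(vramBudgetGb(4.4, 8192)); // → 5.75 (GB)
```

This is why a fully offloaded 7B Q4 model is a tight fit on a 6 GB card once the context window grows.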
Performance Tuning Tips
Batch Size
Larger batch sizes increase throughput but use more VRAM. MBS auto-calculates optimal batch size. Manual override in Parameters → Advanced.
Thread Count
CPU threads for non-GPU layers. Default is n_cpu_cores / 2. Increase for pure CPU inference; keep default for GPU-dominant setups.
Custom MCP Servers #
Extend MBS Workbench's capabilities by adding your own MCP servers. MCP servers expose tools to the AI model via JSON-RPC 2.0 over stdio.
Creating a Custom Server
// my-mcp-server.js — Minimal MCP server example
const { Server } = require('@modelcontextprotocol/sdk/server/index.js');
const { StdioServerTransport } = require('@modelcontextprotocol/sdk/server/stdio.js');
const { ListToolsRequestSchema, CallToolRequestSchema } = require('@modelcontextprotocol/sdk/types.js');
const server = new Server({ name: 'my-custom-tools', version: '1.0.0' }, {
capabilities: { tools: { listChanged: false } }
});
server.setRequestHandler(ListToolsRequestSchema, async () => ({
tools: [{
name: 'get_weather',
description: 'Get weather for a city',
inputSchema: {
type: 'object',
properties: { city: { type: 'string', description: 'City name' } },
required: ['city']
}
}]
}));
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === 'get_weather') {
const city = request.params.arguments.city;
return { content: [{ type: 'text', text: `Weather in ${city}: 72°F, sunny` }] };
}
throw new Error(`Unknown tool: ${request.params.name}`);
});
const transport = new StdioServerTransport();
server.connect(transport);
Registering Your Server
Open MCP Manager (Ctrl+Shift+M), click Add Server, and configure:
| Field | Value |
|---|---|
| Name | my-custom-tools |
| Command | node |
| Args | ["./my-mcp-server.js"] |
| Transport | stdio |
| Auto-Start | Optional — start on app launch |
Docker Deployment #
Deploy your project using Docker directly from MBS Workbench.
Generate a Dockerfile
Open Deploy → Docker/K8s. Click Generate Dockerfile. Select your framework (Node, Python, Rust, Go, Java, or Static). MBS generates an optimized multi-stage Dockerfile.
Build the Image
Click Build Image or run docker build -t myapp:latest . in the terminal. MBS shows real-time build progress with expandable layer details.
Run the Container
Click Run from the image list or use docker run -p 3000:3000 myapp:latest. Container logs stream in the Docker panel.
Manage & Monitor
Use the Docker dashboard to view running containers, inspect logs, exec into shells, and stop/restart containers — all without leaving the IDE.
Building a RAG Pipeline #
Use Retrieval-Augmented Generation (RAG) to give your AI model access to your project's documentation, codebase, or any document corpus.
Load an Embedding Model
Download an embedding model like nomic-embed-text-v1.5 from HuggingFace. Load it in the Embeddings panel.
Index Your Documents
Open Document Chat from the Activity Bar. Drag and drop files (PDF, TXT, MD, code files) into the panel. MBS chunks, embeds, and stores them in a local vector database.
Query with Context
Ask questions like "What does the authentication system do?" or "Find all API endpoints related to billing." The RAG pipeline retrieves the most relevant chunks and injects them into the LLM's context window.
Use @codebase in Chat
In the main AI chat, type @codebase to automatically search your indexed project. The agent receives semantic search results as context, enabling codebase-aware answers.
API Reference #
Complete reference for MBS Workbench's internal APIs, Tauri commands, agent tools, and configuration schemas.
Tauri Commands (IPC) #
All communication between the React frontend and Rust backend happens via Tauri's IPC bridge. Each command is invoked with invoke('command_name', { args }).
LLM & Inference
| Command | Arguments | Returns | Description |
|---|---|---|---|
| load_model | path: string, gpu_layers?: number | ModelInfo | Load a GGUF model into memory with optional GPU layer override |
| unload_model | — | void | Release the loaded model from memory |
| chat_completions | messages: Message[], params: InferenceParams | Stream<string> | Stream chat completions (SSE) from the loaded model |
| cancel_inference | — | void | Abort the current inference stream |
| get_model_info | — | ModelInfo \| null | Returns metadata about the currently loaded model |
| detect_hardware | — | HardwareInfo | Returns GPU, RAM, CPU details for the current system |
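A sketch of how the frontend calls these commands. In the app, `invoke` comes from the Tauri API package; it is stubbed here so the call shape can be shown (and tested) outside the app, and the model path is a placeholder:

```javascript
// Stub of Tauri's `invoke` — echoes the command and args instead of crossing
// the IPC bridge. Replace with the real import inside the app.
const invoke = async (command, args) => ({ command, args });

async function loadAndInspect() {
  // Command names and argument shapes follow the table above.
  const info = await invoke('load_model', {
    path: 'models/my-model.gguf', // placeholder path
    gpu_layers: 28,
  });
  const hw = await invoke('detect_hardware');
  return { info, hw };
}
```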
File System
| Command | Arguments | Returns | Description |
|---|---|---|---|
| read_dir | path: string, recursive?: boolean | FileEntry[] | List directory contents with metadata |
| read_file_text | path: string | string | Read file contents as UTF-8 text |
| write_file | path: string, content: string | void | Write contents to a file (create or overwrite) |
| delete_path | path: string | void | Delete a file or directory |
| rename_path | from: string, to: string | void | Rename or move a file/directory |
| search_files | query: string, path: string, regex?: boolean | SearchResult[] | Search file contents with optional regex |
Project & Workspace
| Command | Arguments | Returns | Description |
|---|---|---|---|
| open_project | path: string | ProjectInfo | Open a folder as workspace, index files |
| get_project_info | — | ProjectInfo | Get current workspace path, file count, watchers |
| run_terminal_command | command: string, cwd?: string | CommandResult | Execute a shell command and return stdout/stderr |
Agent Tool API #
The autonomous agent has access to 23 tools organized into 6 categories. Each tool follows a strict JSON schema for input/output.
Tool Definitions
| Tool | Category | Safety Tier | Input Schema |
|---|---|---|---|
| read_file | Filesystem | Safe (1) | { path: string } |
| create_file | Filesystem | Safe (1) | { path: string, content: string } |
| edit_file | Filesystem | Elevated (2) | { path: string, old: string, new: string } |
| delete_file | Filesystem | Dangerous (3) | { path: string } |
| list_directory | Filesystem | Safe (1) | { path: string, recursive?: boolean } |
| search_files | Filesystem | Safe (1) | { pattern: string, path?: string } |
| batch_read | Filesystem | Safe (1) | { paths: string[] } |
| execute_command | Terminal | Elevated (2) | { command: string, cwd?: string } |
| run_python | Terminal | Elevated (2) | { code: string } |
| run_node | Terminal | Elevated (2) | { code: string } |
| web_search | Web | Dangerous (3) | { query: string, count?: number } |
| fetch_url | Web | Dangerous (3) | { url: string } |
| scrape_webpage | Web | Dangerous (3) | { url: string } |
| git_status | Git | Safe (1) | { path?: string } |
| git_diff | Git | Safe (1) | { path?: string } |
| git_commit | Git | Elevated (2) | { message: string } |
| git_push | Git | Dangerous (3) | { remote?: string, branch?: string } |
| git_log | Git | Safe (1) | { count?: number } |
| analyze_code | Analysis | Elevated (2) | { path: string } |
| explain_code | Analysis | Safe (1) | { code: string, language?: string } |
| suggest_refactor | Analysis | Safe (1) | { path: string } |
| query_sqlite | Database | Elevated (2) | { db_path: string, query: string } |
| parse_csv | Database | Safe (1) | { path: string } |
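The safety tiers in the table can be read as an execution gate. The tier numbers below come straight from the table; the auto/confirm/block flow itself is a sketch of how such a gate might work, not the app's exact policy.

```typescript
type Tier = 1 | 2 | 3; // 1 = Safe, 2 = Elevated, 3 = Dangerous

// Tool → tier mapping, transcribed from the Tool Definitions table.
const toolTiers: Record<string, Tier> = {
  read_file: 1, create_file: 1, list_directory: 1, search_files: 1,
  batch_read: 1, git_status: 1, git_diff: 1, git_log: 1,
  explain_code: 1, suggest_refactor: 1, parse_csv: 1,
  edit_file: 2, execute_command: 2, run_python: 2, run_node: 2,
  git_commit: 2, analyze_code: 2, query_sqlite: 2,
  delete_file: 3, web_search: 3, fetch_url: 3, scrape_webpage: 3, git_push: 3,
};

type Decision = "auto" | "confirm" | "blocked";

function gate(tool: string, allowDangerous: boolean): Decision {
  const tier = toolTiers[tool];
  if (tier === undefined) return "blocked";       // unknown tools never run
  if (tier === 1) return "auto";                  // Safe: run without prompting
  if (tier === 2) return "confirm";               // Elevated: ask the user first
  return allowDangerous ? "confirm" : "blocked";  // Dangerous: opt-in only
}
```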
MCP Protocol #
The Model Context Protocol uses JSON-RPC 2.0 over stdio for communication between MBS and MCP servers.
Lifecycle
MBS Workbench MCP Server
│ │
│──── initialize ────────►│ Server starts, reports capabilities
│◄─── initialized ────────│
│ │
│──── tools/list ────────►│ List available tools
│◄─── tools[] ────────────│
│ │
│──── tools/call ────────►│ Execute a tool with arguments
│◄─── result ─────────────│
│ │
│──── shutdown ──────────►│ Graceful shutdown
│◄─── ok ─────────────────│
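The lifecycle above translates to plain JSON-RPC 2.0 messages, one per line on the server's stdin. The method names match the diagram; the exact parameter shapes are assumptions for illustration.

```typescript
interface RpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;
function rpc(method: string, params?: Record<string, unknown>): RpcRequest {
  return params === undefined
    ? { jsonrpc: "2.0", id: nextId++, method }
    : { jsonrpc: "2.0", id: nextId++, method, params };
}

// One request per lifecycle step: initialize, list tools, call one.
const init = rpc("initialize", { clientInfo: { name: "mbs-workbench" } });
const list = rpc("tools/list");
const call = rpc("tools/call", { name: "read_file", arguments: { path: "src/main.rs" } });
```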
Server Configuration Schema
{
"name": "server-name", // Unique identifier
"command": "node", // Executable to run
"args": ["./server.js"], // Command arguments
"env": { "API_KEY": "..." }, // Environment variables (optional)
"transport": "stdio", // Transport type (stdio only currently)
"auto_start": false, // Start on app launch
"health_check_interval": 30, // Seconds between health pings
"restart_on_crash": true, // Auto-restart on server crash
"max_restarts": 3 // Maximum restart attempts
}
Pre-Configured Servers (24 total)
| Domain | Servers | Install Command |
|---|---|---|
| Dev Tools | Rust Analyzer, Pyright, TypeScript, Clangd, Go, TexLab | Auto-detected from PATH |
| Blockchain | Solana, Ethereum, CoinMarketCap, DEX Screener | npx @mcp/solana |
| Game Engines | Godot, Unity, Unreal | npx @mcp/godot |
| Web Dev | Puppeteer, Vercel, Docker | npx @mcp/puppeteer |
| Databases | PostgreSQL, SQLite, Redis | npx @mcp/postgres |
| Media | ComfyUI, FFmpeg, ImageMagick | npx @mcp/comfyui |
| Mobile | Flutter, Android | npx @mcp/flutter |
| Utilities | Filesystem, Terminal, Git, HTTP | Built-in (no install) |
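Putting the schema and the table together, a filled-in entry for the PostgreSQL server from the table might look like this. The `DATABASE_URL` variable name and connection string are assumptions — check the server's own documentation for the variables it expects.

```json
{
  "name": "postgres",
  "command": "npx",
  "args": ["@mcp/postgres"],
  "env": { "DATABASE_URL": "postgres://localhost:5432/dev" },
  "transport": "stdio",
  "auto_start": true,
  "health_check_interval": 30,
  "restart_on_crash": true,
  "max_restarts": 3
}
```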
Settings Schema #
All app settings are stored locally and accessible via Ctrl+,. Here's the complete schema:
Inference Settings
| Key | Type | Default | Description |
|---|---|---|---|
| temperature | number | 0.7 | Sampling temperature (0.0 – 2.0) |
| top_p | number | 0.9 | Nucleus sampling threshold |
| top_k | number | 40 | Top-K sampling limit |
| repeat_penalty | number | 1.1 | Repetition penalty factor |
| max_tokens | number | 4096 | Maximum generation length |
| context_window | number | auto | Context window size (auto = 75% GPU capacity) |
| gpu_layers | number | auto | Number of layers to offload to GPU |
| batch_size | number | 512 | Prompt processing batch size |
| threads | number | auto | CPU threads for inference |
Appearance Settings
| Key | Type | Default | Description |
|---|---|---|---|
| theme | string | "dark" | Theme mode: "dark", "light", "high-contrast" |
| font_family | string | "system-ui" | UI font family |
| font_size | number | 13 | UI font size in pixels |
| editor_font_family | string | "Cascadia Code" | Editor font family |
| editor_font_size | number | 14 | Editor font size |
| editor_line_height | number | 1.6 | Editor line height multiplier |
| ui_scale | number | 1.0 | Global UI scale factor |
| minimap | boolean | true | Show editor minimap |
Cloud Provider Settings
| Key | Type | Default | Description |
|---|---|---|---|
| openai_api_key | string | "" | OpenAI API key |
| anthropic_api_key | string | "" | Anthropic API key |
| google_api_key | string | "" | Google Gemini API key |
| default_provider | string | "local" | Default inference provider |
Complete Keyboard Shortcuts #
Full list of all keyboard shortcuts in MBS Workbench.
General
Navigation
Activity Bar Panels
Editor
AI & Chat
CLI Reference #
MBS Workbench can be launched from the command line with optional arguments:
# Open MBS Workbench
mbs-workbench
# Open a specific folder
mbs-workbench /path/to/project
# Open a specific file
mbs-workbench /path/to/file.ts
# Start with a specific model loaded
mbs-workbench --model /path/to/model.gguf
# Start in CPU-only mode (skip GPU detection)
mbs-workbench --cpu-only
# Show version
mbs-workbench --version
Design Agent #
The Design Agent is a local AI-powered design assistant that generates complete page layouts, component designs, and brand-consistent UI from natural language prompts — all running on your GPU with no cloud dependency.
Natural Language Design
Describe what you want — "a dark-themed pricing page with 3 tiers" — and the Design Agent generates production-ready HTML/CSS/Tailwind output using your local LLM.
Brand-Aware Generation
The agent reads your brand tokens (colors, fonts, spacing) and weaves them into every design, ensuring brand consistency across all generated components.
Dynamic Token Budgets
Automatically calculates max output tokens from your model's context window (n_ctx()), so large designs don't get truncated.
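The budget calculation can be sketched as: reserve the prompt (brand preamble plus request) out of `n_ctx()` and give the remainder to generation. The 4-characters-per-token estimate and the safety margin below are assumptions for illustration — the real implementation would count tokens with the model's own tokenizer.

```typescript
// Dynamic token budget: output budget = context window − prompt − margin.
function maxOutputTokens(nCtx: number, promptChars: number, margin = 64): number {
  const promptTokens = Math.ceil(promptChars / 4); // rough chars→tokens estimate
  return Math.max(0, nCtx - promptTokens - margin);
}
```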
Compressed Preambles
Brand context is injected as compact token-efficient preambles, leaving maximum context for the actual design generation.
Visual Canvas #
A full drag-and-drop visual editor integrated directly into the IDE. Build pages visually with real-time preview, then export clean HTML/CSS/React code.
Drag-and-Drop Builder
Place components on a 2D canvas with snap-to-grid alignment, resize handles, and layer ordering. No code required for layout design.
Live Code Sync
Every visual change syncs to clean, editable code in real-time. Switch between visual and code views seamlessly.
Responsive Breakpoints
Preview and design for desktop, tablet, and mobile breakpoints. The canvas adapts to show your design at each viewport size.
Component Library
Built-in library of common UI components — navbars, hero sections, cards, footers, forms, modals — ready to drag onto your canvas.
Brand & Tokens #
Define your brand identity once and use it everywhere. Brand tokens (colors, typography, spacing, border-radius) are injected into the Design Agent, Visual Canvas, Template Marketplace, and AI Copywriter for consistent output.
Token Editor
Visual editor for brand tokens — pick colors, set font stacks, define spacing scales, configure border-radius and shadow presets.
Export Formats
Export tokens as CSS custom properties, Tailwind config, SCSS variables, or JSON for use in any project.
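The CSS custom properties target can be sketched as a simple flattening pass. The token names and the `--name: value` scheme here are illustrative assumptions; the real exporter may nest or prefix differently.

```typescript
type BrandTokens = Record<string, string | number>;

// Flatten a token map into a :root block of CSS custom properties.
function toCssVars(tokens: BrandTokens, selector = ":root"): string {
  const lines = Object.entries(tokens).map(
    ([name, value]) => `  --${name}: ${value};`
  );
  return `${selector} {\n${lines.join("\n")}\n}`;
}

const css = toCssVars({ "color-primary": "#4f46e5", "radius-md": "8px" });
```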
Template Marketplace #
Browse and install 15 unique, production-ready templates spanning landing pages, dashboards, portfolios, e-commerce, blogs, and more. Each template ships in 8 color variants (120 total combinations) and adapts to your brand tokens automatically.
15 Unique Templates
SaaS Landing, Portfolio, Blog, E-Commerce, Dashboard, Documentation, Agency, Restaurant, Fitness, Real Estate, Event, Education, Medical, Travel, and Startup — each professionally designed.
8 Color Variants
Every template includes 8 curated color schemes: Default, Ocean, Sunset, Forest, Royal, Coral, Midnight, and Amber. One click to switch.
Brand Token Integration
Templates automatically adopt your brand tokens — colors, fonts, and spacing — so the output matches your brand from the start.
One-Click Install
Preview any template in the built-in live preview, then install directly into your project with clean, editable HTML/CSS/React code.
AI Copywriter #
Generate marketing copy, product descriptions, CTAs, headlines, and page content using your local LLM. The AI Copywriter respects your brand tone and outputs clean, usable text with a 512-token generation limit for concise, focused copy.
Copy Templates
Pre-built prompts for hero headlines, feature descriptions, testimonials, pricing blurbs, email subject lines, and social media posts.
Tone Control
Select tone presets — Professional, Casual, Technical, Playful, Luxury — or define a custom brand voice for consistent messaging.
SEO & Performance Analyzer #
Built-in SEO and performance analysis for your pages and projects. Checks meta tags, heading structure, image alt text, page speed metrics, and accessibility compliance — all locally, no third-party services.
Asset Manager #
Organize, optimize, and manage images, icons, fonts, and other assets directly in the IDE. Drag-and-drop upload, automatic image compression, SVG optimization, and sprite generation.
Voice Studio (TTS) #
A full text-to-speech studio built into the IDE. Synthesize speech locally using Kokoro-82M ONNX (22 neural voices, no internet required) or Windows SAPI as a zero-install fallback. Audio is decoded directly in the browser using a Blob URL — no external protocol configuration needed.
Kokoro ONNX Engine
22 neural voice profiles running on-device via the Kokoro-82M model. Download the model once (~82 MB) and get high-quality, expressive speech synthesis with near-instant response after the first warmup. Speed and pitch controls included.
Windows SAPI Fallback
If the Kokoro model is not installed, Workbench automatically falls back to Windows SAPI via PowerShell — no setup required. Any Windows voice installed on the system is available instantly.
In-IDE Audio Playback
Generated WAV audio is read as binary, decoded to a Blob URL, and played back in-app without any server or file protocol workarounds. Preview, replay, and download audio files directly from Voice Studio.
ONNX Status Dashboard
The Voice Studio panel shows the Kokoro model install status, runtime availability, model size, recommended action, and direct download link — so you always know what's available on your machine.
Mobile Export #
Export your web projects to native mobile apps with one click. MBS supports Capacitor (iOS/Android native), PWA (Progressive Web App), and APK (Android Package) export targets. Projects export to Documents/MBS-Mobile-Exports.
Capacitor Export
Generate a full Capacitor project with native iOS and Android wrappers. Ready for Xcode or Android Studio.
PWA Export
Generate a Progressive Web App with service worker, manifest, offline support, and installability — works on any device.
APK Build
Build a standalone Android APK directly from the IDE. No need to install Android Studio separately.
Interactive SDK #
Build interactive forms, surveys, quizzes, and multi-step workflows with a visual form builder. The SDK supports conditional logic, field validation, file uploads, and webhook integrations — all rendered client-side with zero backend required.
Visual Form Builder
Drag-and-drop form fields — text inputs, selects, checkboxes, file uploads, date pickers, sliders — with real-time preview.
Conditional Logic
Show/hide fields, skip steps, or change validation based on user responses. Build complex multi-step flows without code.
Embed & Export
Export forms as standalone HTML, React components, or embed via iframe. Each form gets a unique formId for tracking.
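Since exported forms are standalone HTML with no backend, the iframe embed can be sketched as a plain snippet generator. The `./forms/<formId>.html` path convention and the attribute set are assumptions; `formId` comes from the form's export step.

```typescript
// Build an iframe snippet pointing at an exported standalone form.
function embedSnippet(formId: string, height = 600): string {
  return `<iframe src="./forms/${formId}.html" width="100%" height="${height}" loading="lazy"></iframe>`;
}

const snippet = embedSnippet("contact-01");
```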
Architecture #
MBS Workbench is built on Tauri + Rust + React — a modern architecture that delivers near-native performance in a compact, self-contained native app.
Technology Stack
| Component | Technology | Why |
|---|---|---|
| Desktop Framework | Tauri 1.5 (Rust) | 600 KB runtime vs Electron's 100 MB. Native Windows/Mac/Linux. |
| LLM Inference | Native Rust engine with CUDA bindings | Fastest GGUF inference. Direct CUDA. In-process — no server. |
| Image Generation | Native diffusion engine (C++/CUDA) | GPU-accelerated diffusion. SD 1.5, SDXL, FLUX. In-process — no Python. |
| Model Training | Custom LoRA/QLoRA pipeline | Hardware-aware training with advanced optimization. Consumer GPU support. |
| Frontend | React 18 + TypeScript | Industry standard. Monaco Editor compatibility. Component ecosystem. |
| Styling | TailwindCSS + CSS Custom Props | Utility-first with design token system. 100+ variables. |
| State Management | Zustand | Lightweight, TypeScript-native, minimal boilerplate. |
| Local Storage | rusqlite (SQLite) | Embedded database for settings, history, model metadata. |
| Terminal | xterm.js + PTY | Full terminal emulation with pseudo-terminal backend. |
| Preview Server | Warp (Rust) | Lightweight HTTP server for real-time web preview. |
Performance Tuning #
Inference Optimizer (Zero Configuration)
MBS Workbench includes a built-in inference optimizer that automatically configures every loaded model for maximum speed on your hardware. No manual tuning required.
Auto-Quantization
Detects available VRAM and selects the optimal quantization level (Q4_K_M for <6 GB, Q5_K_M for 6-10 GB, Q8_0 for 10-16 GB, FP16 for 16+ GB). Cached conversions skip re-quantization on subsequent loads.
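The VRAM thresholds above map directly to a selection function; the thresholds come from the text, the function itself is a sketch.

```typescript
// Pick a quantization level from detected VRAM, per the tiers stated above.
function pickQuant(vramGb: number): string {
  if (vramGb < 6) return "Q4_K_M";
  if (vramGb < 10) return "Q5_K_M";
  if (vramGb < 16) return "Q8_0";
  return "FP16";
}
```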
Intelligent Layer Partitioning
Automatically calculates how many layers fit in GPU VRAM and offloads the rest to CPU with pinned memory. Enables 30B+ models on 4 GB GPUs with mixed GPU/CPU inference.
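The partitioning step can be sketched as: fit as many whole layers as the VRAM budget allows and leave the remainder on CPU. The per-layer size and the reserved-VRAM headroom below are illustrative assumptions — in practice both depend on the model and quantization.

```typescript
// Split a model's layers between GPU and CPU given a VRAM budget.
function splitLayers(
  totalLayers: number,
  vramBytes: number,
  perLayerBytes: number,
  reservedBytes = 512 * 1024 * 1024 // headroom for KV cache and activations (assumption)
): { gpuLayers: number; cpuLayers: number } {
  const usable = Math.max(0, vramBytes - reservedBytes);
  const gpuLayers = Math.min(totalLayers, Math.floor(usable / perLayerBytes));
  return { gpuLayers, cpuLayers: totalLayers - gpuLayers };
}
```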
Flash Attention & KV Cache
Flash Attention v3 enabled by default on supported GPUs (2-3× faster). 8-bit KV cache reduces attention memory by 75%, enabling 256K+ context windows on consumer hardware.
Model-Specific Presets
Hand-tuned optimization presets for popular model architectures. Auto-detected on load. MoE expert parallelism, vision-language encoder splitting, and diffusion-specific memory management.
Hardware-Aware Scaling
At launch, MBS detects your exact hardware configuration and scales every subsystem:
| System Tier | RAM | GPU | Optimized For |
|---|---|---|---|
| Low | 8 GB | 0-2 GB VRAM | 2-4B models, CPU inference, minimal context |
| Medium | 16 GB | 4 GB VRAM | 7B models, partial GPU offload, 4K context |
| High | 32 GB | 8 GB VRAM | 13B models, full GPU offload, 8K context |
| Ultra | 64 GB+ | 12 GB+ VRAM | 30B+ models, maximum context, batch decode |
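The tier table can be read as a lookup from detected hardware; where the table leaves gaps between rows, the boundaries below are assumptions.

```typescript
type SystemTier = "Low" | "Medium" | "High" | "Ultra";

// Classify a machine into a tier using the RAM/VRAM rows above.
function classify(ramGb: number, vramGb: number): SystemTier {
  if (ramGb >= 64 && vramGb >= 12) return "Ultra";
  if (ramGb >= 32 && vramGb >= 8) return "High";
  if (ramGb >= 16 && vramGb >= 4) return "Medium";
  return "Low";
}
```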
Memory Architecture
Three-tier memory system ensures optimal performance:
L1 — Hot (2 GB)
Active conversation context, current file buffer, model attention cache
L2 — Warm (8 GB)
Conversation history, recent files (last 10), KV cache, project embeddings
L3 — Cold (SSD)
Model weights (memory-mapped), project history, vector database, preferences
Privacy & Security #
- No account required — download, install, use. Zero signup friction.
- No telemetry — we don't collect usage data, crash reports, or metrics.
- Offline-capable — once a model is downloaded, everything works without internet.
- Agent safety tiers — three-level permission system prevents unintended side effects.
- Local-only storage — settings, history, and model metadata stored in local SQLite.
- Open model ecosystem — use any GGUF model from any source. No vendor lock-in.
How MBS Compares #
| Feature | MBS Workbench | GitHub Copilot | Cursor | LM Studio |
|---|---|---|---|---|
| Pricing | Free / One-time | $10-39/mo | $20/mo | Free |
| Runs Locally | ✓ Full | ✗ Cloud | ✗ Cloud | ✓ Inference |
| 100% Private | ✓ | ✗ | ✗ | ✓ |
| Code Editor | ✓ Monaco | ✓ VS Code ext | ✓ Fork | ✗ None |
| Autonomous Agent | ✓ 23 tools | ✗ | Partial | ✗ |
| GPU Acceleration | ✓ Native CUDA | N/A | N/A | ✓ |
| Model Choice | ✓ Any GGUF | GPT-4 only | GPT-4/Claude | ✓ Any GGUF |
| Source Control (Git) | ✓ Native git2-rs | ✓ Built-in | ✓ Built-in | ✗ |
| Debug (DAP) | ✓ 9 adapters | ✓ Built-in | ✓ Built-in | ✗ |
| Multi-Model Chat | ✓ Tournament | ✗ | ✗ | ✗ |
| Voice Input (STT) | ✓ Web Speech API | ✗ | ✗ | ✗ |
| Text-to-Speech (TTS) | ✓ Kokoro + SAPI | ✗ | ✗ | ✗ |
| Episodic Memory | ✓ SQLite + vectors | ✗ | ✗ | ✗ |
| Conversation Branching | ✓ Git-tree UI | ✗ | ✗ | ✗ |
| Telegram Bot Mode | ✓ Local polling | ✗ | ✗ | ✗ |
| Image Generation | ✓ SD local | ✗ | ✗ | ✗ |
| Model Training | ✓ LoRA/QLoRA | ✗ | ✗ | ✗ |
| HuggingFace Browser | ✓ Built-in | ✗ | ✗ | Partial |
| MCP Protocol | ✓ 24 servers | ✗ | Partial | ✗ |
| Docker / K8s | ✓ Full mgmt | ✗ | ✗ | ✗ |
| Multi-Cloud Deploy | ✓ 5 providers | ✗ | ✗ | ✗ |
| Cost Analytics | ✓ | ✗ | ✗ | ✗ |
| Offline Mode | ✓ Full | ✗ | ✗ | ✓ |
| App Size | ~175 MB | N/A | ~400 MB | ~200 MB |
Frequently Asked Questions #
Is MBS Workbench free?
The core application is free. Future Pro features (advanced agent workflows, enterprise MCP servers, priority model access) may be offered as a one-time license — never a subscription.
What models work with MBS?
Any GGUF-format language model — Qwen, DeepSeek, Llama, Mistral, Phi, Gemma, CodeLlama, and thousands more from HuggingFace. For image generation, MBS supports Stable Diffusion 1.5, SDXL, and FLUX models. Plus, 10 cloud API providers (OpenAI, Anthropic, Google, etc.) are built in for hybrid workflows.
Do I need a GPU?
No. MBS works in CPU-only mode — just slower. For the best experience, an NVIDIA GPU with 4+ GB VRAM is recommended. Even a GTX 1660 (6 GB) provides a great experience with 7B models.
Does it send my code to the cloud?
Never. When using local models, your code and prompts never leave your machine. Cloud providers (OpenAI, Anthropic, etc.) are opt-in and clearly labeled in the UI.
How does it compare to Copilot/Cursor?
Copilot and Cursor are cloud-first tools with recurring subscriptions. MBS runs locally, costs nothing per month, and gives you complete model freedom. See the full comparison table.
Can I use it offline?
Yes. Once you've downloaded a model, everything works without internet — editing, AI completions, chat, agent, image generation, model training, terminal, and all development tools.
What about Mac and Linux?
Windows is the primary platform today. Mac and Linux builds are planned — Tauri natively supports all three platforms.