Build with AI.
Locally. Privately. Yours.
MBS Workbench is a GPU-accelerated AI development environment that runs entirely on your machine. Local inference, image generation, model training, 10 cloud API providers, and a full code editor — no subscriptions, no cloud, no data leaves your device.
Download & Install #
System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 (64-bit) | Windows 11 |
| RAM | 8 GB | 16 GB+ |
| GPU | CPU-only mode available | NVIDIA GPU with 4 GB+ VRAM |
| Storage | 500 MB (app) | 10 GB+ (app + models) |
| CUDA | — | CUDA 12.x for GPU acceleration |
Installation Methods
NSIS Installer
One-click Windows installer with start menu shortcuts and uninstaller. Recommended for most users.
MBS Workbench_x64-setup.exe
MSI Installer
Enterprise-grade MSI package for Group Policy and SCCM deployment across organizations.
MBS Workbench_x64_en-US.msi
Portable EXE
Standalone executable — no installation required. Run from USB drives or restricted environments.
MBS Workbench.exe (134 MB)
Quick Start #
Get productive in under 5 minutes. Here's everything you need to go from install to your first AI-assisted coding session.
Download & Launch
Download the installer from the releases page and run it. MBS Workbench launches instantly — no account creation, no sign-in, no telemetry.
Download an AI Model
Open the AI & Models panel from the Activity Bar (or press Ctrl+Shift+A). Click HuggingFace to browse 800,000+ models. We recommend starting with Qwen 2.5 Coder 7B Q4_K_M — it fits in 4 GB VRAM and delivers excellent code generation.
Load the Model
Go to Load Model and select your downloaded GGUF file. MBS automatically detects your GPU and optimizes layer offloading, context window, and batch size.
Open a Project
Press Ctrl+Shift+O or click File → Open Folder to open your workspace. The file explorer appears in the sidebar — click any file to start editing.
Start Coding with AI
The AI chat panel appears on the right. Type a prompt like "Create a REST API with Express.js and TypeScript" — the agent will create files, install dependencies, and set everything up. You can also use @file, @folder, or @codebase to inject context.
Code Editor #
MBS Workbench includes a full Monaco-powered code editor — the same engine that powers VS Code. You get an enterprise-grade editing experience without any external dependencies.
50+ Languages
Syntax highlighting for TypeScript, Python, Rust, Go, Java, C++, Solidity, and dozens more.
Multi-Tab Editor
Open multiple files in tabs. Drag to reorder. Modified files show a dot indicator.
Minimap & Breadcrumbs
Minimap overview of your file. Breadcrumb navigation shows your location in the code hierarchy.
Find & Replace
Powerful in-file search with regex support. Global workspace search with file filtering.
Split Editor
Split your editor into multiple panes. Compare files side by side or reference code while editing.
Code Folding
Collapse code blocks, bracket pair colorization, indentation guides, and sticky scroll headers.
AI Chat & Autonomous Agent #
MBS Workbench includes a 23-tool autonomous agent powered by a structured ReAct (Reason → Act → Observe) state machine — not brittle prompt chains. Give it a task in natural language and it will plan, execute, and iterate until it's done.
Agent Capabilities
| Category | Tools | What It Can Do |
|---|---|---|
| Filesystem | 7 | Create, read, edit, delete, search files, batch-read directories |
| Terminal | 3 | Run shell commands, execute Python & Node.js scripts |
| Web | 3 | Web search, fetch URLs, scrape webpages |
| Git | 5 | Status, commit, push, pull, branch management |
| Analysis | 3 | Static analysis, code explanation, refactoring suggestions |
| Database | 2 | SQLite queries, CSV parsing |
Safety System
A three-tier permission system protects you from unintended side effects:
Safe (Tier 1)
Read-only operations and file creation. read_file, create_file, list_directory — always allowed.
Elevated (Tier 2)
Execution and analysis tools. execute_command, git_commit, analyze_code — allowed after model qualification.
Dangerous (Tier 3)
Network and destructive operations. web_search, git_push, delete_file — restricted to verified models.
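A minimal sketch of how such a tier gate could work — the function, tier map, and clearance levels here are illustrative, not MBS Workbench's actual internals:

```javascript
// Hypothetical three-tier permission gate. Tool names match the tiers listed
// above; the clearance model (1 = unqualified, 2 = qualified, 3 = verified)
// is an assumption for this sketch.
const TOOL_TIERS = {
  read_file: 1, create_file: 1, list_directory: 1,    // Safe
  execute_command: 2, git_commit: 2, analyze_code: 2, // Elevated
  web_search: 3, git_push: 3, delete_file: 3,         // Dangerous
};

function isToolAllowed(toolName, modelClearance) {
  const tier = TOOL_TIERS[toolName];
  if (tier === undefined) return false; // unknown tools are denied by default
  return modelClearance >= tier;
}

console.log(isToolAllowed('read_file', 1)); // Safe tools: always allowed
console.log(isToolAllowed('git_push', 2));  // Dangerous tool, unverified model: denied
```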
@-Context Injection
Inject precise context into any AI conversation by typing @ followed by a context source:
| Context | Description |
|---|---|
| @file | Inject the contents of a specific file into the prompt |
| @folder | Inject directory tree and file summaries |
| @codebase | Inject indexed codebase context (symbols, definitions) |
| @selection | Inject current editor selection |
| @terminal | Inject last terminal output |
Multi-File AI Edits
The AI agent can propose changes across multiple files simultaneously. You get a unified diff viewer with:
- Side-by-side diff comparison
- Accept / Reject controls per file
- Accept All / Reject All bulk actions
- File tree showing all changed files
Model Management #
Loading Models
MBS Workbench supports any GGUF-format model — the industry standard for local LLM inference. Models are loaded directly into process memory with native CUDA GPU offloading.
Recommended Models
| Model | Size | Best For | Min. VRAM |
|---|---|---|---|
| Qwen 2.5 Coder 7B Q4_K_M | 4.4 GB | Code generation, best quality/size ratio | 4 GB |
| DeepSeek-R1 7B Q4_K_M | 4.5 GB | Reasoning & chain-of-thought | 4 GB |
| Llama 3.3 8B Q4_K_M | 5.0 GB | General purpose, chat | 6 GB |
| Phi-3 Mini Q4_K_M | 2.0 GB | Small, fast, low-resource machines | 2 GB |
| Mistral 7B Q4_K_M | 4.1 GB | Versatile, strong instruction following | 4 GB |
| CodeLlama 13B Q3_K_M | 5.5 GB | Complex code tasks | 8 GB |
GPU Acceleration (CUDA)
MBS Workbench embeds llama.cpp with native CUDA bindings directly in the Tauri binary. Unlike Ollama or LM Studio, there's no separate inference server — the model runs in the same process as the editor.
Partial GPU Offloading
If your GPU has limited VRAM, MBS automatically splits model layers between GPU and CPU. A 7B model on a 4 GB GPU offloads ~28 of 32 layers to GPU, keeping 4 on CPU.
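The split can be estimated with simple arithmetic, assuming roughly equal layer sizes (a simplification — the real heuristic also budgets for KV cache and working buffers):

```javascript
// Back-of-the-envelope GPU layer split. Illustrative only — not MBS
// Workbench's actual offloading heuristic.
function estimateGpuLayers(modelSizeGb, totalLayers, freeVramGb) {
  const perLayerGb = modelSizeGb / totalLayers;   // assume equal layer sizes
  const layers = Math.floor(freeVramGb / perLayerGb);
  return Math.min(layers, totalLayers);           // cap at the model's layer count
}

// 4.4 GB model, 32 layers, ~3.9 GB of free VRAM on a 4 GB card:
console.log(estimateGpuLayers(4.4, 32, 3.9)); // → 28
```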
Full GPU Offloading
With sufficient VRAM, all layers run on GPU for maximum speed. A 7B Q4 model fully offloaded to an RTX 4060 delivers 40+ tokens/sec.
Model Parameters
Fine-tune inference behavior from the Parameters panel:
| Parameter | Range | Description |
|---|---|---|
| Temperature | 0.0 – 2.0 | Controls randomness. Lower = more deterministic, higher = more creative. |
| Top-P | 0.0 – 1.0 | Nucleus sampling — limits token pool to cumulative probability threshold. |
| Top-K | 1 – 200 | Limits sampling to the top K most likely tokens. |
| Repeat Penalty | 1.0 – 2.0 | Penalizes repeated tokens to reduce repetitive output. |
| Max Tokens | 64 – 32768 | Maximum number of tokens to generate per response. |
| Context Window | 512 – 131072 | Total tokens the model can see (prompt + response). Auto-sized to 75% GPU capacity. |
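To make the sampling parameters concrete, here is a toy sketch of temperature scaling and top-p filtering over a made-up four-token distribution (the logits and the 0.8 / 0.9 values are illustrative, not MBS defaults):

```javascript
// Temperature: divide logits before softmax. Lower temperature sharpens the
// distribution; higher temperature flattens it.
function softmax(logits, temperature) {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);                 // subtract max for stability
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Top-P: keep the smallest set of tokens whose cumulative probability
// reaches the threshold.
function topPFilter(probs, topP) {
  const order = probs.map((p, i) => [p, i]).sort((a, b) => b[0] - a[0]);
  const kept = [];
  let cum = 0;
  for (const [p, i] of order) {
    kept.push(i);
    cum += p;
    if (cum >= topP) break;
  }
  return kept.sort((a, b) => a - b); // indices of surviving tokens
}

const probs = softmax([2.0, 1.0, 0.1, -1.0], 0.8);
console.log(topPFilter(probs, 0.9)); // indices forming the sampling nucleus
```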
Cloud Providers (10 APIs)
While MBS Workbench is built for local inference, you can connect to 10 cloud LLM providers for hybrid workflows. Mix local and cloud models in the same session with real-time cost tracking.
Configure API keys in Settings → Cloud Providers. The unified provider selector in chat lets you switch between local and cloud models per-message. An OpenAI-compatible fallback supports any additional provider.
Speculative Decoding
Accelerate inference by pairing a large model with a smaller draft model. The draft model generates candidate tokens that the main model verifies in parallel — delivering 2-3x speedup on compatible hardware.
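A toy model of the accept loop — real speculative decoding verifies the draft tokens in a single batched forward pass; here the "models" are just arrays of pre-decided tokens, which is enough to show why agreement yields a speedup:

```javascript
// One speculative step: accept the draft's tokens while the main model
// agrees, then take the main model's token at the first mismatch.
// `verifyToken(i)` stands in for the main model's prediction at position i.
function speculativeStep(draftTokens, verifyToken) {
  const accepted = [];
  for (const t of draftTokens) {
    if (verifyToken(accepted.length) === t) accepted.push(t);
    else break; // first disagreement ends the accepted run
  }
  if (accepted.length < draftTokens.length) {
    accepted.push(verifyToken(accepted.length)); // main model supplies the fix
  }
  return accepted;
}

const mainOutput = ['const', 'x', '=', '42', ';'];
const draft = ['const', 'x', '=', '7']; // draft disagrees at position 3
console.log(speculativeStep(draft, (i) => mainOutput[i]));
// Three draft tokens verified in one pass, plus the correction
```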
Embeddings & Vision
Load embedding models for local RAG (Retrieval-Augmented Generation) and semantic search. Vision model support (LLaVA, Qwen-VL) lets you paste images directly in chat for screenshot-to-code workflows and image analysis.
HuggingFace Explorer #
Browse and download from HuggingFace's 800,000+ model library without leaving the app. MBS Workbench scores every model against your hardware for instant compatibility assessment.
12 Task Categories
Chat, Code, Image Gen, WebDev, Game Dev, Agent, Reasoning, Vision, Embedding, Translation, Summarization, Math
Smart Scoring
Models ranked by downloads × likes × quantization quality × hardware fit. Badges: Top 5, Popular, Trusted.
One-Click Download
Streaming download with real-time bytes/sec, ETA, and progress. Auto-detect GGUF variants and quantization levels.
Hardware Matching
Each model scored: Perfect, Good, Possible, or Too Large for your system.
Inline AI Completions #
Get real-time AI-powered code suggestions as you type — similar to GitHub Copilot, but running entirely on your local GPU, with no network round-trips and complete privacy.
- Ghost text appears as you type (400ms debounce)
- Press Tab to accept, Esc to dismiss
- Context-aware: reads surrounding code for accurate suggestions
- Toggle on/off from the editor toolbar
- Works with any loaded model — no separate "completion model" needed
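The debounce behavior can be modeled as a pure function over keystroke timestamps — a simplified sketch, not the editor's actual timer-based implementation:

```javascript
// Trailing-edge debounce model: given keystroke times (ms), return the times
// at which a completion request would fire, using the 400 ms window above.
function debounceFires(keystrokes, delayMs = 400) {
  const fires = [];
  for (let i = 0; i < keystrokes.length; i++) {
    const next = keystrokes[i + 1];
    // A request fires only if no further keystroke lands within the window.
    if (next === undefined || next - keystrokes[i] >= delayMs) {
      fires.push(keystrokes[i] + delayMs);
    }
  }
  return fires;
}

// Rapid typing suppresses requests; each pause triggers exactly one:
console.log(debounceFires([0, 100, 200, 1000])); // → [600, 1400]
```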
Integrated Terminal #
A full terminal emulator built into the bottom panel — no need to switch between windows.
- xterm.js-powered terminal with PTY backend
- Auto-syncs working directory with your workspace
- Toggle with Ctrl+`
- Run commands, install dependencies, start servers
- Agent can execute commands through the terminal
MCP Servers #
MBS Workbench ships with full Model Context Protocol (MCP) infrastructure and 28 pre-configured server definitions spanning 8 domains. MCP enables your AI model to interact with external tools, APIs, and services through a standardized JSON-RPC 2.0 protocol.
| Domain | Servers | Capabilities |
|---|---|---|
| Dev Tools | Rust Analyzer, Pyright, TypeScript, Clangd, Go, TexLab | Language intelligence, diagnostics, completions |
| Blockchain | Solana, Ethereum, CoinMarketCap, DEX Screener | Smart contract interaction, price data, DEX analytics |
| Game Engines | Godot, Unity, Unreal | Scene management, asset creation, scripting |
| Web Dev | Puppeteer, Vercel, Docker | Browser automation, deployment, containerization |
| Databases | PostgreSQL, SQLite, Redis | Query execution, schema inspection, caching |
| Media | ComfyUI, FFmpeg, ImageMagick | Image generation, video processing, media conversion |
| Mobile | Flutter, Android | App building, emulator control, hot reload |
| Utilities | Filesystem, Terminal, Git, HTTP | File ops, command execution, version control, API calls |
Language Server Protocol (LSP) #
Built-in LSP client for professional-grade language intelligence:
- Real-time diagnostics (errors & warnings) in the editor
- Symbol extraction and navigation
- Document synchronization (open / change / save)
- Start/stop language servers from the UI
- Automatic server restart on crash
Extensions #
MBS Workbench ships with a built-in extension system:
Git
Python
Markdown
AI Completions
Live Preview
Mermaid
RAG
Image Gen
Search & Replace #
Powerful workspace-wide search with regex, case sensitivity, include/exclude glob filters, and bulk replace. Results are grouped by file with match highlighting. Access via Ctrl+Shift+F or the Search panel in the Activity Bar.
Live Preview #
Preview HTML, CSS, and JavaScript projects in a split pane without leaving the editor. The embedded Warp HTTP server provides instant hot-reload as you type. Toggle with View → Toggle Preview.
Debug & Test #
Full DAP (Debug Adapter Protocol) integration with 9 debug adapters — Node.js, Python, Rust, Go, C/C++, Java, .NET, PHP, and Ruby. Set breakpoints, step through code, inspect variables, evaluate expressions, and view call stacks — all inside MBS Workbench.
Breakpoints & Stepping
Line breakpoints, conditional breakpoints, step over/into/out, continue, restart, and stop. Inline variable values shown directly in the editor.
Compound Configs
Launch multiple debug targets simultaneously. Source map VLQ decode for debugging transpiled code. Launch.json editor with template generation.
Integrated test runner with support for npm test, cargo test, pytest, and jest. Task auto-detection discovers package.json scripts, Cargo targets, and Makefiles automatically. Test results display inline with pass/fail status.
Source Control (Git) #
Complete Git integration powered by git2-rs (native Rust bindings) and porcelain v2 commands. Full SCM sidebar with visual staging, inline blame, and merge conflict resolution.
SCM Sidebar
Stage/unstage files and individual hunks. Commit with message. Visual diff editor. Branch management (create, switch, delete, rename). Tag and stash support.
Advanced Git
Inline blame annotations. Git gutter decorations. Cherry-pick, rebase, and reflog. Merge conflict resolution with accept current/incoming/both. Partial (line-level) staging.
Push, pull, and fetch operations run asynchronously with progress notifications. SCM toolbar provides one-click access to commit, stage all, unstage all, pull, push, and branch switching.
AI Refactoring #
AI-powered code refactoring with real safety checks — the refactoring engine actually runs your test suite (npm test, cargo test, pytest) before and after each transformation to verify correctness.
Extract & Move
Extract method, extract component (React/Vue), move symbol to another file. AI suggests optimal extraction boundaries based on code analysis.
AI Rename & Patterns
Context-aware renaming across files. Regex-based pattern transforms. Full refactoring undo with one-click revert. Safety score shown before applying.
Multi-Model Chat #
Chat with multiple AI models simultaneously in a side-by-side view. Compare responses, run model tournaments, and track cost per query across providers.
- Simultaneous chat — send one prompt, get responses from 2-4 models at once
- Tournament mode — blind comparison where you pick the best response
- Cost analysis — real-time token cost tracking per provider (OpenAI, Anthropic, Google, local)
- Templates — save and reuse multi-model comparison setups
- Local + Cloud — mix local GGUF models with cloud APIs in the same session
Voice Input (STT) #
Live microphone speech-to-text powered by the browser's built-in Web Speech Recognition API — no external binary, no cloud. Continuous recording mode streams interim and final transcripts in real time. Use it to dictate code, chat messages, or search queries entirely on-device.
- One-click Start / Stop recording with visual indicator
- Language selection (auto-detect or manual BCP-47 code)
- Interim transcript streamed as you speak
- Full error messaging — microphone denied, no speech, audio capture failures
- No external binary required — works via WebView2 on all supported Windows versions
Image Generation #
Generate images from text prompts using Stable Diffusion running entirely on your GPU. Supports SD 1.5, SDXL, and FLUX architectures with automatic model detection and configuration. No cloud fees, no API limits — unlimited generation on your own hardware.
Text-to-Image
Enter a prompt, generate an image. Token weighting syntax for precise control. 20+ built-in prompt templates. Negative prompt presets. Multiple samplers (Euler, DPM++, LCM) with auto-selection.
img2img & Inpainting
Transform existing images with guided generation. Mask-based selective regeneration. ControlNet integration with 8 modes (Canny, Depth, Pose, Scribble, and more). Adjustable denoising strength.
Live Preview & History
Watch images evolve in real-time as denoising progresses. Full generation history with metadata. Re-generate any previous image with one click. Search by prompt, model, or date.
LoRA & Batch Generation
Stack multiple LoRA weights simultaneously with per-LoRA strength sliders. Smart batch mode with variation strategies (seed walk, CFG sweep, sampler comparison). Export batches as ZIP.
Model Training #
Fine-tune LoRA and QLoRA adapters on your own codebase or domain-specific data. Train specialized models that outperform generic models on your specific tasks — from 4 GB laptops to multi-GPU workstations.
Visual Training Dashboard
Real-time loss curves with moving averages. GPU/CPU memory gauges. Sample generation during training. Estimated time remaining. Pause/resume with one click. Auto-save checkpoints.
Hardware-Aware Presets
Smart hardware detection recommends optimal batch sizes, LoRA rank, and gradient accumulation. Four tiers: Quick (15 min), Balanced (1-2 hr), Quality (3-6 hr), and Professional (multi-GPU).
Advanced Training
Distributed training with auto-generated configs. CPU offloading for 30B+ models on consumer GPUs. Context extension (32K→256K+). Reinforcement learning support.
Export & Deploy
Export trained adapters as GGUF for immediate inference. Multiple quantization options. One-click upload to HuggingFace Hub. Agent trajectory dataset builder for creating tool-using coding agents.
- 6 use-case presets: Code Completion, Bug Fixer, Refactoring, Test Generator, Documentation, Full Agent
- Pre-flight checks validate CUDA, VRAM, disk space, and dataset before training starts
- Automatic OOM recovery — reduces batch size and enables gradient checkpointing on the fly
- Cloud GPU rental integration (one-click provisioning with cost monitoring and auto-shutoff)
Ollama-Compatible API Server #
MBS Workbench includes a built-in HTTP server with OpenAI-compatible and Ollama-compatible API endpoints. Point any tool that supports the OpenAI API format at your local MBS instance and get responses from your locally loaded model.
- /v1/chat/completions — OpenAI-compatible chat endpoint
- /api/generate — Ollama-compatible generation endpoint
- API key management for access control
- Server start/stop from the UI or command palette
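Any OpenAI-compatible client can talk to the server. A sketch of the request body it sends — the port and model name below are placeholders, so use the values shown in MBS Workbench's server panel:

```javascript
// Build an OpenAI-compatible chat request. The model name defaults to a
// placeholder; local servers typically accept whatever model is loaded.
function buildChatRequest(prompt, model = 'local-model') {
  return {
    model,
    messages: [{ role: 'user', content: prompt }],
    stream: false,
  };
}

const body = buildChatRequest('Explain closures in one sentence.');
// Example usage against a local instance (port is an assumption):
// fetch('http://localhost:11434/v1/chat/completions', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(body),
// }).then((r) => r.json()).then(console.log);
console.log(JSON.stringify(body));
```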
Remote Development #
Develop on remote machines via real SSH connections (OpenSSH), run code inside Dev Containers (real docker build and docker run), and sync files via SCP.
SSH Remote
Save and manage SSH configurations. Connect to remote hosts. Browse and edit remote files. Open remote terminals. Run models on remote GPUs.
Dev Containers
Build and run Docker dev containers directly from the UI. Port forwarding. File sync between local and container. Remote LSP support via SSH command spawn.
Docker & Kubernetes #
Full container management without leaving your IDE. MBS Workbench integrates directly with Docker Engine and Kubernetes clusters.
Docker Dashboard
Container Management
List, start, stop, restart, remove containers. View logs and open exec shells. Docker Compose up/down/status.
Image & Build
Pull, build, remove images. Volume and network management. System prune and disk usage monitoring.
Kubernetes Explorer
Visual cluster browser with context/namespace management, pod inspection (logs, describe, YAML), service & deployment management, port-forwarding, and resource YAML apply/delete.
Compose → K8s Converter
Convert Docker Compose files into Kubernetes manifests with one click. Generates Deployments, Services, Ingress, PVCs, HPAs, and health checks. Includes a Dockerfile generator for 6 frameworks (Node, Python, Rust, Go, Java, Static).
Local Cluster Provisioning
Create and manage local Kubernetes clusters through Minikube, Kind, or k3d — directly from the UI.
Cloud GPU Hub #
Need more GPU power than your local hardware provides? The Cloud GPU Hub lets you rent cloud GPUs from leading providers — directly from within MBS Workbench.
Browse & Rent
Compare GPU instances across cloud providers with real-time pricing, availability, and specs. Launch instances with one click and connect them to your workspace.
Budget Controls
Set monthly spending limits, configure cost alerts, and enable auto-terminate to prevent cost overruns. Track spending across all providers in a unified dashboard.
Instance Management
Monitor running instances, view resource utilization, and manage lifecycle (start, stop, terminate) from the Activity Bar. No separate provider dashboards needed.
Integrated Workflows
Use rented GPUs for model training, inference, and image generation — the same features you use locally, running on cloud hardware when you need extra power.
Cloud Deployment #
Deploy to any major cloud provider with a unified interface:
| Provider | Services |
|---|---|
| Azure | Container Apps, Azure Functions |
| Google Cloud | Cloud Run, Cloud Functions |
| AWS | Lambda (create + update) |
| Vercel | Frontend & serverless deploy |
| Netlify | Static site & serverless deploy |
Each provider includes CLI detection, login management, deployment history, logs, and config template generation.
Model Export & Conversion #
Convert models to production formats for edge deployment. The exporter includes device-specific optimization, size estimation, cloud storage upload (S3/GCS/Azure), and inference code generation for Python, Swift, and Kotlin.
Cost Analytics #
Track and optimize your AI spending across all providers. Five-tab dashboard covering overview, cost breakdown by provider/model, pricing reference, budget alerts with thresholds, and local-vs-cloud ROI analysis. See exactly how much you save by running locally.
Keyboard Shortcuts #
MBS Workbench uses familiar VS Code shortcuts. Here are the most important ones:
General
Activity Bar Panels
Editor
Themes & Appearance #
MBS Workbench ships with three theme modes and a built-in theme editor:
Dark (Default)
Deep indigo-blue palette with glassmorphism effects. Easy on the eyes for long sessions.
Light
Clean white background with soft shadows. High readability in bright environments.
High Contrast
Maximum contrast for accessibility. Meets WCAG AAA standards.
The Theme Editor lets you customize colors, radii, shadows, and spacing through a visual JSON-based editor. Over 100 CSS custom properties (design tokens) control every visual element.
Command Palette #
Press Ctrl+Shift+P to open the Command Palette — a fuzzy-search overlay that indexes all 52+ commands across 7 categories (AI & Models, Connections, Debug & Test, Deploy & Cloud, Extensions, Navigation, Editor). Type to filter, arrow keys to navigate, Enter to execute.
Settings #
All configuration is managed through the Settings panel. Inference parameters, theme preferences, keyboard shortcuts, cloud API keys, and extension state are all accessible from one place. Settings persist across sessions via local storage.
Guides #
Step-by-step tutorials to get the most out of MBS Workbench. Each guide walks through a real-world workflow from start to finish.
Your First AI Project #
Build a complete web app from scratch using the AI agent — no prior experience needed.
Create a New Workspace
Press Ctrl+Shift+O and select an empty folder (e.g., my-first-app). This becomes your project root.
Load a Model
Open the AI & Models panel (Ctrl+Shift+A). If you haven't downloaded a model yet, click HuggingFace, search for Qwen 2.5 Coder 7B Q4_K_M, and download it. Then switch to Load Model and select the GGUF file.
Describe Your App
In the AI chat panel, type a prompt like:
"Create a React + TypeScript todo app with Tailwind CSS. Include add, delete, toggle complete, and filter functionality. Use localStorage for persistence."
Review & Accept
The agent will create multiple files (index.html, App.tsx, package.json, etc.). Review each in the diff viewer, then click Accept All to apply.
Run & Iterate
Open the terminal (Ctrl+`), run npm install && npm run dev. Use Live Preview to see your app. Ask the AI to refine: "Add dark mode support" or "Make it responsive".
Agent Workflows #
The autonomous agent uses a ReAct (Reason → Act → Observe) loop. Here's how to leverage it effectively.
Prompt Engineering for the Agent
| Pattern | Example | Why It Works |
|---|---|---|
| Be Specific | "Create an Express.js REST API with /users and /posts endpoints, using TypeScript and Zod validation" | Reduces ambiguity, fewer iterations |
| Use Context | "@file src/App.tsx — refactor this component to use React Query instead of useEffect" | Agent sees exact code to modify |
| Chain Tasks | "First create the database schema, then build the API, then write tests" | Agent plans a multi-step sequence |
| Constrain Output | "Fix the bug in @file utils.ts — only modify that file, don't create new files" | Prevents unwanted side effects |
Multi-File Editing
When the agent modifies multiple files, you get a unified diff panel showing all changes. Best practices:
- Review the file tree on the left — click any file to jump to its diff
- Accept/Reject per-file or use bulk actions
- The agent tracks its own progress in a todo list visible in the UI
- If something looks wrong, type "undo that last change" in chat
Choosing the Right Model #
Different tasks call for different models. Here's a decision matrix based on your hardware and use case:
| Scenario | Recommended Model | VRAM Needed | Speed |
|---|---|---|---|
| General coding (auto-complete, chat) | Qwen 2.5 Coder 7B Q4_K_M | 4 GB | Fast |
| Complex reasoning / planning | DeepSeek-R1 7B Q4_K_M | 4 GB | Fast |
| Low-resource machine (laptop) | Phi-3 Mini 3.8B Q4_K_M | 2 GB | Very fast |
| Maximum quality (8+ GB VRAM) | CodeLlama 13B Q4_K_M | 8 GB | Moderate |
| Long-context projects | Llama 3.3 8B Q4_K_M | 6 GB | Fast |
| Embedding / RAG | nomic-embed-text-v1.5 | 1 GB | N/A |
Quantization Guide
GGUF models come in various quantization levels. Here's what they mean:
| Quantization | Quality | Size vs Full | Best For |
|---|---|---|---|
| Q8_0 | Near-lossless | ~50% | Maximum quality, plenty of VRAM |
| Q6_K | Excellent | ~42% | Best quality/size tradeoff |
| Q5_K_M | Very good | ~35% | Good quality, moderate VRAM |
| Q4_K_M | Good | ~28% | Recommended default — sweet spot |
| Q3_K_M | Acceptable | ~22% | Low VRAM, still usable quality |
| Q2_K | Degraded | ~15% | Absolute minimum — last resort |
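These ratios let you estimate file sizes from parameter count, assuming an FP16 baseline of roughly 2 bytes per parameter (approximate — actual GGUF sizes vary by architecture and tokenizer):

```javascript
// Rough quantized-size estimate from the "Size vs Full" ratios above.
const QUANT_RATIO = {
  Q8_0: 0.50, Q6_K: 0.42, Q5_K_M: 0.35,
  Q4_K_M: 0.28, Q3_K_M: 0.22, Q2_K: 0.15,
};

function estimateQuantSizeGb(paramsBillions, quant) {
  const fp16Gb = paramsBillions * 2; // ~2 bytes per parameter at FP16
  return +(fp16Gb * QUANT_RATIO[quant]).toFixed(1);
}

console.log(estimateQuantSizeGb(7, 'Q4_K_M')); // → 3.9 (GB, in the ballpark of the 4.4 GB Qwen file)
```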
GPU Optimization Guide #
Get the best inference performance from your NVIDIA GPU.
Prerequisites
- NVIDIA GPU with CUDA Compute Capability 5.0+ (GTX 900 series or newer)
- Latest NVIDIA drivers (Game Ready or Studio)
- MBS Workbench detects CUDA automatically — no toolkit installation needed
VRAM Budget Planning
Your GPU VRAM determines what models you can run and how fast. Here's how VRAM is allocated:
| Component | VRAM Usage | Notes |
|---|---|---|
| Model Weights | Model file size | A 4.4 GB Q4_K_M uses ~4.4 GB VRAM when fully offloaded |
| KV Cache | ~200 MB per 4K context | Scales with context window size |
| OS / Desktop | ~500 MB – 1 GB | Windows reserves VRAM for the display server |
| Buffer | ~200 MB | Working memory for matrix operations |
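Putting the table together as a quick budget check — the numbers below are the table's rough estimates, not measurements:

```javascript
// VRAM budget: model weights + KV cache + OS reserve + working buffer.
function vramBudgetGb(modelSizeGb, contextTokens) {
  const kvCacheGb = (contextTokens / 4096) * 0.2; // ~200 MB per 4K context
  const osReserveGb = 0.75;                       // midpoint of the 500 MB – 1 GB range
  const bufferGb = 0.2;                           // matrix-op working memory
  return +(modelSizeGb + kvCacheGb + osReserveGb + bufferGb).toFixed(2);
}

// A 4.4 GB Q4_K_M model with an 8K context window:
console.log(vramBudgetGb(4.4, 8192)); // → 5.75 (GB)
```

This is why a fully offloaded 7B Q4 model is a tight fit on a 6 GB card once the context window grows.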
Performance Tuning Tips
Batch Size
Larger batch sizes increase throughput but use more VRAM. MBS auto-calculates optimal batch size. Manual override in Parameters → Advanced.
Thread Count
CPU threads for non-GPU layers. Default is n_cpu_cores / 2. Increase for pure CPU inference; keep default for GPU-dominant setups.
Custom MCP Servers #
Extend MBS Workbench's capabilities by adding your own MCP servers. MCP servers expose tools to the AI model via JSON-RPC 2.0 over stdio.
Creating a Custom Server
// my-mcp-server.js — Minimal MCP server example
const { Server } = require('@modelcontextprotocol/sdk/server/index.js');
const { StdioServerTransport } = require('@modelcontextprotocol/sdk/server/stdio.js');
const { ListToolsRequestSchema, CallToolRequestSchema } = require('@modelcontextprotocol/sdk/types.js');
const server = new Server({ name: 'my-custom-tools', version: '1.0.0' }, {
capabilities: { tools: { listChanged: false } }
});
server.setRequestHandler(ListToolsRequestSchema, async () => ({
tools: [{
name: 'get_weather',
description: 'Get weather for a city',
inputSchema: {
type: 'object',
properties: { city: { type: 'string', description: 'City name' } },
required: ['city']
}
}]
}));
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === 'get_weather') {
const city = request.params.arguments.city;
return { content: [{ type: 'text', text: `Weather in ${city}: 72°F, sunny` }] };
}
throw new Error(`Unknown tool: ${request.params.name}`);
});
const transport = new StdioServerTransport();
server.connect(transport);
Registering Your Server
Open MCP Manager (Ctrl+Shift+M), click Add Server, and configure:
| Field | Value |
|---|---|
| Name | my-custom-tools |
| Command | node |
| Args | ["./my-mcp-server.js"] |
| Transport | stdio |
| Auto-Start | Optional — start on app launch |
Docker Deployment #
Deploy your project using Docker directly from MBS Workbench.
Generate a Dockerfile
Open Deploy → Docker/K8s. Click Generate Dockerfile. Select your framework (Node, Python, Rust, Go, Java, or Static). MBS generates an optimized multi-stage Dockerfile.
Build the Image
Click Build Image or run docker build -t myapp:latest . in the terminal. MBS shows real-time build progress with expandable layer details.
Run the Container
Click Run from the image list or use docker run -p 3000:3000 myapp:latest. Container logs stream in the Docker panel.
Manage & Monitor
Use the Docker dashboard to view running containers, inspect logs, exec into shells, and stop/restart containers — all without leaving the IDE.
Building a RAG Pipeline #
Use Retrieval-Augmented Generation (RAG) to give your AI model access to your project's documentation, codebase, or any document corpus.
Load an Embedding Model
Download an embedding model like nomic-embed-text-v1.5 from HuggingFace. Load it in the Embeddings panel.
Index Your Documents
Open Document Chat from the Activity Bar. Drag and drop files (PDF, TXT, MD, code files) into the panel. MBS chunks, embeds, and stores them in a local vector database.
Query with Context
Ask questions like "What does the authentication system do?" or "Find all API endpoints related to billing." The RAG pipeline retrieves the most relevant chunks and injects them into the LLM's context window.
Use @codebase in Chat
In the main AI chat, type @codebase to automatically search your indexed project. The agent receives semantic search results as context, enabling codebase-aware answers.
API Reference #
Complete reference for MBS Workbench's internal APIs, Tauri commands, agent tools, and configuration schemas.
Tauri Commands (IPC) #
All communication between the React frontend and Rust backend happens via Tauri's IPC bridge. Each command is invoked with invoke('command_name', { args }).
LLM & Inference
| Command | Arguments | Returns | Description |
|---|---|---|---|
| load_model | path: string, gpu_layers?: number | ModelInfo | Load a GGUF model into memory with optional GPU layer override |
| unload_model | — | void | Release the loaded model from memory |
| chat_completions | messages: Message[], params: InferenceParams | Stream<string> | Stream chat completions (SSE) from the loaded model |
| cancel_inference | — | void | Abort the current inference stream |
| get_model_info | — | ModelInfo \| null | Returns metadata about the currently loaded model |
| detect_hardware | — | HardwareInfo | Returns GPU, RAM, CPU details for the current system |
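A sketch of how the frontend calls these commands. In the app, `invoke` comes from the Tauri API package; it is stubbed here so the call shape can be shown (and tested) outside the app, and the model path is a placeholder:

```javascript
// Stub of Tauri's `invoke` — echoes the command and args instead of crossing
// the IPC bridge. Replace with the real import inside the app.
const invoke = async (command, args) => ({ command, args });

async function loadAndInspect() {
  // Command names and argument shapes follow the table above.
  const info = await invoke('load_model', {
    path: 'models/my-model.gguf', // placeholder path
    gpu_layers: 28,
  });
  const hw = await invoke('detect_hardware');
  return { info, hw };
}
```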
File System
| Command | Arguments | Returns | Description |
|---|---|---|---|
| read_dir | path: string, recursive?: boolean | FileEntry[] | List directory contents with metadata |
| read_file_text | path: string | string | Read file contents as UTF-8 text |
| write_file | path: string, content: string | void | Write contents to a file (create or overwrite) |
| delete_path | path: string | void | Delete a file or directory |
| rename_path | from: string, to: string | void | Rename or move a file/directory |
| search_files | query: string, path: string, regex?: boolean | SearchResult[] | Search file contents with optional regex |
Project & Workspace
| Command | Arguments | Returns | Description |
|---|---|---|---|
| open_project | path: string | ProjectInfo | Open a folder as workspace, index files |
| get_project_info | — | ProjectInfo | Get current workspace path, file count, watchers |
| run_terminal_command | command: string, cwd?: string | CommandResult | Execute a shell command and return stdout/stderr |
Agent Tool API #
The autonomous agent has access to 23 tools organized into 6 categories. Each tool follows a strict JSON schema for input/output.
Tool Definitions
| Tool | Category | Safety Tier | Input Schema |
|---|---|---|---|
| read_file | Filesystem | Safe (1) | { path: string } |
| create_file | Filesystem | Safe (1) | { path: string, content: string } |
| edit_file | Filesystem | Elevated (2) | { path: string, old: string, new: string } |
| delete_file | Filesystem | Dangerous (3) | { path: string } |
| list_directory | Filesystem | Safe (1) | { path: string, recursive?: boolean } |
| search_files | Filesystem | Safe (1) | { pattern: string, path?: string } |
| batch_read | Filesystem | Safe (1) | { paths: string[] } |
| execute_command | Terminal | Elevated (2) | { command: string, cwd?: string } |
| run_python | Terminal | Elevated (2) | { code: string } |
| run_node | Terminal | Elevated (2) | { code: string } |
| web_search | Web | Dangerous (3) | { query: string, count?: number } |
| fetch_url | Web | Dangerous (3) | { url: string } |
| scrape_webpage | Web | Dangerous (3) | { url: string } |
| git_status | Git | Safe (1) | { path?: string } |
| git_diff | Git | Safe (1) | { path?: string } |
| git_commit | Git | Elevated (2) | { message: string } |
| git_push | Git | Dangerous (3) | { remote?: string, branch?: string } |
| git_log | Git | Safe (1) | { count?: number } |
| analyze_code | Analysis | Elevated (2) | { path: string } |
| explain_code | Analysis | Safe (1) | { code: string, language?: string } |
| suggest_refactor | Analysis | Safe (1) | { path: string } |
| query_sqlite | Database | Elevated (2) | { db_path: string, query: string } |
| parse_csv | Database | Safe (1) | { path: string } |
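The safety tiers in the table can be read as an execution gate. The tier numbers below come straight from the table; the auto/confirm/block flow itself is a sketch of how such a gate might work, not the app's exact policy.

```typescript
type Tier = 1 | 2 | 3; // 1 = Safe, 2 = Elevated, 3 = Dangerous

// Tool → tier mapping, transcribed from the Tool Definitions table.
const toolTiers: Record<string, Tier> = {
  read_file: 1, create_file: 1, list_directory: 1, search_files: 1,
  batch_read: 1, git_status: 1, git_diff: 1, git_log: 1,
  explain_code: 1, suggest_refactor: 1, parse_csv: 1,
  edit_file: 2, execute_command: 2, run_python: 2, run_node: 2,
  git_commit: 2, analyze_code: 2, query_sqlite: 2,
  delete_file: 3, web_search: 3, fetch_url: 3, scrape_webpage: 3, git_push: 3,
};

type Decision = "auto" | "confirm" | "blocked";

function gate(tool: string, allowDangerous: boolean): Decision {
  const tier = toolTiers[tool];
  if (tier === undefined) return "blocked";       // unknown tools never run
  if (tier === 1) return "auto";                  // Safe: run without prompting
  if (tier === 2) return "confirm";               // Elevated: ask the user first
  return allowDangerous ? "confirm" : "blocked";  // Dangerous: opt-in only
}
```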
MCP Protocol #
The Model Context Protocol uses JSON-RPC 2.0 over stdio for communication between MBS and MCP servers.
Lifecycle
MBS Workbench MCP Server
│ │
│──── initialize ────────►│ Server starts, reports capabilities
│◄─── initialized ────────│
│ │
│──── tools/list ────────►│ List available tools
│◄─── tools[] ────────────│
│ │
│──── tools/call ────────►│ Execute a tool with arguments
│◄─── result ─────────────│
│ │
│──── shutdown ──────────►│ Graceful shutdown
│◄─── ok ─────────────────│
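The lifecycle above translates to plain JSON-RPC 2.0 messages, one per line on the server's stdin. The method names match the diagram; the exact parameter shapes are assumptions for illustration.

```typescript
interface RpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;
function rpc(method: string, params?: Record<string, unknown>): RpcRequest {
  return params === undefined
    ? { jsonrpc: "2.0", id: nextId++, method }
    : { jsonrpc: "2.0", id: nextId++, method, params };
}

// One request per lifecycle step: initialize, list tools, call one.
const init = rpc("initialize", { clientInfo: { name: "mbs-workbench" } });
const list = rpc("tools/list");
const call = rpc("tools/call", { name: "read_file", arguments: { path: "src/main.rs" } });
```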
Server Configuration Schema
{
"name": "server-name", // Unique identifier
"command": "node", // Executable to run
"args": ["./server.js"], // Command arguments
"env": { "API_KEY": "..." }, // Environment variables (optional)
"transport": "stdio", // Transport type (stdio only currently)
"auto_start": false, // Start on app launch
"health_check_interval": 30, // Seconds between health pings
"restart_on_crash": true, // Auto-restart on server crash
"max_restarts": 3 // Maximum restart attempts
}
Pre-Configured Servers (24 total)
| Domain | Servers | Install Command |
|---|---|---|
| Dev Tools | Rust Analyzer, Pyright, TypeScript, Clangd, Go, TexLab | Auto-detected from PATH |
| Blockchain | Solana, Ethereum, CoinMarketCap, DEX Screener | npx @mcp/solana |
| Game Engines | Godot, Unity, Unreal | npx @mcp/godot |
| Web Dev | Puppeteer, Vercel, Docker | npx @mcp/puppeteer |
| Databases | PostgreSQL, SQLite, Redis | npx @mcp/postgres |
| Media | ComfyUI, FFmpeg, ImageMagick | npx @mcp/comfyui |
| Mobile | Flutter, Android | npx @mcp/flutter |
| Utilities | Filesystem, Terminal, Git, HTTP | Built-in (no install) |
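Putting the schema and the table together, a filled-in entry for the PostgreSQL server from the table might look like this. The `DATABASE_URL` variable name and connection string are assumptions — check the server's own documentation for the variables it expects.

```json
{
  "name": "postgres",
  "command": "npx",
  "args": ["@mcp/postgres"],
  "env": { "DATABASE_URL": "postgres://localhost:5432/dev" },
  "transport": "stdio",
  "auto_start": true,
  "health_check_interval": 30,
  "restart_on_crash": true,
  "max_restarts": 3
}
```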
Settings Schema #
All app settings are stored locally and accessible via Ctrl+,. Here's the complete schema:
Inference Settings
| Key | Type | Default | Description |
|---|---|---|---|
| temperature | number | 0.7 | Sampling temperature (0.0 – 2.0) |
| top_p | number | 0.9 | Nucleus sampling threshold |
| top_k | number | 40 | Top-K sampling limit |
| repeat_penalty | number | 1.1 | Repetition penalty factor |
| max_tokens | number | 4096 | Maximum generation length |
| context_window | number | auto | Context window size (auto = 75% GPU capacity) |
| gpu_layers | number | auto | Number of layers to offload to GPU |
| batch_size | number | 512 | Prompt processing batch size |
| threads | number | auto | CPU threads for inference |
Appearance Settings
| Key | Type | Default | Description |
|---|---|---|---|
| theme | string | "dark" | Theme mode: "dark", "light", "high-contrast" |
| font_family | string | "system-ui" | UI font family |
| font_size | number | 13 | UI font size in pixels |
| editor_font_family | string | "Cascadia Code" | Editor font family |
| editor_font_size | number | 14 | Editor font size |
| editor_line_height | number | 1.6 | Editor line height multiplier |
| ui_scale | number | 1.0 | Global UI scale factor |
| minimap | boolean | true | Show editor minimap |
Cloud Provider Settings
| Key | Type | Default | Description |
|---|---|---|---|
| openai_api_key | string | "" | OpenAI API key |
| anthropic_api_key | string | "" | Anthropic API key |
| google_api_key | string | "" | Google Gemini API key |
| default_provider | string | "local" | Default inference provider |
Complete Keyboard Shortcuts #
Full list of all keyboard shortcuts in MBS Workbench.
General
Navigation
Activity Bar Panels
Editor
AI & Chat
CLI Reference #
MBS Workbench can be launched from the command line with optional arguments:
# Open MBS Workbench
mbs-workbench
# Open a specific folder
mbs-workbench /path/to/project
# Open a specific file
mbs-workbench /path/to/file.ts
# Start with a specific model loaded
mbs-workbench --model /path/to/model.gguf
# Start in CPU-only mode (skip GPU detection)
mbs-workbench --cpu-only
# Show version
mbs-workbench --version
Design Agent #
The Design Agent is a local AI-powered design assistant that generates complete page layouts, component designs, and brand-consistent UI from natural language prompts — all running on your GPU with no cloud dependency.
Natural Language Design
Describe what you want — "a dark-themed pricing page with 3 tiers" — and the Design Agent generates production-ready HTML/CSS/Tailwind output using your local LLM.
Brand-Aware Generation
The agent reads your brand tokens (colors, fonts, spacing) and weaves them into every design, ensuring brand consistency across all generated components.
Dynamic Token Budgets
Automatically calculates max output tokens from your model's context window (n_ctx()), so large designs don't get truncated.
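The budget calculation can be sketched as: reserve the prompt (brand preamble plus request) out of `n_ctx()` and give the remainder to generation. The 4-characters-per-token estimate and the safety margin below are assumptions for illustration — the real implementation would count tokens with the model's own tokenizer.

```typescript
// Dynamic token budget: output budget = context window − prompt − margin.
function maxOutputTokens(nCtx: number, promptChars: number, margin = 64): number {
  const promptTokens = Math.ceil(promptChars / 4); // rough chars→tokens estimate
  return Math.max(0, nCtx - promptTokens - margin);
}
```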
Compressed Preambles
Brand context is injected as compact token-efficient preambles, leaving maximum context for the actual design generation.
Visual Canvas #
A full drag-and-drop visual editor integrated directly into the IDE. Build pages visually with real-time preview, then export clean HTML/CSS/React code.
Drag-and-Drop Builder
Place components on a 2D canvas with snap-to-grid alignment, resize handles, and layer ordering. No code required for layout design.
Live Code Sync
Every visual change syncs to clean, editable code in real-time. Switch between visual and code views seamlessly.
Responsive Breakpoints
Preview and design for desktop, tablet, and mobile breakpoints. The canvas adapts to show your design at each viewport size.
Component Library
Built-in library of common UI components — navbars, hero sections, cards, footers, forms, modals — ready to drag onto your canvas.
Brand & Tokens #
Define your brand identity once and use it everywhere. Brand tokens (colors, typography, spacing, border-radius) are injected into the Design Agent, Visual Canvas, Template Marketplace, and AI Copywriter for consistent output.
Token Editor
Visual editor for brand tokens — pick colors, set font stacks, define spacing scales, configure border-radius and shadow presets.
Export Formats
Export tokens as CSS custom properties, Tailwind config, SCSS variables, or JSON for use in any project.
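The CSS custom properties target can be sketched as a simple flattening pass. The token names and the `--name: value` scheme here are illustrative assumptions; the real exporter may nest or prefix differently.

```typescript
type BrandTokens = Record<string, string | number>;

// Flatten a token map into a :root block of CSS custom properties.
function toCssVars(tokens: BrandTokens, selector = ":root"): string {
  const lines = Object.entries(tokens).map(
    ([name, value]) => `  --${name}: ${value};`
  );
  return `${selector} {\n${lines.join("\n")}\n}`;
}

const css = toCssVars({ "color-primary": "#4f46e5", "radius-md": "8px" });
```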
Template Marketplace #
Browse and install 15 unique, production-ready templates spanning landing pages, dashboards, portfolios, e-commerce, blogs, and more. Each template ships in 8 color variants (120 total combinations) and adapts to your brand tokens automatically.
15 Unique Templates
SaaS Landing, Portfolio, Blog, E-Commerce, Dashboard, Documentation, Agency, Restaurant, Fitness, Real Estate, Event, Education, Medical, Travel, and Startup — each professionally designed.
8 Color Variants
Every template includes 8 curated color schemes: Default, Ocean, Sunset, Forest, Royal, Coral, Midnight, and Amber. One click to switch.
Brand Token Integration
Templates automatically adopt your brand tokens — colors, fonts, and spacing — so the output matches your brand from the start.
One-Click Install
Preview any template in the built-in live preview, then install directly into your project with clean, editable HTML/CSS/React code.
AI Copywriter #
Generate marketing copy, product descriptions, CTAs, headlines, and page content using your local LLM. The AI Copywriter respects your brand tone and outputs clean, usable text with a 512-token generation limit for concise, focused copy.
Copy Templates
Pre-built prompts for hero headlines, feature descriptions, testimonials, pricing blurbs, email subject lines, and social media posts.
Tone Control
Select tone presets — Professional, Casual, Technical, Playful, Luxury — or define a custom brand voice for consistent messaging.
SEO & Performance Analyzer #
Built-in SEO and performance analysis for your pages and projects. Checks meta tags, heading structure, image alt text, page speed metrics, and accessibility compliance — all locally, no third-party services.
Asset Manager #
Organize, optimize, and manage images, icons, fonts, and other assets directly in the IDE. Drag-and-drop upload, automatic image compression, SVG optimization, and sprite generation.
Voice Studio (TTS) #
A full text-to-speech studio built into the IDE. Synthesize speech locally using Kokoro-82M ONNX (22 neural voices, no internet required) or Windows SAPI as a zero-install fallback. Audio is decoded directly in the browser using a Blob URL — no external protocol configuration needed.
Kokoro ONNX Engine
22 neural voice profiles running on-device via the Kokoro-82M model. Download the model once (~82 MB) and get high-quality, expressive speech synthesis with near-instant response after the first warmup. Speed and pitch controls included.
Windows SAPI Fallback
If the Kokoro model is not installed, Workbench automatically falls back to Windows SAPI via PowerShell — no setup required. Any Windows voice installed on the system is available instantly.
In-IDE Audio Playback
Generated WAV audio is read as binary, decoded to a Blob URL, and played back in-app without any server or file protocol workarounds. Preview, replay, and download audio files directly from Voice Studio.
ONNX Status Dashboard
The Voice Studio panel shows the Kokoro model install status, runtime availability, model size, recommended action, and direct download link — so you always know what's available on your machine.
Mobile Export #
Export your web projects to native mobile apps with one click. MBS supports Capacitor (iOS/Android native), PWA (Progressive Web App), and APK (Android Package) export targets. Projects export to Documents/MBS-Mobile-Exports.
Capacitor Export
Generate a full Capacitor project with native iOS and Android wrappers. Ready for Xcode or Android Studio.
PWA Export
Generate a Progressive Web App with service worker, manifest, offline support, and installability — works on any device.
APK Build
Build a standalone Android APK directly from the IDE. No need to install Android Studio separately.
Interactive SDK #
Build interactive forms, surveys, quizzes, and multi-step workflows with a visual form builder. The SDK supports conditional logic, field validation, file uploads, and webhook integrations — all rendered client-side with zero backend required.
Visual Form Builder
Drag-and-drop form fields — text inputs, selects, checkboxes, file uploads, date pickers, sliders — with real-time preview.
Conditional Logic
Show/hide fields, skip steps, or change validation based on user responses. Build complex multi-step flows without code.
Embed & Export
Export forms as standalone HTML, React components, or embed via iframe. Each form gets a unique formId for tracking.
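Since exported forms are standalone HTML with no backend, the iframe embed can be sketched as a plain snippet generator. The `./forms/<formId>.html` path convention and the attribute set are assumptions; `formId` comes from the form's export step.

```typescript
// Build an iframe snippet pointing at an exported standalone form.
function embedSnippet(formId: string, height = 600): string {
  return `<iframe src="./forms/${formId}.html" width="100%" height="${height}" loading="lazy"></iframe>`;
}

const snippet = embedSnippet("contact-01");
```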
Architecture #
MBS Workbench is built on Tauri + Rust + React — a modern architecture that delivers near-native performance in a compact, self-contained native app.
Technology Stack
| Component | Technology | Why |
|---|---|---|
| Desktop Framework | Tauri 1.5 (Rust) | 600 KB runtime vs Electron's 100 MB. Native Windows/Mac/Linux. |
| LLM Inference | Native Rust engine with CUDA bindings | Fastest GGUF inference. Direct CUDA. In-process — no server. |
| Image Generation | Native diffusion engine (C++/CUDA) | GPU-accelerated diffusion. SD 1.5, SDXL, FLUX. In-process — no Python. |
| Model Training | Custom LoRA/QLoRA pipeline | Hardware-aware training with advanced optimization. Consumer GPU support. |
| Frontend | React 18 + TypeScript | Industry standard. Monaco Editor compatibility. Component ecosystem. |
| Styling | TailwindCSS + CSS Custom Props | Utility-first with design token system. 100+ variables. |
| State Management | Zustand | Lightweight, TypeScript-native, minimal boilerplate. |
| Local Storage | rusqlite (SQLite) | Embedded database for settings, history, model metadata. |
| Terminal | xterm.js + PTY | Full terminal emulation with pseudo-terminal backend. |
| Preview Server | Warp (Rust) | Lightweight HTTP server for real-time web preview. |
Performance Tuning #
Inference Optimizer (Zero Configuration)
MBS Workbench includes a built-in inference optimizer that automatically configures every loaded model for maximum speed on your hardware. No manual tuning required.
Auto-Quantization
Detects available VRAM and selects the optimal quantization level (Q4_K_M for <6 GB, Q5_K_M for 6-10 GB, Q8_0 for 10-16 GB, FP16 for 16+ GB). Cached conversions skip re-quantization on subsequent loads.
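The VRAM thresholds above map directly to a selection function; the thresholds come from the text, the function itself is a sketch.

```typescript
// Pick a quantization level from detected VRAM, per the tiers stated above.
function pickQuant(vramGb: number): string {
  if (vramGb < 6) return "Q4_K_M";
  if (vramGb < 10) return "Q5_K_M";
  if (vramGb < 16) return "Q8_0";
  return "FP16";
}
```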
Intelligent Layer Partitioning
Automatically calculates how many layers fit in GPU VRAM and offloads the rest to CPU with pinned memory. Enables 30B+ models on 4 GB GPUs with mixed GPU/CPU inference.
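The partitioning step can be sketched as: fit as many whole layers as the VRAM budget allows and leave the remainder on CPU. The per-layer size and the reserved-VRAM headroom below are illustrative assumptions — in practice both depend on the model and quantization.

```typescript
// Split a model's layers between GPU and CPU given a VRAM budget.
function splitLayers(
  totalLayers: number,
  vramBytes: number,
  perLayerBytes: number,
  reservedBytes = 512 * 1024 * 1024 // headroom for KV cache and activations (assumption)
): { gpuLayers: number; cpuLayers: number } {
  const usable = Math.max(0, vramBytes - reservedBytes);
  const gpuLayers = Math.min(totalLayers, Math.floor(usable / perLayerBytes));
  return { gpuLayers, cpuLayers: totalLayers - gpuLayers };
}
```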
Flash Attention & KV Cache
Flash Attention v3 enabled by default on supported GPUs (2-3× faster). 8-bit KV cache reduces attention memory by 75%, enabling 256K+ context windows on consumer hardware.
Model-Specific Presets
Hand-tuned optimization presets for popular model architectures. Auto-detected on load. MoE expert parallelism, vision-language encoder splitting, and diffusion-specific memory management.
Hardware-Aware Scaling
At launch, MBS detects your exact hardware configuration and scales every subsystem:
| System Tier | RAM | GPU | Optimized For |
|---|---|---|---|
| Low | 8 GB | 0-2 GB VRAM | 2-4B models, CPU inference, minimal context |
| Medium | 16 GB | 4 GB VRAM | 7B models, partial GPU offload, 4K context |
| High | 32 GB | 8 GB VRAM | 13B models, full GPU offload, 8K context |
| Ultra | 64 GB+ | 12 GB+ VRAM | 30B+ models, maximum context, batch decode |
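The tier table can be read as a lookup from detected hardware; where the table leaves gaps between rows, the boundaries below are assumptions.

```typescript
type SystemTier = "Low" | "Medium" | "High" | "Ultra";

// Classify a machine into a tier using the RAM/VRAM rows above.
function classify(ramGb: number, vramGb: number): SystemTier {
  if (ramGb >= 64 && vramGb >= 12) return "Ultra";
  if (ramGb >= 32 && vramGb >= 8) return "High";
  if (ramGb >= 16 && vramGb >= 4) return "Medium";
  return "Low";
}
```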
Memory Architecture
Three-tier memory system ensures optimal performance:
L1 — Hot (2 GB)
Active conversation context, current file buffer, model attention cache
L2 — Warm (8 GB)
Conversation history, recent files (last 10), KV cache, project embeddings
L3 — Cold (SSD)
Model weights (memory-mapped), project history, vector database, preferences
Privacy & Security #
- No account required — download, install, use. Zero signup friction.
- No telemetry — we don't collect usage data, crash reports, or metrics.
- Offline-capable — once a model is downloaded, everything works without internet.
- Agent safety tiers — three-level permission system prevents unintended side effects.
- Local-only storage — settings, history, and model metadata stored in local SQLite.
- Open model ecosystem — use any GGUF model from any source. No vendor lock-in.
How MBS Compares #
| Feature | MBS Workbench | GitHub Copilot | Cursor | LM Studio |
|---|---|---|---|---|
| Pricing | Free / One-time | $10-39/mo | $20/mo | Free |
| Runs Locally | ✓ Full | ✗ Cloud | ✗ Cloud | ✓ Inference |
| 100% Private | ✓ | ✗ | ✗ | ✓ |
| Code Editor | ✓ Monaco | ✓ VS Code ext | ✓ Fork | ✗ None |
| Autonomous Agent | ✓ 23 tools | ✗ | Partial | ✗ |
| GPU Acceleration | ✓ Native CUDA | N/A | N/A | ✓ |
| Model Choice | ✓ Any GGUF | GPT-4 only | GPT-4/Claude | ✓ Any GGUF |
| Source Control (Git) | ✓ Native git2-rs | ✓ Built-in | ✓ Built-in | ✗ |
| Debug (DAP) | ✓ 9 adapters | ✓ Built-in | ✓ Built-in | ✗ |
| Multi-Model Chat | ✓ Tournament | ✗ | ✗ | ✗ |
| Voice Input (STT) | ✓ Web Speech API | ✗ | ✗ | ✗ |
| Text-to-Speech (TTS) | ✓ Kokoro + SAPI | ✗ | ✗ | ✗ |
| Episodic Memory | ✓ SQLite + vectors | ✗ | ✗ | ✗ |
| Conversation Branching | ✓ Git-tree UI | ✗ | ✗ | ✗ |
| Telegram Bot Mode | ✓ Local polling | ✗ | ✗ | ✗ |
| Image Generation | ✓ SD local | ✗ | ✗ | ✗ |
| Model Training | ✓ LoRA/QLoRA | ✗ | ✗ | ✗ |
| HuggingFace Browser | ✓ Built-in | ✗ | ✗ | Partial |
| MCP Protocol | ✓ 24 servers | ✗ | Partial | ✗ |
| Docker / K8s | ✓ Full mgmt | ✗ | ✗ | ✗ |
| Multi-Cloud Deploy | ✓ 5 providers | ✗ | ✗ | ✗ |
| Cost Analytics | ✓ | ✗ | ✗ | ✗ |
| Offline Mode | ✓ Full | ✗ | ✗ | ✓ |
| App Size | ~175 MB | N/A | ~400 MB | ~200 MB |
Frequently Asked Questions #
Is MBS Workbench free?
The core application is free. Future Pro features (advanced agent workflows, enterprise MCP servers, priority model access) may be offered as a one-time license — never a subscription.
What models work with MBS?
Any GGUF-format language model — Qwen, DeepSeek, Llama, Mistral, Phi, Gemma, CodeLlama, and thousands more from HuggingFace. For image generation, MBS supports Stable Diffusion 1.5, SDXL, and FLUX models. Plus, 10 cloud API providers (OpenAI, Anthropic, Google, etc.) are built in for hybrid workflows.
Do I need a GPU?
No. MBS works in CPU-only mode — just slower. For the best experience, an NVIDIA GPU with 4+ GB VRAM is recommended. Even a GTX 1660 (6 GB) provides a great experience with 7B models.
Does it send my code to the cloud?
Never. When using local models, your code and prompts never leave your machine. Cloud providers (OpenAI, Anthropic, etc.) are opt-in and clearly labeled in the UI.
How does it compare to Copilot/Cursor?
Copilot and Cursor are cloud-first tools with recurring subscriptions. MBS runs locally, costs nothing per month, and gives you complete model freedom. See the full comparison table.
Can I use it offline?
Yes. Once you've downloaded a model, everything works without internet — editing, AI completions, chat, agent, image generation, model training, terminal, and all development tools.
What about Mac and Linux?
Windows is the primary platform today. Mac and Linux builds are planned — Tauri natively supports all three platforms.