Documentation

Build with AI.
Locally. Privately. Yours.

MBS Workbench is a GPU-accelerated AI development environment that runs entirely on your machine. Local inference, image generation, model training, 10 cloud API providers, and a full code editor — no subscriptions, no cloud, no data leaves your device.

158+
Backend Modules
191
UI Components
24
MCP Servers
250%+
AI Capability
100%
Local & Private
Why MBS Workbench?
AI-assisted coding is a $7.5B market growing 35% year-over-year. Every leading tool requires cloud connectivity and recurring payments. MBS Workbench is the first product to combine a full code editor, local LLM inference, image generation, on-device model training, a 23-tool autonomous agent, 10 cloud API providers, an 800K+ model marketplace, a full Design & Build suite with visual canvas, and mobile export — in a single native binary. 70 major feature releases. 158+ backend modules. 191 frontend components. 250%+ AI capability beyond any cloud tool. 224,972 lines of code. 1,926 Tauri commands. Your GPU. Your data. Zero marginal cost.

Download & Install #

System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| OS | Windows 10 (64-bit) | Windows 11 |
| RAM | 8 GB | 16 GB+ |
| GPU | CPU-only mode available | NVIDIA GPU with 4 GB+ VRAM |
| Storage | 500 MB (app) | 10 GB+ (app + models) |
| CUDA | Optional | CUDA 12.x for GPU acceleration |

Installation Methods

NSIS Installer

One-click Windows installer with start menu shortcuts and uninstaller. Recommended for most users.

MBS Workbench_x64-setup.exe

MSI Installer

Enterprise-grade MSI package for Group Policy and SCCM deployment across organizations.

MBS Workbench_x64_en-US.msi

Portable EXE

Standalone executable — no installation required. Run from USB drives or restricted environments.

MBS Workbench.exe (134 MB)
GPU Acceleration
MBS Workbench includes built-in CUDA support. If you have an NVIDIA GPU, GPU acceleration activates automatically — no driver installation or configuration needed beyond standard NVIDIA drivers.

Quick Start #

Get productive in under 5 minutes. Here's everything you need to go from install to your first AI-assisted coding session.

Download & Launch

Download the installer from the releases page and run it. MBS Workbench launches instantly — no account creation, no sign-in, no telemetry.

Download an AI Model

Open the AI & Models panel from the Activity Bar (or press Ctrl+Shift+A). Click HuggingFace to browse 800,000+ models. We recommend starting with Qwen 2.5 Coder 7B Q4_K_M — it fits in 4 GB VRAM and delivers excellent code generation.

Load the Model

Go to Load Model and select your downloaded GGUF file. MBS automatically detects your GPU and optimizes layer offloading, context window, and batch size.

Open a Project

Press Ctrl+Shift+O or click File → Open Folder to open your workspace. The file explorer appears in the sidebar — click any file to start editing.

Start Coding with AI

The AI chat panel appears on the right. Type a prompt like "Create a REST API with Express.js and TypeScript" — the agent will create files, install dependencies, and set everything up. You can also use @file, @folder, or @codebase to inject context.

Code Editor #

MBS Workbench includes a full Monaco-powered code editor — the same engine that powers VS Code. You get an enterprise-grade editing experience without any external dependencies.

50+ Languages

Syntax highlighting for TypeScript, Python, Rust, Go, Java, C++, Solidity, and dozens more.

Multi-Tab Editor

Open multiple files in tabs. Drag to reorder. Modified files show a dot indicator.

Minimap & Breadcrumbs

Minimap overview of your file. Breadcrumb navigation shows your location in the code hierarchy.

Find & Replace

Powerful in-file search with regex support. Global workspace search with file filtering.

Split Editor

Split your editor into multiple panes. Compare files side by side or reference code while editing.

Code Folding

Collapse code blocks, bracket pair colorization, indentation guides, and sticky scroll headers.

AI Chat & Autonomous Agent #

MBS Workbench includes a 23-tool autonomous agent powered by a structured ReAct (Reason → Act → Observe) state machine — not brittle prompt chains. Give it a task in natural language and it will plan, execute, and iterate until it's done.

Agent Capabilities

| Category | Tools | What It Can Do |
| --- | --- | --- |
| Filesystem | 7 | Create, read, edit, delete, search files, batch-read directories |
| Terminal | 3 | Run shell commands, execute Python & Node.js scripts |
| Web | 3 | Web search, fetch URLs, scrape webpages |
| Git | 5 | Status, commit, push, pull, branch management |
| Analysis | 3 | Static analysis, code explanation, refactoring suggestions |
| Database | 2 | SQLite queries, CSV parsing |

Safety System

A three-tier permission system protects you from unintended side effects:

Safe (Tier 1)

Read-only operations and file creation. read_file, create_file, list_directory — always allowed.

Elevated (Tier 2)

Execution and analysis tools. execute_command, git_commit, analyze_code — allowed after model qualification.

Dangerous (Tier 3)

Network and destructive operations. web_search, git_push, delete_file — restricted to verified models.

@-Context Injection

Inject precise context into any AI conversation by typing @ followed by a context source:

| Context | Description |
| --- | --- |
| @file | Inject the contents of a specific file into the prompt |
| @folder | Inject directory tree and file summaries |
| @codebase | Inject indexed codebase context (symbols, definitions) |
| @selection | Inject current editor selection |
| @terminal | Inject last terminal output |

Multi-File AI Edits

The AI agent can propose changes across multiple files simultaneously, and all proposed changes are presented in a unified diff viewer for review.

Model Management #

Loading Models

MBS Workbench supports any GGUF-format model — the industry standard for local LLM inference. Models are loaded directly into process memory with native CUDA GPU offloading.

Hardware-Aware Auto-Configuration
When you load a model, MBS automatically detects your GPU VRAM, system RAM, and CPU cores. It computes optimal GPU layer offloading, context window size, and batch dimensions — no manual tuning required.

Recommended Models

| Model | Size | Best For | Min. VRAM |
| --- | --- | --- | --- |
| Qwen 2.5 Coder 7B Q4_K_M | 4.4 GB | Code generation, best quality/size ratio | 4 GB |
| DeepSeek-R1 7B Q4_K_M | 4.5 GB | Reasoning & chain-of-thought | 4 GB |
| Llama 3.3 8B Q4_K_M | 5.0 GB | General purpose, chat | 6 GB |
| Phi-3 Mini Q4_K_M | 2.0 GB | Small, fast, low-resource machines | 2 GB |
| Mistral 7B Q4_K_M | 4.1 GB | Versatile, strong instruction following | 4 GB |
| CodeLlama 13B Q3_K_M | 5.5 GB | Complex code tasks (8 GB VRAM) | 8 GB |

GPU Acceleration (CUDA)

MBS Workbench embeds llama.cpp with native CUDA bindings directly in the Tauri binary. Unlike Ollama or LM Studio, there's no separate inference server — the model runs in the same process as the editor.

Partial GPU Offloading

If your GPU has limited VRAM, MBS automatically splits model layers between GPU and CPU. A 7B model on a 4 GB GPU offloads ~28 of 32 layers to GPU, keeping 4 on CPU.
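
The split is simple arithmetic. A rough sketch (illustrative only, not MBS's actual allocator; it assumes all layers are about the same size):

```javascript
// Illustrative layer-split arithmetic: offload as many whole layers
// as fit in the VRAM budget, keeping the rest on CPU.
function offloadLayers(totalLayers, modelSizeGB, vramBudgetGB) {
  const perLayerGB = modelSizeGB / totalLayers;      // rough uniform estimate
  const fit = Math.floor(vramBudgetGB / perLayerGB); // whole layers that fit
  return Math.min(totalLayers, Math.max(0, fit));    // clamp to [0, totalLayers]
}

// A 4.4 GB model with 32 layers and a ~3.9 GB budget fits 28 layers on GPU:
console.log(offloadLayers(32, 4.4, 3.9)); // 28
```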

Full GPU Offloading

With sufficient VRAM, all layers run on GPU for maximum speed. A 7B Q4 model fully offloaded to an RTX 4060 delivers 40+ tokens/sec.

Model Parameters

Fine-tune inference behavior from the Parameters panel:

| Parameter | Range | Description |
| --- | --- | --- |
| Temperature | 0.0 – 2.0 | Controls randomness. Lower = more deterministic, higher = more creative. |
| Top-P | 0.0 – 1.0 | Nucleus sampling — limits token pool to cumulative probability threshold. |
| Top-K | 1 – 200 | Limits sampling to the top K most likely tokens. |
| Repeat Penalty | 1.0 – 2.0 | Penalizes repeated tokens to reduce repetitive output. |
| Max Tokens | 64 – 32768 | Maximum number of tokens to generate per response. |
| Context Window | 512 – 131072 | Total tokens the model can see (prompt + response). Auto-sized to 75% GPU capacity. |
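
To make the sampling parameters concrete, here is a toy sketch of how a Top-P (nucleus) cutoff filters the token pool. It illustrates the general technique, not MBS's internal sampler:

```javascript
// Nucleus (Top-P) filtering: keep the smallest set of tokens whose
// cumulative probability reaches the threshold p, then sample from it.
function nucleus(tokenProbs, p) {
  const sorted = [...tokenProbs].sort((a, b) => b.prob - a.prob);
  const kept = [];
  let cumulative = 0;
  for (const t of sorted) {
    kept.push(t);
    cumulative += t.prob;
    if (cumulative >= p) break; // threshold reached: stop adding tokens
  }
  return kept.map(t => t.token);
}

const probs = [
  { token: 'the', prob: 0.5 },
  { token: 'a',   prob: 0.3 },
  { token: 'an',  prob: 0.15 },
  { token: 'its', prob: 0.05 },
];
console.log(nucleus(probs, 0.9)); // ['the', 'a', 'an']
```

A lower p shrinks the pool toward the single most likely token, which is why low Top-P values behave like low temperature.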

Cloud Providers (10 APIs)

While MBS Workbench is built for local inference, you can connect to 10 cloud LLM providers for hybrid workflows. Mix local and cloud models in the same session with real-time cost tracking.

Local Models (GGUF), OpenAI (GPT-4o), Anthropic (Claude 3.5), Google (Gemini Pro), AWS Bedrock, DeepSeek, Groq, Mistral, Cohere, Together AI, OpenRouter

Configure API keys in Settings → Cloud Providers. The unified provider selector in chat lets you switch between local and cloud models per-message. An OpenAI-compatible fallback supports any additional provider.

Speculative Decoding

Accelerate inference by pairing a large model with a smaller draft model. The draft model generates candidate tokens that the main model verifies in parallel — delivering 2-3x speedup on compatible hardware.
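
The draft/verify loop can be sketched with mock models standing in for the draft and main networks. This is a schematic of the general technique, not MBS's implementation (real verification scores all candidates in one batched forward pass):

```javascript
// Schematic speculative-decoding step. draftNext and targetNext are
// stand-in "models": functions from a token prefix to the next token.
function speculativeStep(draftNext, targetNext, prefix, k) {
  // 1. Draft model proposes k candidate tokens autoregressively.
  const draft = [];
  let ctx = [...prefix];
  for (let i = 0; i < k; i++) {
    const t = draftNext(ctx);
    draft.push(t);
    ctx.push(t);
  }
  // 2. Target model verifies: accept the longest agreeing prefix,
  //    then emit its own token at the first disagreement.
  const accepted = [];
  ctx = [...prefix];
  for (const t of draft) {
    const want = targetNext(ctx);
    if (want !== t) { accepted.push(want); break; }
    accepted.push(t);
    ctx.push(t);
  }
  return accepted;
}

// Mocks: draft always guesses 'a'; target wants 'a a b ...'
const draftNext  = (ctx) => 'a';
const targetNext = (ctx) => (ctx.length < 4 ? 'a' : 'b');
console.log(speculativeStep(draftNext, targetNext, ['<s>', 'x'], 4)); // ['a', 'a', 'b']
```

The speedup comes from the accepted tokens: each verification pass can emit several tokens for the cost of roughly one large-model step.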

Embeddings & Vision

Load embedding models for local RAG (Retrieval-Augmented Generation) and semantic search. Vision model support (LLaVA, Qwen-VL) lets you paste images directly in chat for screenshot-to-code workflows and image analysis.

HuggingFace Explorer #

Browse and download from HuggingFace's 800,000+ model library without leaving the app. MBS Workbench scores every model against your hardware for instant compatibility assessment.

12 Task Categories

Chat, Code, Image Gen, WebDev, Game Dev, Agent, Reasoning, Vision, Embedding, Translation, Summarization, Math

Smart Scoring

Models ranked by downloads × likes × quantization quality × hardware fit. Badges: Top 5, Popular, Trusted.

One-Click Download

Streaming download with real-time bytes/sec, ETA, and progress. Auto-detect GGUF variants and quantization levels.

Hardware Matching

Each model scored: Perfect, Good, Possible, or Too Large for your system.

Inline AI Completions #

Get real-time AI-powered code suggestions as you type — similar to GitHub Copilot, but running entirely on your local GPU with no network latency and complete privacy.

Privacy Guarantee
Unlike GitHub Copilot, every keystroke stays on your machine. Your code is never transmitted to any external server. This makes MBS Workbench the only viable option for defense, healthcare, finance, and government teams with strict data compliance requirements (HIPAA, ITAR, SOX, GDPR).

Integrated Terminal #

A full terminal emulator built into the bottom panel — no need to switch between windows.

MCP Servers #

MBS Workbench ships with full Model Context Protocol (MCP) infrastructure and 24 pre-configured server definitions spanning 8 domains. MCP enables your AI model to interact with external tools, APIs, and services through a standardized JSON-RPC 2.0 protocol.

| Domain | Servers | Capabilities |
| --- | --- | --- |
| Dev Tools | Rust Analyzer, Pyright, TypeScript, Clangd, Go, TexLab | Language intelligence, diagnostics, completions |
| Blockchain | Solana, Ethereum, CoinMarketCap, DEX Screener | Smart contract interaction, price data, DEX analytics |
| Game Engines | Godot, Unity, Unreal | Scene management, asset creation, scripting |
| Web Dev | Puppeteer, Vercel, Docker | Browser automation, deployment, containerization |
| Databases | PostgreSQL, SQLite, Redis | Query execution, schema inspection, caching |
| Media | ComfyUI, FFmpeg, ImageMagick | Image generation, video processing, media conversion |
| Mobile | Flutter, Android | App building, emulator control, hot reload |
| Utilities | Filesystem, Terminal, Git, HTTP | File ops, command execution, version control, API calls |
How MCP Works
Each server spawns on-demand via JSON-RPC 2.0 over stdio with health monitoring and auto-restart. The AI model can interact with Solana RPCs, query PostgreSQL databases, automate a browser, or render 3D scenes — all from natural language prompts.

Language Server Protocol (LSP) #

Built-in LSP client for professional-grade language intelligence:

TypeScript, Python, Rust, Go, JSON, HTML, CSS

Extensions #

MBS Workbench ships with a built-in extension system:

Git

Python

Markdown

AI Completions

Live Preview

Mermaid

RAG

Image Gen

Search & Replace #

Powerful workspace-wide search with regex, case sensitivity, include/exclude glob filters, and bulk replace. Results are grouped by file with match highlighting. Access via Ctrl+Shift+F or the Search panel in the Activity Bar.

Live Preview #

Preview HTML, CSS, and JavaScript projects in a split pane without leaving the editor. The embedded Warp HTTP server provides instant hot-reload as you type. Toggle with View → Toggle Preview.

Debug & Test #

Full DAP (Debug Adapter Protocol) integration with 9 debug adapters — Node.js, Python, Rust, Go, C/C++, Java, .NET, PHP, and Ruby. Set breakpoints, step through code, inspect variables, evaluate expressions, and view call stacks — all inside MBS Workbench.

Breakpoints & Stepping

Line breakpoints, conditional breakpoints, step over/into/out, continue, restart, and stop. Inline variable values shown directly in the editor.

Compound Configs

Launch multiple debug targets simultaneously. Source map VLQ decode for debugging transpiled code. Launch.json editor with template generation.

Integrated test runner with support for npm test, cargo test, pytest, and jest. Task auto-detection discovers package.json scripts, Cargo targets, and Makefiles automatically. Test results display inline with pass/fail status.

Source Control (Git) #

Complete Git integration powered by git2-rs (native Rust bindings) and porcelain v2 commands. Full SCM sidebar with visual staging, inline blame, and merge conflict resolution.

SCM Sidebar

Stage/unstage files and individual hunks. Commit with message. Visual diff editor. Branch management (create, switch, delete, rename). Tag and stash support.

Advanced Git

Inline blame annotations. Git gutter decorations. Cherry-pick, rebase, and reflog. Merge conflict resolution with accept current/incoming/both. Partial (line-level) staging.

Push, pull, and fetch operations run asynchronously with progress notifications. SCM toolbar provides one-click access to commit, stage all, unstage all, pull, push, and branch switching.

AI Refactoring #

AI-powered code refactoring with real safety checks — the refactoring engine actually runs your test suite (npm test, cargo test, pytest) before and after each transformation to verify correctness.

Extract & Move

Extract method, extract component (React/Vue), move symbol to another file. AI suggests optimal extraction boundaries based on code analysis.

AI Rename & Patterns

Context-aware renaming across files. Regex-based pattern transforms. Full refactoring undo with one-click revert. Safety score shown before applying.

Multi-Model Chat #

Chat with multiple AI models simultaneously in a side-by-side view. Compare responses, run model tournaments, and track cost per query across providers.

Voice Input (STT) #

Live microphone speech-to-text powered by the browser's built-in Web Speech Recognition API — no external binary, no cloud. Continuous recording mode streams interim and final transcripts in real time. Use it to dictate code, chat messages, or search queries entirely on-device.

Image Generation #

Generate images from text prompts using Stable Diffusion running entirely on your GPU. Supports SD 1.5, SDXL, and FLUX architectures with automatic model detection and configuration. No cloud fees, no API limits — unlimited generation on your own hardware.

Text-to-Image

Enter a prompt, generate an image. Token weighting syntax for precise control. 20+ built-in prompt templates. Negative prompt presets. Multiple samplers (Euler, DPM++, LCM) with auto-selection.

img2img & Inpainting

Transform existing images with guided generation. Mask-based selective regeneration. ControlNet integration with 8 modes (Canny, Depth, Pose, Scribble, and more). Adjustable denoising strength.

Live Preview & History

Watch images evolve in real-time as denoising progresses. Full generation history with metadata. Re-generate any previous image with one click. Search by prompt, model, or date.

LoRA & Batch Generation

Stack multiple LoRA weights simultaneously with per-LoRA strength sliders. Smart batch mode with variation strategies (seed walk, CFG sweep, sampler comparison). Export batches as ZIP.

Dynamic VRAM Management
MBS Workbench automatically manages memory between text and image models. When you start generating images, it can prompt you to unload the text model to free VRAM. Auto-tiling and TAESD fallback ensure SDXL generation works even on 4 GB GPUs.

Model Training #

Fine-tune LoRA and QLoRA adapters on your own codebase or domain-specific data. Train specialized models that outperform generic models on your specific tasks — from 4 GB laptops to multi-GPU workstations.

Visual Training Dashboard

Real-time loss curves with moving averages. GPU/CPU memory gauges. Sample generation during training. Estimated time remaining. Pause/resume with one click. Auto-save checkpoints.

Hardware-Aware Presets

Smart hardware detection recommends optimal batch sizes, LoRA rank, and gradient accumulation. Four tiers: Quick (15 min), Balanced (1-2 hr), Quality (3-6 hr), and Professional (multi-GPU).

Advanced Training

Distributed training with auto-generated configs. CPU offloading for 30B+ models on consumer GPUs. Context extension (32K→256K+). Reinforcement learning support.

Export & Deploy

Export trained adapters as GGUF for immediate inference. Multiple quantization options. One-click upload to HuggingFace Hub. Agent trajectory dataset builder for creating tool-using coding agents.

Training for Everyone
MBS Workbench is the only IDE that lets you fine-tune AI models and use them for code completion in the same application. Train a 512 MB specialized adapter, load it immediately, and see the results in your editor. No Python scripts. No command line. No cloud dependencies.

Ollama-Compatible API Server #

MBS Workbench includes a built-in HTTP server with OpenAI-compatible and Ollama-compatible API endpoints. Point any tool that supports the OpenAI API format at your local MBS instance and get responses from your locally loaded model.
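
A sketch of what a client call might look like. The base URL, port, and model name are placeholders (check the API server panel for your actual listen address), and `buildChatRequest` is a hypothetical helper written for this example, not part of MBS:

```javascript
// Build an OpenAI-format chat request body. The system prompt and
// model name here are placeholders for illustration.
function buildChatRequest(model, userPrompt) {
  return {
    model, // name of the locally loaded model
    messages: [
      { role: 'system', content: 'You are a helpful coding assistant.' },
      { role: 'user', content: userPrompt },
    ],
    stream: false,
  };
}

// POST it to the local server's OpenAI-compatible endpoint.
async function chat(baseUrl, prompt) {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest('local-model', prompt)),
  });
  const data = await res.json();
  return data.choices[0].message.content; // standard OpenAI response shape
}

// Example (placeholder address):
// chat('http://localhost:8080', 'Explain closures in one sentence');
```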

Remote Development #

Develop on remote machines via real SSH connections (OpenSSH), run code inside Dev Containers (real docker build and docker run), and sync files via SCP.

SSH Remote

Save and manage SSH configurations. Connect to remote hosts. Browse and edit remote files. Open remote terminals. Run models on remote GPUs.

Dev Containers

Build and run Docker dev containers directly from the UI. Port forwarding. File sync between local and container. Remote LSP support via SSH command spawn.

Docker & Kubernetes #

Full container management without leaving your IDE. MBS Workbench integrates directly with Docker Engine and Kubernetes clusters.

Docker Dashboard

Container Management

List, start, stop, restart, remove containers. View logs and open exec shells. Docker Compose up/down/status.

Image & Build

Pull, build, remove images. Volume and network management. System prune and disk usage monitoring.

Kubernetes Explorer

Visual cluster browser with context/namespace management, pod inspection (logs, describe, YAML), service & deployment management, port-forwarding, and resource YAML apply/delete.

Compose → K8s Converter

Convert Docker Compose files into Kubernetes manifests with one click. Generates Deployments, Services, Ingress, PVCs, HPAs, and health checks. Includes a Dockerfile generator for 6 frameworks (Node, Python, Rust, Go, Java, Static).

Local Cluster Provisioning

Create and manage local Kubernetes clusters through Minikube, Kind, or k3d — directly from the UI.

Cloud GPU Hub #

Need more GPU power than your local hardware provides? The Cloud GPU Hub lets you rent cloud GPUs from leading providers — directly from within MBS Workbench.

Browse & Rent

Compare GPU instances across cloud providers with real-time pricing, availability, and specs. Launch instances with one click and connect them to your workspace.

Budget Controls

Set monthly spending limits, configure cost alerts, and enable auto-terminate to prevent cost overruns. Track spending across all providers in a unified dashboard.

Instance Management

Monitor running instances, view resource utilization, and manage lifecycle (start, stop, terminate) from the Activity Bar. No separate provider dashboards needed.

Integrated Workflows

Use rented GPUs for model training, inference, and image generation — the same features you use locally, running on cloud hardware when you need extra power.

BYOK (Bring Your Own Keys)
Already have accounts with cloud GPU providers? Connect your existing accounts at no extra cost. MBS Workbench acts as a unified dashboard — we never mark up provider pricing for BYOK users.

Cloud Deployment #

Deploy to any major cloud provider with a unified interface:

| Provider | Services |
| --- | --- |
| Azure | Container Apps, Azure Functions |
| Google Cloud | Cloud Run, Cloud Functions |
| AWS | Lambda (create + update) |
| Vercel | Frontend & serverless deploy |
| Netlify | Static site & serverless deploy |

Each provider includes CLI detection, login management, deployment history, logs, and config template generation.

Model Export & Conversion #

Convert models to production formats for edge deployment:

ONNX, CoreML, TensorRT, OpenVINO, GGUF, TFLite

Includes device-specific optimization, size estimation, cloud storage upload (S3/GCS/Azure), and inference code generation for Python, Swift, and Kotlin.

Cost Analytics #

Track and optimize your AI spending across all providers. Five-tab dashboard covering overview, cost breakdown by provider/model, pricing reference, budget alerts with thresholds, and local-vs-cloud ROI analysis. See exactly how much you save by running locally.

Keyboard Shortcuts #

MBS Workbench uses familiar VS Code shortcuts. Here are the most important ones:

General

| Action | Shortcut |
| --- | --- |
| Command Palette | Ctrl+Shift+P |
| Toggle Sidebar | Ctrl+B |
| Toggle Terminal | Ctrl+` |
| Zen Mode | Ctrl+K Z |
| New File | Ctrl+N |
| Open Folder | Ctrl+Shift+O |
| Save File | Ctrl+S |
| Close Tab | Ctrl+W |

Activity Bar Panels

| Panel | Shortcut |
| --- | --- |
| Explorer | Ctrl+Shift+E |
| Search | Ctrl+Shift+F |
| AI & Models | Ctrl+Shift+A |
| Connections | Ctrl+Shift+N |
| Debug & Test | Ctrl+Shift+D |
| Deploy | Ctrl+Shift+Y |
| Extensions | Ctrl+Shift+X |
| Settings | Ctrl+, |

Editor

| Action | Shortcut |
| --- | --- |
| Find in File | Ctrl+F |
| Replace | Ctrl+H |
| Go to Line | Ctrl+G |
| Format Document | Shift+Alt+F |
| Next Tab | Ctrl+Tab |
| Previous Tab | Ctrl+Shift+Tab |

Themes & Appearance #

MBS Workbench ships with three theme modes and a built-in theme editor:

Dark (Default)

Deep indigo-blue palette with glassmorphism effects. Easy on the eyes for long sessions.

Light

Clean white background with soft shadows. High readability in bright environments.

High Contrast

Maximum contrast for accessibility. Meets WCAG AAA standards.

The Theme Editor lets you customize colors, radii, shadows, and spacing through a visual JSON-based editor. Over 100 CSS custom properties (design tokens) control every visual element.

Command Palette #

Press Ctrl+Shift+P to open the Command Palette — a fuzzy-search overlay that indexes all 52+ commands across 7 categories (AI & Models, Connections, Debug & Test, Deploy & Cloud, Extensions, Navigation, Editor). Type to filter, arrow keys to navigate, Enter to execute.
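
Fuzzy filtering of this kind is typically subsequence matching: every character of the query must appear in the candidate, in order. A toy illustration (not MBS's actual scoring):

```javascript
// Subsequence-style fuzzy match: 'tgt' matches 'Toggle Terminal'
// because t, g, t appear in that order (case-insensitive).
function fuzzyMatch(query, candidate) {
  const q = query.toLowerCase();
  const c = candidate.toLowerCase();
  let qi = 0;
  for (let ci = 0; ci < c.length && qi < q.length; ci++) {
    if (c[ci] === q[qi]) qi++; // consume query chars in order
  }
  return qi === q.length; // matched only if every query char was found
}

const commands = ['Toggle Terminal', 'Format Document', 'Go to Line'];
console.log(commands.filter(cmd => fuzzyMatch('tgt', cmd))); // ['Toggle Terminal']
```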

Settings #

All configuration is managed through the Settings panel. Inference parameters, theme preferences, keyboard shortcuts, cloud API keys, and extension state are all accessible from one place. Settings persist across sessions via local storage.

Guides #

Step-by-step tutorials to get the most out of MBS Workbench. Each guide walks through a real-world workflow from start to finish.

Your First AI Project #

Build a complete web app from scratch using the AI agent — no prior experience needed.

Create a New Workspace

Press Ctrl+Shift+O and select an empty folder (e.g., my-first-app). This becomes your project root.

Load a Model

Open the AI & Models panel (Ctrl+Shift+A). If you haven't downloaded a model yet, click HuggingFace, search for Qwen 2.5 Coder 7B Q4_K_M, and download it. Then switch to Load Model and select the GGUF file.

Describe Your App

In the AI chat panel, type a prompt like:
"Create a React + TypeScript todo app with Tailwind CSS. Include add, delete, toggle complete, and filter functionality. Use localStorage for persistence."

Review & Accept

The agent will create multiple files (index.html, App.tsx, package.json, etc.). Review each in the diff viewer, then click Accept All to apply.

Run & Iterate

Open the terminal (Ctrl+`), run npm install && npm run dev. Use Live Preview to see your app. Ask the AI to refine: "Add dark mode support" or "Make it responsive".

Agent Workflows #

The autonomous agent uses a ReAct (Reason → Act → Observe) loop. Here's how to leverage it effectively.

Prompt Engineering for the Agent

| Pattern | Example | Why It Works |
| --- | --- | --- |
| Be Specific | "Create an Express.js REST API with /users and /posts endpoints, using TypeScript and Zod validation" | Reduces ambiguity, fewer iterations |
| Use Context | "@file src/App.tsx — refactor this component to use React Query instead of useEffect" | Agent sees exact code to modify |
| Chain Tasks | "First create the database schema, then build the API, then write tests" | Agent plans a multi-step sequence |
| Constrain Output | "Fix the bug in @file utils.ts — only modify that file, don't create new files" | Prevents unwanted side effects |

Multi-File Editing

When the agent modifies multiple files, you get a unified diff panel showing all changes, so you can review each file's diff before accepting.

Pro Tip: Agent Loops
For complex tasks, ask the agent to create a plan first: "Plan out how you'd build a full-stack authentication system — don't write code yet." Review the plan, then say "Execute the plan." This gives you control over the approach before any code is written.

Choosing the Right Model #

Different tasks call for different models. Here's a decision matrix based on your hardware and use case:

| Scenario | Recommended Model | VRAM Needed | Speed |
| --- | --- | --- | --- |
| General coding (auto-complete, chat) | Qwen 2.5 Coder 7B Q4_K_M | 4 GB | Fast |
| Complex reasoning / planning | DeepSeek-R1 7B Q4_K_M | 4 GB | Fast |
| Low-resource machine (laptop) | Phi-3 Mini 3.8B Q4_K_M | 2 GB | Very fast |
| Maximum quality (8+ GB VRAM) | CodeLlama 13B Q4_K_M | 8 GB | Moderate |
| Long-context projects | Llama 3.3 8B Q4_K_M | 6 GB | Fast |
| Embedding / RAG | nomic-embed-text-v1.5 | 1 GB | N/A |

Quantization Guide

GGUF models come in various quantization levels. Here's what they mean:

| Quantization | Quality | Size vs Full | Best For |
| --- | --- | --- | --- |
| Q8_0 | Near-lossless | ~50% | Maximum quality, plenty of VRAM |
| Q6_K | Excellent | ~42% | Best quality/size tradeoff |
| Q5_K_M | Very good | ~35% | Good quality, moderate VRAM |
| Q4_K_M | Good | ~28% | Recommended default — sweet spot |
| Q3_K_M | Acceptable | ~22% | Low VRAM, still usable quality |
| Q2_K | Degraded | ~15% | Absolute minimum — last resort |

GPU Optimization Guide #

Get the best inference performance from your NVIDIA GPU.

Prerequisites

An NVIDIA GPU with up-to-date standard drivers. CUDA support is built into MBS Workbench, so no separate toolkit installation or configuration is required.

VRAM Budget Planning

Your GPU VRAM determines what models you can run and how fast. Here's how VRAM is allocated:

| Component | VRAM Usage | Notes |
| --- | --- | --- |
| Model Weights | Model file size | A 4.4 GB Q4_K_M uses ~4.4 GB VRAM when fully offloaded |
| KV Cache | ~200 MB per 4K context | Scales with context window size |
| OS / Desktop | ~500 MB – 1 GB | Windows reserves VRAM for the display server |
| Buffer | ~200 MB | Working memory for matrix operations |

Rule of Thumb
Available VRAM = Total VRAM − 1 GB (OS overhead). For an RTX 4060 (8 GB), you have ~7 GB usable. A 7B Q4_K_M model (~4.4 GB) leaves ~2.5 GB for context, which gives you ~8K tokens comfortably.
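
The rule of thumb can be written as a quick calculation. The overhead figures below are the rough estimates from the table above; real usage varies by model and driver:

```javascript
// Back-of-envelope VRAM budget using the document's rough estimates:
// 1 GB OS/desktop overhead and a ~200 MB working buffer.
function vramBudgetGB(totalVramGB, modelSizeGB) {
  const osOverheadGB = 1.0; // display server reservation
  const bufferGB = 0.2;     // working memory for matrix ops
  const usable = totalVramGB - osOverheadGB;
  const forKvCache = usable - modelSizeGB - bufferGB;
  return { usable, forKvCache };
}

// RTX 4060 (8 GB) with a 4.4 GB Q4_K_M model:
console.log(vramBudgetGB(8, 4.4)); // ~7 GB usable, ~2.4 GB left for KV cache
```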

Performance Tuning Tips

Batch Size

Larger batch sizes increase throughput but use more VRAM. MBS auto-calculates optimal batch size. Manual override in Parameters → Advanced.

Thread Count

CPU threads for non-GPU layers. Default is n_cpu_cores / 2. Increase for pure CPU inference; keep default for GPU-dominant setups.

Custom MCP Servers #

Extend MBS Workbench's capabilities by adding your own MCP servers. MCP servers expose tools to the AI model via JSON-RPC 2.0 over stdio.

Creating a Custom Server

// my-mcp-server.js — Minimal MCP server example
const { Server } = require('@modelcontextprotocol/sdk/server/index.js');
const { StdioServerTransport } = require('@modelcontextprotocol/sdk/server/stdio.js');
const {
  ListToolsRequestSchema,
  CallToolRequestSchema
} = require('@modelcontextprotocol/sdk/types.js');

const server = new Server(
  { name: 'my-custom-tools', version: '1.0.0' },
  { capabilities: { tools: { listChanged: false } } }
);

// Advertise the tools this server exposes
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [{
    name: 'get_weather',
    description: 'Get weather for a city',
    inputSchema: {
      type: 'object',
      properties: { city: { type: 'string', description: 'City name' } },
      required: ['city']
    }
  }]
}));

// Execute a tool call; reject names we don't recognize
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'get_weather') {
    const city = request.params.arguments.city;
    return { content: [{ type: 'text', text: `Weather in ${city}: 72°F, sunny` }] };
  }
  throw new Error(`Unknown tool: ${request.params.name}`);
});

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
}
main();

Registering Your Server

Open MCP Manager (Ctrl+Shift+M), click Add Server, and configure:

| Field | Value |
| --- | --- |
| Name | my-custom-tools |
| Command | node |
| Args | ["./my-mcp-server.js"] |
| Transport | stdio |
| Auto-Start | Optional — start on app launch |

Docker Deployment #

Deploy your project using Docker directly from MBS Workbench.

Generate a Dockerfile

Open Deploy → Docker/K8s. Click Generate Dockerfile. Select your framework (Node, Python, Rust, Go, Java, or Static). MBS generates an optimized multi-stage Dockerfile.

Build the Image

Click Build Image or run docker build -t myapp:latest . in the terminal. MBS shows real-time build progress with expandable layer details.

Run the Container

Click Run from the image list or use docker run -p 3000:3000 myapp:latest. Container logs stream in the Docker panel.

Manage & Monitor

Use the Docker dashboard to view running containers, inspect logs, exec into shells, and stop/restart containers — all without leaving the IDE.

Building a RAG Pipeline #

Use Retrieval-Augmented Generation (RAG) to give your AI model access to your project's documentation, codebase, or any document corpus.

Load an Embedding Model

Download an embedding model like nomic-embed-text-v1.5 from HuggingFace. Load it in the Embeddings panel.

Index Your Documents

Open Document Chat from the Activity Bar. Drag and drop files (PDF, TXT, MD, code files) into the panel. MBS chunks, embeds, and stores them in a local vector database.

Query with Context

Ask questions like "What does the authentication system do?" or "Find all API endpoints related to billing." The RAG pipeline retrieves the most relevant chunks and injects them into the LLM's context window.

Use @codebase in Chat

In the main AI chat, type @codebase to automatically search your indexed project. The agent receives semantic search results as context, enabling codebase-aware answers.

RAG Storage
All embeddings are stored locally in a SQLite-backed vector database. Nothing is sent to external services. Your indexed documents stay private on your machine.
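
The retrieval step behind this pipeline can be sketched as cosine-similarity ranking over embedding vectors. The vectors here are hand-written toys standing in for real embedding-model output:

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks against a query embedding and keep the top k.
function topK(queryVec, chunks, k) {
  return chunks
    .map(c => ({ ...c, score: cosine(queryVec, c.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const chunks = [
  { text: 'auth middleware', vec: [0.9, 0.1, 0.0] },
  { text: 'billing routes',  vec: [0.1, 0.9, 0.1] },
  { text: 'css variables',   vec: [0.0, 0.1, 0.9] },
];
console.log(topK([0.8, 0.2, 0.0], chunks, 1)[0].text); // 'auth middleware'
```

In the real pipeline the query vector comes from the loaded embedding model and the chunk vectors come from the local vector database; the ranked chunks are what get injected into the LLM's context window.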

API Reference #

Complete reference for MBS Workbench's internal APIs, Tauri commands, agent tools, and configuration schemas.

Tauri Commands (IPC) #

All communication between the React frontend and Rust backend happens via Tauri's IPC bridge. Each command is invoked with invoke('command_name', { args }).
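
For illustration, a frontend might map UI actions onto these commands like this. `toInvocation` is a hypothetical helper written for this sketch, not part of MBS; only the command names and argument shapes come from the reference tables:

```javascript
// Map UI actions to (command, args) pairs in the shape invoke() expects.
function toInvocation(action) {
  switch (action.type) {
    case 'load-model':
      return ['load_model', { path: action.path, gpu_layers: action.gpuLayers }];
    case 'open-project':
      return ['open_project', { path: action.path }];
    case 'run-command':
      return ['run_terminal_command', { command: action.command, cwd: action.cwd }];
    default:
      throw new Error(`Unknown action: ${action.type}`);
  }
}

// Usage with Tauri's invoke (import path depends on your Tauri version):
//   const [cmd, args] = toInvocation({ type: 'load-model', path: 'model.gguf', gpuLayers: 28 });
//   const info = await invoke(cmd, args);
console.log(toInvocation({ type: 'open-project', path: 'C:/work/app' }));
```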

LLM & Inference

| Command | Arguments | Returns | Description |
| --- | --- | --- | --- |
| load_model | path: string, gpu_layers?: number | ModelInfo | Load a GGUF model into memory with optional GPU layer override |
| unload_model | (none) | void | Release the loaded model from memory |
| chat_completions | messages: Message[], params: InferenceParams | Stream<string> | Stream chat completions (SSE) from the loaded model |
| cancel_inference | (none) | void | Abort the current inference stream |
| get_model_info | (none) | ModelInfo \| null | Returns metadata about the currently loaded model |
| detect_hardware | (none) | HardwareInfo | Returns GPU, RAM, CPU details for the current system |

File System

| Command | Arguments | Returns | Description |
| --- | --- | --- | --- |
| read_dir | path: string, recursive?: boolean | FileEntry[] | List directory contents with metadata |
| read_file_text | path: string | string | Read file contents as UTF-8 text |
| write_file | path: string, content: string | void | Write contents to a file (create or overwrite) |
| delete_path | path: string | void | Delete a file or directory |
| rename_path | from: string, to: string | void | Rename or move a file/directory |
| search_files | query: string, path: string, regex?: boolean | SearchResult[] | Search file contents with optional regex |

Project & Workspace

| Command | Arguments | Returns | Description |
| --- | --- | --- | --- |
| open_project | path: string | ProjectInfo | Open a folder as workspace, index files |
| get_project_info | (none) | ProjectInfo | Get current workspace path, file count, watchers |
| run_terminal_command | command: string, cwd?: string | CommandResult | Execute a shell command and return stdout/stderr |

Agent Tool API #

The autonomous agent has access to 23 tools organized into six categories. Each tool follows a strict JSON schema for input/output.

Tool Definitions

| Tool | Category | Safety Tier | Input Schema |
|---|---|---|---|
| `read_file` | Filesystem | Safe (1) | `{ path: string }` |
| `create_file` | Filesystem | Safe (1) | `{ path: string, content: string }` |
| `edit_file` | Filesystem | Elevated (2) | `{ path: string, old: string, new: string }` |
| `delete_file` | Filesystem | Dangerous (3) | `{ path: string }` |
| `list_directory` | Filesystem | Safe (1) | `{ path: string, recursive?: boolean }` |
| `search_files` | Filesystem | Safe (1) | `{ pattern: string, path?: string }` |
| `batch_read` | Filesystem | Safe (1) | `{ paths: string[] }` |
| `execute_command` | Terminal | Elevated (2) | `{ command: string, cwd?: string }` |
| `run_python` | Terminal | Elevated (2) | `{ code: string }` |
| `run_node` | Terminal | Elevated (2) | `{ code: string }` |
| `web_search` | Web | Dangerous (3) | `{ query: string, count?: number }` |
| `fetch_url` | Web | Dangerous (3) | `{ url: string }` |
| `scrape_webpage` | Web | Dangerous (3) | `{ url: string }` |
| `git_status` | Git | Safe (1) | `{ path?: string }` |
| `git_diff` | Git | Safe (1) | `{ path?: string }` |
| `git_commit` | Git | Elevated (2) | `{ message: string }` |
| `git_push` | Git | Dangerous (3) | `{ remote?: string, branch?: string }` |
| `git_log` | Git | Safe (1) | `{ count?: number }` |
| `analyze_code` | Analysis | Elevated (2) | `{ path: string }` |
| `explain_code` | Analysis | Safe (1) | `{ code: string, language?: string }` |
| `suggest_refactor` | Analysis | Safe (1) | `{ path: string }` |
| `query_sqlite` | Database | Elevated (2) | `{ db_path: string, query: string }` |
| `parse_csv` | Database | Safe (1) | `{ path: string }` |
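One plausible way the safety-tier column can be used is to gate execution: Safe tools run automatically, while higher tiers prompt the user. This sketch assumes that policy for illustration; the app's actual gating rules may differ.

```typescript
// Illustrative gating of agent tool calls by the safety tiers in the table
// above. The tier-to-policy mapping here is an assumption, not the app's
// exact behavior.
type SafetyTier = 1 | 2 | 3; // Safe, Elevated, Dangerous

interface ToolCall {
  tool: string;
  tier: SafetyTier;
  input: Record<string, unknown>;
}

// Safe (1) tools auto-run; Elevated (2) and Dangerous (3) prompt the user.
function requiresConfirmation(call: ToolCall): boolean {
  return call.tier >= 2;
}

const call: ToolCall = {
  tool: "execute_command",
  tier: 2, // Elevated, per the table
  input: { command: "cargo test", cwd: "./my-project" },
};

console.log(requiresConfirmation(call)); // true
```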

MCP Protocol #

The Model Context Protocol uses JSON-RPC 2.0 over stdio for communication between MBS and MCP servers.

Lifecycle

MBS Workbench             MCP Server
     │                         │
     │──── initialize ────────►│   Server starts, reports capabilities
     │◄─── initialized ────────│
     │                         │
     │──── tools/list ────────►│   List available tools
     │◄─── tools[] ────────────│
     │                         │
     │──── tools/call ────────►│   Execute a tool with arguments
     │◄─── result ─────────────│
     │                         │
     │──── shutdown ──────────►│   Graceful shutdown
     │◄─── ok ─────────────────│
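The lifecycle above maps onto plain JSON-RPC 2.0 request objects written to the server's stdin. Method names follow the diagram; the parameter shapes in this sketch are illustrative, not the protocol's full schema.

```typescript
// Build JSON-RPC 2.0 requests like those exchanged in the lifecycle diagram.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 0;
function request(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: ++nextId, method, params };
}

// Over stdio, each request is serialized and written to the server process.
const init = request("initialize", { clientInfo: { name: "mbs-workbench" } });
const callTool = request("tools/call", {
  name: "query",
  arguments: { sql: "SELECT 1" },
});

console.log(JSON.stringify(callTool));
```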

Server Configuration Schema

{
  "name": "server-name",         // Unique identifier
  "command": "node",             // Executable to run
  "args": ["./server.js"],       // Command arguments
  "env": { "API_KEY": "..." },   // Environment variables (optional)
  "transport": "stdio",          // Transport type (stdio only currently)
  "auto_start": false,           // Start on app launch
  "health_check_interval": 30,   // Seconds between health pings
  "restart_on_crash": true,      // Auto-restart on server crash
  "max_restarts": 3              // Maximum restart attempts
}
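A sketch of applying the schema's documented defaults to a partial config. Field names and default values come from the schema above; the empty-array/empty-object defaults for `args` and `env` are assumptions for this sketch.

```typescript
interface McpServerConfig {
  name: string;                          // Unique identifier
  command: string;                       // Executable to run
  args?: string[];
  env?: Record<string, string>;
  transport?: "stdio";
  auto_start?: boolean;
  health_check_interval?: number;
  restart_on_crash?: boolean;
  max_restarts?: number;
}

// Fill in the defaults documented in the schema; explicit values win.
function withDefaults(cfg: McpServerConfig): Required<McpServerConfig> {
  return {
    args: [],                  // assumed default
    env: {},                   // assumed default
    transport: "stdio",
    auto_start: false,
    health_check_interval: 30,
    restart_on_crash: true,
    max_restarts: 3,
    ...cfg,
  } as Required<McpServerConfig>;
}

const server = withDefaults({ name: "sqlite", command: "npx", args: ["@mcp/sqlite"] });
console.log(server.transport, server.max_restarts); // "stdio" 3
```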

Pre-Configured Servers (24 total)

| Domain | Servers | Install Command |
|---|---|---|
| Dev Tools | Rust Analyzer, Pyright, TypeScript, Clangd, Go, TexLab | Auto-detected from PATH |
| Blockchain | Solana, Ethereum, CoinMarketCap, DEX Screener | `npx @mcp/solana` |
| Game Engines | Godot, Unity, Unreal | `npx @mcp/godot` |
| Web Dev | Puppeteer, Vercel, Docker | `npx @mcp/puppeteer` |
| Databases | PostgreSQL, SQLite, Redis | `npx @mcp/postgres` |
| Media | ComfyUI, FFmpeg, ImageMagick | `npx @mcp/comfyui` |
| Mobile | Flutter, Android | `npx @mcp/flutter` |
| Utilities | Filesystem, Terminal, Git, HTTP | Built-in (no install) |

Settings Schema #

All app settings are stored locally and accessible via Ctrl+,. Here's the complete schema:

Inference Settings

| Key | Type | Default | Description |
|---|---|---|---|
| `temperature` | number | 0.7 | Sampling temperature (0.0 – 2.0) |
| `top_p` | number | 0.9 | Nucleus sampling threshold |
| `top_k` | number | 40 | Top-K sampling limit |
| `repeat_penalty` | number | 1.1 | Repetition penalty factor |
| `max_tokens` | number | 4096 | Maximum generation length |
| `context_window` | number | auto | Context window size (auto = 75% of GPU capacity) |
| `gpu_layers` | number | auto | Number of layers to offload to GPU |
| `batch_size` | number | 512 | Prompt processing batch size |
| `threads` | number | auto | CPU threads for inference |
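The non-"auto" defaults from the table, captured as a typed object, with a clamp matching temperature's documented 0.0–2.0 range. The `InferenceParams` interface name is taken from the IPC table earlier; its exact shape in the app may differ.

```typescript
interface InferenceParams {
  temperature: number;
  top_p: number;
  top_k: number;
  repeat_penalty: number;
  max_tokens: number;
  batch_size: number;
}

// Documented defaults from the settings table above.
const defaults: InferenceParams = {
  temperature: 0.7,
  top_p: 0.9,
  top_k: 40,
  repeat_penalty: 1.1,
  max_tokens: 4096,
  batch_size: 512,
};

// Clamp temperature to its documented 0.0 – 2.0 range.
function setTemperature(p: InferenceParams, t: number): InferenceParams {
  return { ...p, temperature: Math.min(2.0, Math.max(0.0, t)) };
}

console.log(setTemperature(defaults, 3.5).temperature); // 2
```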

Appearance Settings

| Key | Type | Default | Description |
|---|---|---|---|
| `theme` | string | "dark" | Theme mode: "dark", "light", "high-contrast" |
| `font_family` | string | "system-ui" | UI font family |
| `font_size` | number | 13 | UI font size in pixels |
| `editor_font_family` | string | "Cascadia Code" | Editor font family |
| `editor_font_size` | number | 14 | Editor font size |
| `editor_line_height` | number | 1.6 | Editor line height multiplier |
| `ui_scale` | number | 1.0 | Global UI scale factor |
| `minimap` | boolean | true | Show editor minimap |

Cloud Provider Settings

| Key | Type | Default | Description |
|---|---|---|---|
| `openai_api_key` | string | "" | OpenAI API key |
| `anthropic_api_key` | string | "" | Anthropic API key |
| `google_api_key` | string | "" | Google Gemini API key |
| `default_provider` | string | "local" | Default inference provider |

Complete Keyboard Shortcuts #

Full list of all keyboard shortcuts in MBS Workbench.

General

| Action | Shortcut |
|---|---|
| Command Palette | Ctrl+Shift+P |
| Toggle Sidebar | Ctrl+B |
| Toggle Terminal | Ctrl+` |
| Zen Mode | Ctrl+K Z |
| New File | Ctrl+N |
| Open Folder | Ctrl+Shift+O |
| Save File | Ctrl+S |
| Close Tab | Ctrl+W |
| Settings | Ctrl+, |
| Reload App | Ctrl+Shift+R |

Navigation

| Action | Shortcut |
|---|---|
| Go to File | Ctrl+P |
| Go to Line | Ctrl+G |
| Go to Symbol | Ctrl+Shift+G |
| Next Tab | Ctrl+Tab |
| Previous Tab | Ctrl+Shift+Tab |
| Move Line Up | Alt+Up |
| Move Line Down | Alt+Down |
| Duplicate Line | Shift+Alt+Down |

Activity Bar Panels

| Panel | Shortcut |
|---|---|
| Explorer | Ctrl+Shift+E |
| Search | Ctrl+Shift+F |
| AI & Models | Ctrl+Shift+A |
| Connections | Ctrl+Shift+N |
| Debug & Test | Ctrl+Shift+D |
| Deploy | Ctrl+Shift+Y |
| Extensions | Ctrl+Shift+X |
| MCP Manager | Ctrl+Shift+M |

Editor

| Action | Shortcut |
|---|---|
| Find in File | Ctrl+F |
| Replace | Ctrl+H |
| Find in Workspace | Ctrl+Shift+F |
| Format Document | Shift+Alt+F |
| Toggle Comment | Ctrl+/ |
| Block Comment | Shift+Alt+A |
| Select All Occurrences | Ctrl+Shift+L |
| Multi-Cursor | Alt+Click |

AI & Chat

| Action | Shortcut |
|---|---|
| Focus Chat | Ctrl+L |
| New Chat | Ctrl+Shift+L |
| Cancel Generation | Escape |
| Accept Completion | Tab |
| Insert @file | Type @ |
| Send Message | Enter |

CLI Reference #

MBS Workbench can be launched from the command line with optional arguments:

# Open MBS Workbench
mbs-workbench

# Open a specific folder
mbs-workbench /path/to/project

# Open a specific file
mbs-workbench /path/to/file.ts

# Start with a specific model loaded
mbs-workbench --model /path/to/model.gguf

# Start in CPU-only mode (skip GPU detection)
mbs-workbench --cpu-only

# Show version
mbs-workbench --version

Design Agent #

The Design Agent is a local AI-powered design assistant that generates complete page layouts, component designs, and brand-consistent UI from natural language prompts — all running on your GPU with no cloud dependency.

Natural Language Design

Describe what you want — "a dark-themed pricing page with 3 tiers" — and the Design Agent generates production-ready HTML/CSS/Tailwind output using your local LLM.

Brand-Aware Generation

The agent reads your brand tokens (colors, fonts, spacing) and weaves them into every design, ensuring brand consistency across all generated components.

Dynamic Token Budgets

Automatically calculates max output tokens from your model's context window (n_ctx()), so large designs don't get truncated.

Compressed Preambles

Brand context is injected as compact token-efficient preambles, leaving maximum context for the actual design generation.

Visual Canvas #

A full drag-and-drop visual editor integrated directly into the IDE. Build pages visually with real-time preview, then export clean HTML/CSS/React code.

Drag-and-Drop Builder

Place components on a 2D canvas with snap-to-grid alignment, resize handles, and layer ordering. No code required for layout design.

Live Code Sync

Every visual change syncs to clean, editable code in real-time. Switch between visual and code views seamlessly.

Responsive Breakpoints

Preview and design for desktop, tablet, and mobile breakpoints. The canvas adapts to show your design at each viewport size.

Component Library

Built-in library of common UI components — navbars, hero sections, cards, footers, forms, modals — ready to drag onto your canvas.

Brand & Tokens #

Define your brand identity once and use it everywhere. Brand tokens (colors, typography, spacing, border-radius) are injected into the Design Agent, Visual Canvas, Template Marketplace, and AI Copywriter for consistent output.

Token Editor

Visual editor for brand tokens — pick colors, set font stacks, define spacing scales, configure border-radius and shadow presets.

Export Formats

Export tokens as CSS custom properties, Tailwind config, SCSS variables, or JSON for use in any project.
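The CSS custom properties path can be pictured as flattening a token object into `:root` variables. This is a sketch of the idea, not the app's exporter; the token names used here are illustrative.

```typescript
// Flatten a brand-token object into a :root block of CSS custom properties.
type Tokens = Record<string, string>;

function toCssVars(tokens: Tokens): string {
  const lines = Object.entries(tokens).map(
    ([name, value]) => `  --${name}: ${value};`
  );
  return `:root {\n${lines.join("\n")}\n}`;
}

const css = toCssVars({
  "color-primary": "#6d28d9",
  "font-sans": "Inter, system-ui, sans-serif",
  "radius-md": "8px",
});
console.log(css);
```

The same token object could feed the Tailwind, SCSS, or JSON exporters; only the serialization step differs.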

Template Marketplace #

Browse and install 15 unique, production-ready templates spanning landing pages, dashboards, portfolios, e-commerce, blogs, and more. Each template ships in 8 color variants (120 total combinations) and adapts to your brand tokens automatically.

15 Unique Templates

SaaS Landing, Portfolio, Blog, E-Commerce, Dashboard, Documentation, Agency, Restaurant, Fitness, Real Estate, Event, Education, Medical, Travel, and Startup — each professionally designed.

8 Color Variants

Every template includes 8 curated color schemes: Default, Ocean, Sunset, Forest, Royal, Coral, Midnight, and Amber. One click to switch.

Brand Token Integration

Templates automatically adopt your brand tokens — colors, fonts, and spacing — so the output matches your brand from the start.

One-Click Install

Preview any template in the built-in live preview, then install directly into your project with clean, editable HTML/CSS/React code.

AI Copywriter #

Generate marketing copy, product descriptions, CTAs, headlines, and page content using your local LLM. The AI Copywriter respects your brand tone and outputs clean, usable text with a 512-token generation limit for concise, focused copy.

Copy Templates

Pre-built prompts for hero headlines, feature descriptions, testimonials, pricing blurbs, email subject lines, and social media posts.

Tone Control

Select tone presets — Professional, Casual, Technical, Playful, Luxury — or define a custom brand voice for consistent messaging.

SEO & Performance Analyzer #

Built-in SEO and performance analysis for your pages and projects. Checks meta tags, heading structure, image alt text, page speed metrics, and accessibility compliance — all locally, no third-party services.

Asset Manager #

Organize, optimize, and manage images, icons, fonts, and other assets directly in the IDE. Drag-and-drop upload, automatic image compression, SVG optimization, and sprite generation.

Voice Studio (TTS) #

A full text-to-speech studio built into the IDE. Synthesize speech locally using Kokoro-82M ONNX (22 neural voices, no internet required) or Windows SAPI as a zero-install fallback. Audio is decoded directly in the browser using a Blob URL — no external protocol configuration needed.

Kokoro ONNX Engine

22 neural voice profiles running on-device via the Kokoro-82M model. Download the model once (~82 MB) and get high-quality, expressive speech synthesis with zero latency after the first warmup. Speed and pitch controls included.

Windows SAPI Fallback

If the Kokoro model is not installed, Workbench automatically falls back to Windows SAPI via PowerShell — no setup required. Any Windows voice installed on the system is available instantly.

In-IDE Audio Playback

Generated WAV audio is read as binary, decoded to a Blob URL, and played back in-app without any server or file protocol workarounds. Preview, replay, and download audio files directly from Voice Studio.

ONNX Status Dashboard

The Voice Studio panel shows the Kokoro model install status, runtime availability, model size, recommended action, and direct download link — so you always know what's available on your machine.

Mobile Export #

Export your web projects to native mobile apps with one click. MBS supports Capacitor (iOS/Android native), PWA (Progressive Web App), and APK (Android Package) export targets. Projects export to Documents/MBS-Mobile-Exports.

Capacitor Export

Generate a full Capacitor project with native iOS and Android wrappers. Ready for Xcode or Android Studio.

PWA Export

Generate a Progressive Web App with service worker, manifest, offline support, and installability — works on any device.

APK Build

Build a standalone Android APK directly from the IDE. No need to install Android Studio separately.

Interactive SDK #

Build interactive forms, surveys, quizzes, and multi-step workflows with a visual form builder. The SDK supports conditional logic, field validation, file uploads, and webhook integrations — all rendered client-side with zero backend required.

Visual Form Builder

Drag-and-drop form fields — text inputs, selects, checkboxes, file uploads, date pickers, sliders — with real-time preview.

Conditional Logic

Show/hide fields, skip steps, or change validation based on user responses. Build complex multi-step flows without code.

Embed & Export

Export forms as standalone HTML, React components, or embed via iframe. Each form gets a unique formId for tracking.

Architecture #

MBS Workbench is built on Tauri + Rust + React — a modern architecture that delivers near-native performance in a compact, self-contained native app.

MBS WORKBENCH v0.2.4 · 158+ backend modules · 191 components
├─ React 18 + TypeScript Frontend (191 components)
│    Monaco Editor · AI Chat (Stream) · Terminal (xterm)
│    Activity Bar + Panels (13 sidebar panels)
├─ Tauri IPC Bridge
└─ Rust Backend (158+ modules, in-process, zero-config)
     LLM Engine (CUDA) · Diffusion Engine (CUDA) · Training Pipeline (LoRA) · Inference Optimizer (Auto-Quant)
     ReAct Agent (23 tools) · MCP Protocol (24 servers) · Docker/K8s Mgmt · Cloud Providers (10 APIs)
     LSP/DAP Client · Git2-rs Native · Cloud Deploy · Model Export (ONNX/CoreML/TRT)
          ↓ Direct GPU Access (no Docker, no server)
     NVIDIA CUDA / CPU (user's own hardware)

Technology Stack

| Component | Technology | Why |
|---|---|---|
| Desktop Framework | Tauri 1.5 (Rust) | 600 KB runtime vs Electron's 100 MB. Native Windows/Mac/Linux. |
| LLM Inference | Native Rust engine with CUDA bindings | Fastest GGUF inference. Direct CUDA. In-process — no server. |
| Image Generation | Native diffusion engine (C++/CUDA) | GPU-accelerated diffusion. SD 1.5, SDXL, FLUX. In-process — no Python. |
| Model Training | Custom LoRA/QLoRA pipeline | Hardware-aware training with advanced optimization. Consumer GPU support. |
| Frontend | React 18 + TypeScript | Industry standard. Monaco Editor compatibility. Component ecosystem. |
| Styling | TailwindCSS + CSS Custom Props | Utility-first with design token system. 100+ variables. |
| State Management | Zustand | Lightweight, TypeScript-native, minimal boilerplate. |
| Local Storage | rusqlite (SQLite) | Embedded database for settings, history, model metadata. |
| Terminal | xterm.js + PTY | Full terminal emulation with pseudo-terminal backend. |
| Preview Server | Warp (Rust) | Lightweight HTTP server for real-time web preview. |
Technical Moat
The entire backend — inference, diffusion engine, training pipeline, agent loop, tool execution, context management, MCP protocol, Docker/K8s integration, cloud deployment — is written in Rust. This provides significantly lower latency than Python-based alternatives and creates a codebase that is extremely difficult to replicate. Zero marginal cost per user — the user's hardware does all the work.

Performance Tuning #

Inference Optimizer (Zero Configuration)

MBS Workbench includes a built-in inference optimizer that automatically configures every loaded model for maximum speed on your hardware. No manual tuning required.

Auto-Quantization

Detects available VRAM and selects the optimal quantization level (Q4_K_M for <6 GB, Q5_K_M for 6-10 GB, Q8_0 for 10-16 GB, FP16 for 16+ GB). Cached conversions skip re-quantization on subsequent loads.
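The VRAM-to-quantization mapping described above, written out as a function. The thresholds follow the text; the handling of exact boundary values (e.g. exactly 6 GB) is an assumption for this sketch.

```typescript
// Select a quantization level from available VRAM, per the mapping above.
function pickQuant(vramGb: number): string {
  if (vramGb < 6) return "Q4_K_M";
  if (vramGb < 10) return "Q5_K_M";
  if (vramGb < 16) return "Q8_0";
  return "FP16";
}

console.log(pickQuant(4));  // "Q4_K_M" — e.g. a 4 GB GPU
console.log(pickQuant(24)); // "FP16"
```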

Intelligent Layer Partitioning

Automatically calculates how many layers fit in GPU VRAM and offloads the rest to CPU with pinned memory. Enables 30B+ models on 4 GB GPUs with mixed GPU/CPU inference.
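A back-of-envelope version of that partitioning: offload as many layers as fit in free VRAM and run the rest on CPU. The per-layer size and reserved-VRAM figures here are illustrative assumptions, not the optimizer's real values.

```typescript
// Split a model's layers between GPU and CPU based on free VRAM.
function partitionLayers(
  totalLayers: number,
  freeVramGb: number,
  perLayerGb: number,
  reservedGb = 0.5 // headroom for KV cache and activations (assumed)
): { gpu: number; cpu: number } {
  const usable = Math.max(0, freeVramGb - reservedGb);
  const gpu = Math.min(totalLayers, Math.floor(usable / perLayerGb));
  return { gpu, cpu: totalLayers - gpu };
}

// e.g. a 32-layer model at ~0.12 GB/layer on a 4 GB GPU:
const split = partitionLayers(32, 4, 0.12);
console.log(split); // most layers on GPU, remainder on CPU
```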

Flash Attention & KV Cache

Flash Attention v3 enabled by default on supported GPUs (2-3× faster). 8-bit KV cache reduces attention memory by 75%, enabling 256K+ context windows on consumer hardware.

Model-Specific Presets

Hand-tuned optimization presets for popular model architectures. Auto-detected on load. MoE expert parallelism, vision-language encoder splitting, and diffusion-specific memory management.

Expected Performance
On a system with 4 GB VRAM + 64 GB RAM: 7B models at 40-70 tok/s, 30B MoE models at 25-40 tok/s, and image generation 30-40% faster than unoptimized loading. The optimizer delivers 10-20× speedup compared to naive FP16 inference.

Hardware-Aware Scaling

At launch, MBS detects your exact hardware configuration and scales every subsystem:

| System Tier | RAM | GPU | Optimized For |
|---|---|---|---|
| Low | 8 GB | 0-2 GB VRAM | 2-4B models, CPU inference, minimal context |
| Medium | 16 GB | 4 GB VRAM | 7B models, partial GPU offload, 4K context |
| High | 32 GB | 8 GB VRAM | 13B models, full GPU offload, 8K context |
| Ultra | 64 GB+ | 12 GB+ VRAM | 30B+ models, maximum context, batch decode |

Memory Architecture

Three-tier memory system ensures optimal performance:

L1 — Hot (2 GB)

Active conversation context, current file buffer, model attention cache

L2 — Warm (8 GB)

Conversation history, recent files (last 10), KV cache, project embeddings

L3 — Cold (SSD)

Model weights (memory-mapped), project history, vector database, preferences

Privacy & Security #

Your data never leaves your machine. Ever.
MBS Workbench runs 100% locally. There is no telemetry, no analytics, no crash reporting, no account system, and no phone-home mechanism. The binary contacts external services in exactly two opt-in cases: HuggingFace for model downloads, and cloud providers if you configure API keys.

How MBS Compares #

| Feature | MBS Workbench | GitHub Copilot | Cursor | LM Studio |
|---|---|---|---|---|
| Pricing | Free / One-time | $10-39/mo | $20/mo | Free |
| Runs Locally | ✓ Full | ✗ Cloud | ✗ Cloud | ✓ Inference |
| 100% Private | ✓ | | | |
| Code Editor | ✓ Monaco | ✓ VS Code ext | ✓ Fork | ✗ None |
| Autonomous Agent | ✓ 23 tools | | Partial | |
| GPU Acceleration | ✓ Native CUDA | N/A | N/A | |
| Model Choice | ✓ Any GGUF | GPT-4 only | GPT-4/Claude | ✓ Any GGUF |
| Source Control (Git) | ✓ Native git2-rs | ✓ Built-in | ✓ Built-in | |
| Debug (DAP) | ✓ 9 adapters | ✓ Built-in | ✓ Built-in | |
| Multi-Model Chat | ✓ Tournament | | | |
| Voice Input (STT) | ✓ Web Speech API | | | |
| Text-to-Speech (TTS) | ✓ Kokoro + SAPI | | | |
| Episodic Memory | ✓ SQLite + vectors | | | |
| Conversation Branching | ✓ Git-tree UI | | | |
| Telegram Bot Mode | ✓ Local polling | | | |
| Image Generation | ✓ SD local | | | |
| Model Training | ✓ LoRA/QLoRA | | | |
| HuggingFace Browser | ✓ Built-in | | | Partial |
| MCP Protocol | ✓ 24 servers | | Partial | |
| Docker / K8s | ✓ Full mgmt | | | |
| Multi-Cloud Deploy | ✓ 5 providers | | | |
| Cost Analytics | ✓ | | | |
| Offline Mode | ✓ Full | | | |
| App Size | ~175 MB | N/A | ~400 MB | ~200 MB |
Competitive Advantage
MBS Workbench is the only product that combines local LLM inference + code editor + autonomous agent + MCP servers + model marketplace + image generation + on-device model training + Cloud GPU Hub + container management + multi-cloud deployment + native Git + DAP debugger + Voice Studio (TTS + STT) + Episodic Memory + Conversation Branching + Telegram Bot Mode + 10 cloud API providers + inference optimizer + multi-model tournament in a single binary. Competitors address 1–2 of these capabilities. We address all of them — at zero recurring cost to the user. 70 major feature releases. 250%+ AI capability vs cloud tools.

Frequently Asked Questions #

Is MBS Workbench free?

The core application is free. Future Pro features (advanced agent workflows, enterprise MCP servers, priority model access) may be offered as a one-time license — never a subscription.

What models work with MBS?

Any GGUF-format language model — Qwen, DeepSeek, Llama, Mistral, Phi, Gemma, CodeLlama, and thousands more from HuggingFace. For image generation, MBS supports Stable Diffusion 1.5, SDXL, and FLUX models. Plus, 10 cloud API providers (OpenAI, Anthropic, Google, etc.) are built in for hybrid workflows.

Do I need a GPU?

No. MBS works in CPU-only mode — just slower. For the best experience, an NVIDIA GPU with 4+ GB VRAM is recommended. Even a GTX 1660 (6 GB) provides a great experience with 7B models.

Does it send my code to the cloud?

Never. When using local models, your code and prompts never leave your machine. Cloud providers (OpenAI, Anthropic, etc.) are opt-in and clearly labeled in the UI.

How does it compare to Copilot/Cursor?

Copilot and Cursor are cloud-first tools with recurring subscriptions. MBS runs locally, costs nothing per month, and gives you complete model freedom. See the full comparison table.

Can I use it offline?

Yes. Once you've downloaded a model, everything works without internet — editing, AI completions, chat, agent, image generation, model training, terminal, and all development tools.

What about Mac and Linux?

Windows is the primary platform today. Mac and Linux builds are planned — Tauri natively supports all three platforms.