Tabby: Self-Hosted Code Completion with Local Models

TL;DR

# Run Tabby with NVIDIA GPU using Docker
docker run -d --name tabby \
  --gpus all \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder-1B --device cuda

# Verify it is running
curl http://localhost:8080/v1/health

# Test a completion
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"language": "python", "segments": {"prefix": "def fibonacci(n):\n    "}}'

Install the Tabby plugin in your IDE, point it at http://localhost:8080, and get Copilot-style completions backed entirely by local hardware.

What Is Tabby

Tabby is an open-source, self-hosted code completion server. It fills the role that GitHub Copilot's cloud backend plays: your IDE sends code context to Tabby, Tabby runs inference on a local model, and it returns completion suggestions. Unlike editor-only extensions, Tabby is designed as a server, so it handles model management, request queuing, and repository indexing out of the box.
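To make that request flow concrete, here is a minimal Python sketch of what an editor plugin does on each keystroke: build a JSON body from the text around the cursor, then read the first suggestion out of the reply. The `segments` request shape and the `choices[].text` response shape are assumptions based on Tabby's published completion API and may differ between versions; the helper names are hypothetical.

```python
import json


def build_completion_payload(prefix, suffix="", language="python"):
    """Build the JSON body sent to /v1/completions for one completion.

    Assumption: the server accepts {"language", "segments": {"prefix", "suffix"}}.
    """
    segments = {"prefix": prefix}
    if suffix:
        segments["suffix"] = suffix
    return {"language": language, "segments": segments}


def first_suggestion(response):
    """Return the text of the first choice, or None if there are no choices.

    Assumption: the response looks like {"choices": [{"text": "..."}]}.
    """
    choices = response.get("choices") or []
    return choices[0].get("text") if choices else None


if __name__ == "__main__":
    payload = build_completion_payload("def fibonacci(n):\n    ")
    print(json.dumps(payload))
    # A hand-written stand-in for a server reply, not real model output.
    fake_response = {"id": "example", "choices": [{"index": 0, "text": "if n < 2:"}]}
    print(first_suggestion(fake_response))
```

The IDE plugins do the equivalent of this for you; the sketch is only useful for scripting or debugging against the server directly.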

Tabby is built in Rust, which gives it low overhead and fast startup times. It ships as a single binary or Docker image, supports NVIDIA and Apple Silicon GPUs, and provides IDE plugins for VS Code, JetBrains, Vim, and Neovim.

Installation

Docker (Recommended)

Docker is the fastest path to a working installation. For NVIDIA GPUs:

docker run -d --name tabby \
  --gpus all \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder-1B --device cuda

For Apple Silicon Macs, skip Docker. Containers on macOS cannot reach the Metal GPU, so install the native binary with Homebrew and run it directly:

brew install tabbyml/tabby/tabby
tabby serve --model StarCoder-1B --device metal

For CPU-only (slow, but works for testing):

docker run -d --name tabby \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder-1B --device cpu

The $HOME/.tabby volume persists downloaded models and configuration between container restarts.

Binary Installation

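Download the binary from the Tabby GitHub releases page. Asset names vary by release and CUDA version, so confirm the exact file name on that page before running the command below: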

# Linux x86_64 with CUDA
curl -L https://github.com/TabbyML/tabby/releases/latest/download/tabby_x86_64-manylinux2014-cuda -o tabby
chmod +x tabby
sudo mv tabby /usr/local/bin/

# Start the server
tabby serve --model StarCoder-1B --device cuda

For a persistent service, create a systemd unit:

# /etc/systemd/system/tabby.service
[Unit]
Description=Tabby Code Completion Server
After=network.target

[Service]
Type=simple
User=tabby
ExecStart=/usr/local/bin/tabby serve --model StarCoder-1B --device cuda
Restart=always
RestartSec=10
Environment="TABBY_ROOT=/var/lib/tabby"

[Install]
WantedBy=multi-user.target

Create the service user and data directory, then enable the unit:

sudo useradd -r -s /bin/false tabby
sudo mkdir -p /var/lib/tabby
sudo chown tabby:tabby /var/lib/tabby
sudo systemctl daemon-reload
sudo systemctl enable --now tabby

Supported Models

Tabby supports a curated list of models optimized for code completion. Unlike Ollama, you do not pull arbitrary models – you specify a model identifier and Tabby downloads it automatically on first run.

Model	Parameters	VRAM Required	Languages	Notes
StarCoder-1B	1B	~2 GB	80+ languages	Fast, good for tab completion
StarCoder-3B	3B	~4 GB	80+ languages	Better quality, still fast
StarCoder-7B	7B	~8 GB	80+ languages	Best StarCoder quality
CodeLlama-7B	7B	~8 GB	Multiple	Strong on Python, C++
CodeLlama-13B	13B	~16 GB	Multiple	High quality, needs large GPU
DeepseekCoder-1.3B	1.3B	~2 GB	Multiple	Good accuracy for size
DeepseekCoder-6.7B	6.7B	~8 GB	Multiple	Strong all-around
Qwen2.5-Coder-1.5B	1.5B	~2 GB	Multiple	Newest, competitive with 3B models

To switch models, stop the server and restart with a different --model flag:

tabby serve --model DeepseekCoder-6.7B --device cuda

Tabby downloads the model on first use and caches it in the data directory.
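As a rough way to shortlist models for a given GPU, the following Python sketch filters the table above by available VRAM. The VRAM numbers are the approximate ones from the table; the `headroom_gb` margin and both function names are assumptions added for illustration.

```python
# Approximate VRAM needs in GB, copied from the table above.
MODEL_VRAM_GB = {
    "StarCoder-1B": 2,
    "StarCoder-3B": 4,
    "StarCoder-7B": 8,
    "CodeLlama-7B": 8,
    "CodeLlama-13B": 16,
    "DeepseekCoder-1.3B": 2,
    "DeepseekCoder-6.7B": 8,
    "Qwen2.5-Coder-1.5B": 2,
}


def models_that_fit(vram_gb, headroom_gb=1.0):
    """Return model IDs whose approximate VRAM need fits, largest first.

    headroom_gb is a hypothetical safety margin for other GPU users.
    """
    budget = vram_gb - headroom_gb
    fitting = [(need, name) for name, need in MODEL_VRAM_GB.items() if need <= budget]
    # Sort by VRAM need descending, then by name for a stable order.
    fitting.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in fitting]


def serve_command(model, device="cuda"):
    """Format the `tabby serve` invocation used throughout this guide."""
    return f"tabby serve --model {model} --device {device}"


if __name__ == "__main__":
    for name in models_that_fit(12):
        print(serve_command(name))
```

Fitting in memory is only the first filter; latency on your hardware still decides whether a model is usable for interactive completion.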

GPU Requirements

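Completions need to arrive quickly, ideally in under 500 ms, or they lag behind your typing. That latency budget constrains your hardware choices. Treat the latency figures below as rough estimates rather than benchmarks.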

GPU	VRAM	Recommended Max Model	Approximate Latency
RTX 3060	12 GB	7B	~200ms
RTX 3090 / 4090	24 GB	13B	~150ms
RTX 4060 Ti	16 GB	7B	~150ms
A100 40GB	40 GB	13B	~80ms
M1/M2 Pro 16GB	16 GB (unified)	7B	~250ms
M3 Max 96GB	96 GB (unified)	13B	~200ms
CPU only	N/A	1B	~2000ms+

For a small team (2-5 developers), a single RTX 3090 running StarCoder-3B handles concurrent requests well. For larger teams, run multiple instances behind a load balancer or use a larger GPU.
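To check your own setup against the 500 ms target, you can time a batch of requests and look at a high percentile rather than the average. The sketch below shows one way to do that in Python, assuming a nearest-rank percentile and a hypothetical `measure_ms` helper that times any callable you pass in (for example, a function that sends one completion request).

```python
import math
import time


def measure_ms(fn, runs=20):
    """Call fn() `runs` times and return each wall-clock duration in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def percentile(samples_ms, pct):
    """Nearest-rank percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def within_budget(samples_ms, budget_ms=500.0, pct=95.0):
    """True if the chosen percentile stays inside the latency budget."""
    return percentile(samples_ms, pct) <= budget_ms


if __name__ == "__main__":
    # Replace the lambda with a function that sends one completion request.
    samples = measure_ms(lambda: time.sleep(0.01), runs=5)
    print(within_budget(samples))
```

Using the 95th percentile instead of the mean keeps occasional slow completions, which are the ones developers notice, from being averaged away.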

IDE Plugins

VS Code

Install from the marketplace: search for “Tabby” by TabbyML. Open settings and configure:

{
  "tabby.api.endpoint": "http://localhost:8080",
  "tabby.api.authToken": ""
}

If you set up authentication (recommended for team deployments), add the token here.

JetBrains (IntelliJ, PyCharm, GoLand, etc.)

Settings > Plugins > Marketplace > search “Tabby”. After installation:

Settings > Tools > Tabby > Server endpoint: http://localhost:8080

Vim / Neovim

Tabby provides a Vim plugin via its official repository:

" vim-plug
Plug 'TabbyML/vim-tabby'

" Configuration
let g:tabby_server_url = 'http://localhost:8080'

For Neovim with lazy.nvim:

{
  "TabbyML/vim-tabby",
  config = function()
    vim.g.tabby_server_url = "http://localhost:8080"
  end,
}

Repository Indexing

One of Tabby’s strongest features is repository indexing. Tabby can index your Git repositories and use that context when generating completions. This means suggestions are aware of your project’s types, function signatures, and patterns – not just the current file.

Configure repositories in ~/.tabby/config.toml:

[[repositories]]
name = "my-project"
git_url = "file:///home/user/projects/my-project"

[[repositories]]
name = "shared-lib"
git_url = "https://github.com/org/shared-lib.git"

After adding repositories, trigger indexing:

tabby scheduler --now

Tabby builds a code search index that it queries during completion. This is particularly valuable for:

Using project-specific types and interfaces in suggestions
Following existing code patterns and naming conventions
Referencing functions from other modules in the same project

Security note: Tabby clones remote repositories into its data directory. Ensure that directory ($HOME/.tabby or /var/lib/tabby) has appropriate permissions. For private repositories, use SSH URLs or local file paths rather than embedding credentials in an HTTPS URL.
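If you manage many repositories, generating the config entries from a list is less error-prone than editing them by hand. This Python sketch renders `[[repositories]]` tables with the `name` and `git_url` keys shown above and flags URLs that carry embedded credentials. Both function names are hypothetical, and quoting via `json.dumps` is an assumption that holds for ordinary names and URLs.

```python
import json
from urllib.parse import urlsplit


def render_repositories(repos):
    """Render [[repositories]] tables for config.toml from {name: git_url}.

    json.dumps yields a double-quoted string that is also a valid TOML
    basic string for ordinary ASCII names and URLs.
    """
    blocks = []
    for name, git_url in repos.items():
        blocks.append(
            "[[repositories]]\n"
            f"name = {json.dumps(name)}\n"
            f"git_url = {json.dumps(git_url)}\n"
        )
    return "\n".join(blocks)


def has_embedded_credentials(git_url):
    """True if the URL contains a user:password pair, e.g. a pasted token."""
    parts = urlsplit(git_url)
    return parts.password is not None


if __name__ == "__main__":
    repos = {
        "my-project": "file:///home/user/projects/my-project",
        "shared-lib": "https://github.com/org/shared-lib.git",
    }
    for url in repos.values():
        if has_embedded_credentials(url):
            raise SystemExit(f"refusing to write credentials: {url}")
    print(render_repositories(repos))
```

Redirect the output into the config file only after reviewing it, since the file may already hold other settings.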

Authentication and Team Deployment

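For team use, require a token on every request:

# Start the server with a shared secret token
# (auth options vary by Tabby version; confirm with `tabby serve --help`)
tabby serve --model StarCoder-3B --device cuda --token <your-secret-token>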

Distribute the token to team members for their IDE configurations. For production team deployments, place Tabby behind a reverse proxy:

# /etc/nginx/sites-available/tabby
server {
    listen 443 ssl;
    server_name tabby.internal.company.com;

    ssl_certificate /etc/ssl/certs/tabby.pem;
    ssl_certificate_key /etc/ssl/private/tabby.key;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 300s;
    }
}

Security note: Never expose Tabby directly to the public internet. It processes your source code and should be treated as a sensitive internal service. Use VPN, mTLS, or a private network.
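For scripts that talk to a token-protected server through the proxy, the token has to travel with every request. The following Python sketch builds such a request with the standard library. It assumes the token is sent as an `Authorization: Bearer` header and read from a hypothetical `TABBY_TOKEN` environment variable; the function name is also an illustration, not part of Tabby.

```python
import json
import os
import urllib.request


def authed_request(base_url, path, token=None, body=None):
    """Build a urllib Request for a Tabby endpoint.

    Assumption: the server expects `Authorization: Bearer <token>`.
    A JSON body turns the request into a POST; otherwise it is a GET.
    """
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers=headers,
        method="POST" if data is not None else "GET",
    )


if __name__ == "__main__":
    # TABBY_TOKEN is a hypothetical variable name chosen for this sketch.
    token = os.environ.get("TABBY_TOKEN")
    req = authed_request("https://tabby.internal.company.com", "/v1/health", token)
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req, timeout=5)
```

Reading the token from the environment keeps it out of shell history and source control, which matters once the same token is shared across a team.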

Comparison with Continue.dev and Copilot

Feature	Tabby	Continue.dev	GitHub Copilot
Architecture	Dedicated server	IDE extension	Cloud service
Model management	Built-in	Relies on Ollama/etc	Managed
Repository indexing	Built-in	Basic	Strong
Chat interface	Limited	Full chat + inline edit	Full chat
Team support	Multi-user, auth	Single user	Organization plans
Offline	Yes	Yes (with Ollama)	No
Cost	Free + hardware	Free + hardware	$10-19/month
IDE support	VS Code, JetBrains, Vim	VS Code, JetBrains	VS Code, JetBrains, Vim
Setup complexity	Low	Medium	Minimal

When to Choose Tabby

Tabby is the right choice when:

You need a team solution. Tabby’s server architecture means one GPU server serves multiple developers. Continue.dev runs per-machine.
Repository-aware completions matter. Tabby’s built-in indexing is more mature than Continue’s codebase search.
You want a focused tool. Tabby does code completion well and does not try to be a general-purpose LLM chat interface.
You want operational simplicity. Tabby is one binary or one Docker container, with no separate model server to run.

Continue.dev is better when:

You want chat, inline editing, and completions in one tool.
You already run Ollama and want to reuse it.
You want to use the same models for coding and general tasks.

Monitoring and Maintenance

Tabby reports its status at the health endpoint, including details such as the loaded model and device:

curl http://localhost:8080/v1/health

For long-running deployments, monitor:

Disk usage in the data directory (models and indexes grow over time)
GPU memory with nvidia-smi – ensure no OOM conditions
Response latency – if completions slow down, the model may be too large for your hardware
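The first two checks in that list are easy to script. The sketch below, in Python, parses the output of `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` and sums file sizes under the data directory. The 90% alert threshold and all function names are assumptions chosen for illustration.

```python
import os
import subprocess


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV rows of 'used, total' (MiB) into int tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_alerts(rows, threshold=0.9):
    """Return indexes of GPUs whose memory use is at or above the threshold."""
    return [i for i, (used, total) in enumerate(rows) if total and used / total >= threshold]


def directory_size_bytes(root):
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def read_gpu_memory():
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    data_dir = os.path.expanduser("~/.tabby")
    print(f"data dir: {directory_size_bytes(data_dir) / 1e9:.2f} GB")
```

Run it from cron or a systemd timer and alert when `gpu_memory_alerts(read_gpu_memory())` is non-empty or the data directory passes a size you are comfortable with.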

Update Tabby by pulling the latest Docker image or downloading the newest binary:

docker pull tabbyml/tabby
docker stop tabby && docker rm tabby
# Re-run the docker run command

Model data persists in the mounted volume, so updates are non-destructive.

Troubleshooting

No completions in IDE: Verify the server is running (curl http://localhost:8080/v1/health). Check the IDE plugin is configured with the correct endpoint URL.

Slow completions: Check GPU utilization with nvidia-smi. If GPU is maxed, use a smaller model or upgrade hardware. CPU inference is not practical for interactive use beyond 1B models.

Docker GPU not detected: Ensure nvidia-container-toolkit is installed and the Docker daemon is configured to use the NVIDIA runtime. Test with docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi.

Model download fails: Check network connectivity and disk space. Models range from 1-15 GB. Tabby stores them in the data directory.
