TL;DR

Jan AI is an open-source desktop application that lets you run large language models entirely on your local machine—no cloud dependencies, no data leaving your network. Think of it as a polished alternative to Ollama with a ChatGPT-like interface built in.

What makes Jan different: Unlike command-line tools like llama.cpp or Ollama, Jan provides a complete GUI experience with conversation management, model switching, and system resource monitoring. It supports the GGUF model format and runs models from the Llama 3.1, Mistral, Phi-3, and other popular families.

Quick setup overview: Download the AppImage for Linux, make it executable, and launch. Jan automatically detects your GPU (NVIDIA CUDA, AMD ROCm, or CPU fallback) and downloads models directly through its interface. First-time setup takes 5-10 minutes depending on your chosen model size.

Key capabilities you’ll learn:

  • Installing Jan on Ubuntu/Debian/Arch systems with proper GPU acceleration
  • Downloading and managing models (7B to 70B+ parameter ranges)
  • Configuring memory limits and context windows for optimal performance
  • Using Jan’s local API server (OpenAI-compatible endpoints)
  • Integrating Jan with development tools like Continue.dev and Cursor
  • Setting up system-wide shortcuts and background service operation

Hardware requirements: Minimum 16GB RAM for 7B models, 32GB+ recommended for 13B models. NVIDIA GPUs with 8GB+ VRAM provide 5-10x faster inference. Jan runs on CPU-only systems but expect slower response times.

Why self-host? Complete privacy (conversations never leave your machine), no API costs, offline operation, and full control over model behavior. Perfect for sensitive code review, internal documentation generation, or personal AI assistance without corporate data policies.

This guide covers everything from initial installation through advanced API integration, with real-world examples using Ansible for deployment automation and Prometheus for monitoring resource usage.

What is Jan AI and Why Choose It for Local LLM Hosting

Jan AI is an open-source desktop application that transforms your local machine into a private LLM server. Unlike cloud-based solutions, Jan runs entirely offline, giving you complete control over your data and eliminating API costs.

Jan combines the simplicity of LM Studio with the flexibility of llama.cpp under the hood. It supports GGUF model formats, hardware acceleration via CUDA and Metal, and provides both a desktop UI and local API server on http://localhost:1337. This makes it ideal for developers who want a user-friendly interface while maintaining programmatic access.

The platform excels at model management—you can download models directly from HuggingFace, switch between them instantly, and adjust parameters like temperature and context length without editing configuration files. For homelab operators, Jan’s OpenAI-compatible API means you can integrate it with existing tools like Continue.dev, Cursor, or custom Python scripts:

import openai

client = openai.OpenAI(
    base_url="http://localhost:1337/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Explain Docker networking"}]
)
print(response.choices[0].message.content)

⚠️ Caution: When using Jan’s API to generate system commands or infrastructure code, always review outputs before execution. LLMs can hallucinate package names, file paths, or dangerous command combinations. Validate generated Ansible playbooks, Terraform configurations, and bash scripts in a test environment first.

Jan’s privacy-first approach makes it particularly valuable for handling sensitive codebases, internal documentation, or proprietary data that cannot leave your infrastructure. No telemetry, no cloud dependencies—just local AI inference.

System Requirements and Hardware Considerations

Running Jan AI locally demands careful hardware planning. Unlike cloud-based solutions, your machine becomes the inference engine, making specifications critical for acceptable performance.

For basic 7B parameter models, you’ll need at least 16GB RAM and a modern CPU (Intel i5-10th gen or AMD Ryzen 5 5600X equivalent). However, 32GB RAM with a dedicated GPU transforms the experience. An NVIDIA RTX 3060 (12GB VRAM) handles most 13B models comfortably, while RTX 4090 or A6000 cards enable 70B+ models with quantization.

GPU Acceleration Essentials

Jan AI leverages CUDA for NVIDIA GPUs and Metal for Apple Silicon. Install CUDA Toolkit 12.1+ for optimal performance:

# Verify CUDA installation
nvidia-smi
nvcc --version

# Check GPU memory availability
nvidia-smi --query-gpu=memory.free --format=csv

Apple M1/M2/M3 users benefit from unified memory architecture—a Mac Studio with 64GB handles 30B models efficiently without discrete GPUs.

Storage and Model Management

Allocate 50-100GB for model storage. Quantized GGUF models save space: a Q4_K_M quantized Llama 3.1 70B requires ~40GB versus 140GB for full precision. Use fast NVMe SSDs to reduce model loading times from minutes to seconds.

# Monitor disk usage for Jan's model directory
du -sh ~/.jan/models/

Performance Benchmarking

Test inference speed before committing to specific models:

# Example prompt timing against Jan's local API server (assumes the server is
# running with a model loaded; adjust the model name to one you have installed)
time curl -s -X POST http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b", "messages": [{"role": "user", "content": "Explain Docker networking in 50 words"}]}' \
  -o /dev/null

Caution: When using AI assistants to generate hardware recommendations or system optimization commands, always validate suggestions against official Jan AI documentation. LLMs may hallucinate outdated CUDA versions or incompatible driver combinations that could destabilize your system. Cross-reference any apt install or pip install commands with current package repositories before execution.
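For example, a quick way to confirm that a suggested package actually exists in your configured repositories before installing it (the package names below are purely illustrative):

# Check whether an apt package exists and which version your repos provide
apt-cache policy nvidia-cuda-toolkit

# Check a PyPI package's published versions without installing it
pip index versions nvidia-ml-py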

Pre-Installation: Preparing Your Linux Environment

Before installing Jan AI, you need to verify your system meets the requirements and prepare your Linux environment. This preparation prevents common installation failures and ensures optimal performance.

First, verify your hardware specifications. As noted above, plan on at least 16GB RAM for 7B models, 32GB for 13B models, and 64GB (or a heavily quantized build) for 30B+ models. Check your available resources:

free -h
lscpu | grep -E "Model name|Socket|Core|Thread"
nvidia-smi  # For NVIDIA GPU users

For GPU acceleration, ensure you have CUDA 11.8 or newer (12.1+ recommended, per the previous section) or ROCm 5.7+ installed. Verify CUDA installation:

nvcc --version
nvidia-smi --query-gpu=compute_cap --format=csv
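If you are on an AMD GPU instead, the equivalent sanity checks use the ROCm tooling (assuming rocminfo and rocm-smi are installed):

# Confirm ROCm can see the GPU and report its VRAM
rocminfo | grep -i "marketing name"
rocm-smi --showmeminfo vram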

Installing Dependencies

Jan AI's prebuilt packages are self-contained, so Node.js 18+ is only needed if you plan to build Jan from source. Either way, install the basic build tools and download utilities using your distribution's package manager:

# Ubuntu/Debian
sudo apt update
sudo apt install -y curl wget git build-essential

# Fedora/RHEL
sudo dnf install -y curl wget git gcc-c++ make

# Arch Linux
sudo pacman -S curl wget git base-devel

If you plan to build from source, install Node.js 20 LTS from the NodeSource repository (Debian/Ubuntu shown):

curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
node --version  # Should show v20.x.x

Storage Preparation

Jan AI stores models in ~/.jan/models by default. Create a dedicated directory with sufficient space (50GB+ recommended):

mkdir -p ~/.jan/models
df -h ~  # Verify available disk space

For systems with limited home directory space, create a symlink to a larger partition:

sudo mkdir -p /mnt/data/jan-models
sudo chown "$USER": /mnt/data/jan-models
rm -rf ~/.jan/models   # remove the empty default directory so the symlink can replace it
ln -s /mnt/data/jan-models ~/.jan/models

Caution: If using AI assistants like Claude or ChatGPT to generate system commands, always validate the output before execution. AI models can hallucinate package names or suggest commands incompatible with your specific distribution version. Test commands in a non-production environment first.

Installing Jan AI on Linux

Jan AI offers native Linux packages that integrate seamlessly with your system. The installation process is straightforward, with multiple distribution options available.

The AppImage format provides universal compatibility across Linux distributions without requiring root privileges:

# Download the latest Jan AI AppImage (release asset names sometimes include a
# version number; check the releases page if this URL 404s)
wget https://github.com/janhq/jan/releases/latest/download/jan-linux-x86_64.AppImage

# Make it executable
chmod +x jan-linux-x86_64.AppImage

# Run Jan AI
./jan-linux-x86_64.AppImage
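If the AppImage refuses to start with a FUSE-related error (common on Ubuntu 22.04 and newer, which ship only FUSE 3 by default), install the FUSE 2 compatibility library:

# AppImages depend on FUSE 2 to mount themselves at runtime
sudo apt install -y libfuse2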

For system-wide access, move the AppImage to /opt and symlink it onto your PATH, then add a desktop entry (shown below) so it appears in your application launcher:

sudo mv jan-linux-x86_64.AppImage /opt/jan.AppImage
sudo ln -s /opt/jan.AppImage /usr/local/bin/jan
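A minimal desktop entry, assuming the /opt path used above (no icon is set here; you can extract one from the AppImage if you want it):

# Register Jan in your application launcher
mkdir -p ~/.local/share/applications
cat > ~/.local/share/applications/jan.desktop <<'EOF'
[Desktop Entry]
Name=Jan
Comment=Local LLM desktop client
Exec=/opt/jan.AppImage
Terminal=false
Type=Application
Categories=Development;Utility;
EOF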

Debian/Ubuntu Installation

For Debian-based systems, use the official .deb package:

# Download the .deb package (as with the AppImage, verify the exact asset name on the releases page)
wget https://github.com/janhq/jan/releases/latest/download/jan-linux-amd64.deb

# Install with apt
sudo apt install ./jan-linux-amd64.deb

# Launch Jan AI
jan

Fedora/RHEL Installation

Red Hat-based distributions can use the .rpm package if one is published for the current release (check the assets on the releases page; fall back to the AppImage otherwise):

# Download the RPM package
wget https://github.com/janhq/jan/releases/latest/download/jan-linux-x86_64.rpm

# Install with dnf
sudo dnf install ./jan-linux-x86_64.rpm

Post-Installation Configuration

Jan AI stores models and configuration in ~/.jan by default. For homelab setups with limited home directory space, create a symlink to a larger storage volume:

# Stop Jan AI if running
pkill jan

# Move data directory to larger storage
mv ~/.jan /mnt/storage/jan-data
ln -s /mnt/storage/jan-data ~/.jan
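Before relaunching Jan, confirm the symlink resolves and that the new location is writable by your user:

# Should print /mnt/storage/jan-data and then "writable"
readlink -f ~/.jan
touch ~/.jan/.write-test && rm ~/.jan/.write-test && echo "writable"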

Caution: Jan AI’s model download feature uses AI-assisted search. Always verify model sources and checksums before downloading, as AI recommendations may occasionally suggest deprecated or unofficial model variants.
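For example, to compare a downloaded file's SHA-256 against the value published on its model card (the file name below is illustrative):

# Compute the checksum of a downloaded GGUF and compare it to the model card
sha256sum ~/.jan/models/mistral-7b-instruct.Q4_K_M.gguf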

Initial Configuration and Model Management

After installing Jan AI, launch the application and you’ll be greeted with a clean interface for managing your local LLM infrastructure. The first critical step is downloading models that match your hardware capabilities and use cases.

Navigate to the “Hub” section where Jan provides curated model recommendations. For most users starting out, I recommend:

# Models are stored in ~/.jan/models/ by default
# Check available disk space first
df -h ~/.jan

Start with Llama 3.2 3B Instruct (2GB) for testing on modest hardware, or Mistral 7B Instruct (4.1GB) if you have 16GB+ RAM. Click “Download” and Jan handles the model acquisition automatically.

Model Configuration and Performance Tuning

Once downloaded, access model settings through the gear icon. Key parameters to adjust:

# Context length drives KV-cache memory use (roughly linear in tokens)
context_length: 4096  # Start conservative
n_gpu_layers: 35      # Offload layers to GPU if available
temperature: 0.7      # Lower for factual tasks

Caution: AI assistants like ChatGPT often suggest aggressive GPU layer counts. Always validate recommendations against your actual VRAM using nvidia-smi before applying settings that could crash your system.
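A quick check before raising n_gpu_layers: see how much VRAM is actually free, keeping in mind that each offloaded layer of a 7B Q4 model needs very roughly 100-200MB (this varies by model and quantization):

# Free vs. total VRAM on the first GPU
nvidia-smi --query-gpu=memory.free,memory.total --format=csv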

Testing Model Performance

Create a test conversation to verify functionality:

# Example API call to local Jan instance
import requests

response = requests.post('http://localhost:1337/v1/chat/completions', 
    json={
        "model": "mistral-7b-instruct",
        "messages": [{"role": "user", "content": "Explain Docker in one sentence"}]
    })
print(response.json()['choices'][0]['message']['content'])

Monitor resource usage with htop during inference. If you see excessive swap usage or OOM errors, reduce context_length or switch to a smaller quantized model (Q4_K_M variants use roughly 40-50% less memory than Q8_0 builds of the same model).
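One way to watch both system RAM and VRAM while a prompt is being processed (drop the nvidia-smi portion on CPU-only machines):

# Refresh RAM and VRAM usage every 2 seconds during inference
watch -n 2 'free -h | head -2; nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader'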

Verification and Testing

After installing Jan AI, verify your setup is working correctly before deploying models or building integrations.

Test the Jan API server responds on the default port:

curl http://localhost:1337/v1/models

You should receive a JSON response listing available models. If the connection fails, make sure the Jan application is running (pgrep -f jan) and that the local API server is enabled in Jan's settings.
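If curl cannot connect at all, check whether anything is listening on the port (1337 is the default; it may differ if you changed it in Jan's settings):

# List listening TCP sockets and filter for Jan's API port
ss -tlnp | grep 1337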

Model Loading Test

Download a small model like Phi-3-mini (3.8B parameters; roughly 2-4GB as a quantized GGUF) to verify model management works. The Hub in the Jan UI is the most reliable route; the local server also exposes a model-management endpoint, though the exact route can vary between Jan versions:

curl -X POST http://localhost:1337/v1/models/download \
  -H "Content-Type: application/json" \
  -d '{"model": "phi-3-mini"}'

Monitor download progress in the Jan UI or check disk usage with du -sh ~/.jan/models/.

Inference Validation

Send a test prompt to confirm the model generates responses:

import requests

response = requests.post(
    "http://localhost:1337/v1/chat/completions",
    json={
        "model": "phi-3-mini",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
        "temperature": 0.7
    }
)

print(response.json()["choices"][0]["message"]["content"])

⚠️ AI Hallucination Warning: When using Jan to generate system commands or infrastructure code, always validate outputs before execution. LLMs can produce syntactically correct but functionally dangerous commands. Never pipe AI-generated scripts directly to bash or sudo without manual review.

Performance Baseline

Measure tokens per second for your hardware:

time curl -X POST http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi-3-mini", "messages": [{"role": "user", "content": "Count to 100"}]}'

Expect 20-50 tokens/second on modern CPUs, 80-150 tokens/second with GPU acceleration. Document these baselines for troubleshooting performance degradation later.
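For a more direct tokens-per-second figure, divide the completion token count by the request's wall-clock time. This is a minimal sketch, assuming the response includes a usage.completion_tokens field (typical for OpenAI-compatible servers) and that jq and bc are installed:

# Time a request and compute completion tokens per second
START=$(date +%s.%N)
RESP=$(curl -s -X POST http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi-3-mini", "messages": [{"role": "user", "content": "Count to 100"}]}')
END=$(date +%s.%N)
TOKENS=$(echo "$RESP" | jq '.usage.completion_tokens')   # assumes the server reports token usage
echo "scale=1; $TOKENS / ($END - $START)" | bc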