TL;DR
AnythingLLM provides a complete document management and chat interface for local LLMs, with native Ollama integration that keeps your data entirely on your infrastructure. This guide walks through deploying both services on a single Linux host, configuring secure communication between containers, and connecting your first model for document-based question answering.
The setup uses Docker Compose to orchestrate AnythingLLM and Ollama as separate services. Ollama runs on port 11434 serving models via REST API, while AnythingLLM connects to it as an LLM provider. You’ll configure persistent storage for both the AnythingLLM database and Ollama’s model cache, ensuring your workspace settings and downloaded models survive container restarts.
Key integration points include setting OLLAMA_HOST to make Ollama accessible from the AnythingLLM container, configuring OLLAMA_ORIGINS to allow cross-origin requests, and selecting models through AnythingLLM’s provider settings interface. The guide covers pulling models like llama3.2 or mistral using the Ollama CLI, then connecting AnythingLLM to your local Ollama instance rather than cloud APIs.
You’ll learn to create workspaces in AnythingLLM, upload documents for embedding, and configure retrieval-augmented generation workflows that query your documents using locally-hosted models. The setup includes volume mounts for document storage, environment variable configuration for network communication between services, and verification steps to confirm the integration works correctly.
Caution: Always review AI-generated Docker commands and configuration files before deploying to production. Validate that volume paths exist, ports don’t conflict with existing services, and environment variables match your network topology. Test document upload and chat functionality in a development environment first.
This approach eliminates external API dependencies while maintaining full control over model selection, document processing, and data retention. On capable hardware, response times for document queries can rival cloud-based services, with the added benefit of complete data privacy.
What is AnythingLLM and Why Self-Host It
AnythingLLM is a full-stack application that turns your documents into an interactive knowledge base powered by local language models. Unlike cloud-based solutions that send your data to third-party servers, AnythingLLM runs entirely on your infrastructure and connects to local LLM providers like Ollama.
The platform handles document ingestion, vector embedding, and conversational retrieval – essentially building a private ChatGPT that understands your specific documents. You can upload PDFs, text files, markdown documentation, and web scrapes, then query them using natural language through a web interface.
Self-hosting AnythingLLM gives you complete control over sensitive data. When you process internal documentation, customer records, or proprietary research, that information never leaves your network. This matters for organizations handling regulated data or teams working with confidential materials.
The architecture also eliminates recurring API costs. Once you have hardware capable of running models locally, your only expense is electricity. Teams running hundreds of queries daily find this approach more economical than metered cloud services.
Integration with Ollama
AnythingLLM connects to Ollama’s REST API on port 11434 to access local models. When you ask a question, AnythingLLM retrieves relevant document chunks from its vector database, then sends them to Ollama along with your query. The local model generates a response based on your actual documents rather than generic training data.
This integration means you can switch between models – llama3.2, mistral, or codellama – without changing your document processing pipeline. You simply pull a different model through Ollama and select it in AnythingLLM’s settings.
Caution: Always review AI-generated responses before using them in production systems. Local models can hallucinate or misinterpret document context, especially with complex technical material.
Prerequisites and System Requirements
Before deploying AnythingLLM with Ollama integration, verify your system meets the baseline requirements for running local language models effectively. Most homelab setups can handle smaller models, but larger deployments require careful resource planning.
Your system needs at least 8GB RAM for basic operation with 7B parameter models. For 13B models, plan for 16GB minimum. GPU acceleration dramatically improves inference speed – NVIDIA cards with 8GB VRAM handle most common models comfortably. AMD GPUs work through ROCm on Linux, though driver setup requires more effort.
Storage requirements vary by model collection size. A typical setup with three to five models consumes 20-40GB. Use SSD storage for the Ollama model directory to reduce load times.
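Before pulling models, it helps to confirm you actually have the headroom. A quick check, assuming the default model directory used by the Linux systemd install (adjust if you set OLLAMA_MODELS):

```shell
# Default model directory for the Linux systemd install; override via OLLAMA_MODELS
MODELS_DIR="${OLLAMA_MODELS:-/usr/share/ollama/.ollama/models}"

# Free space on the filesystem that will hold the models
df -h "$(dirname "$MODELS_DIR")" 2>/dev/null || df -h /

# Current size of the model cache (prints nothing if Ollama is not installed yet)
du -sh "$MODELS_DIR" 2>/dev/null
```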
Software Dependencies
Install Docker and Docker Compose for container orchestration. AnythingLLM runs as a containerized application, simplifying deployment and updates. Your Linux distribution should be relatively current – Ubuntu 22.04 LTS, Debian 12, or Fedora 38+ all work reliably.
Install Ollama on the host system before deploying AnythingLLM:
curl -fsSL https://ollama.com/install.sh | sh
Verify Ollama responds on port 11434:
curl http://localhost:11434/api/tags
Network Configuration
AnythingLLM communicates with Ollama over HTTP. If running both services in containers, configure Docker networking to allow container-to-host communication. The default bridge network works for most setups, but custom networks provide better isolation.
Reserve port 3001 for AnythingLLM’s web interface. Ensure your firewall allows inbound connections if accessing from other machines on your network.
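If ufw manages your firewall (an assumption; substitute your distribution's tool), opening the relevant ports looks like this:

```shell
# Open the AnythingLLM web UI to the LAN (assumes ufw manages the firewall)
sudo ufw allow 3001/tcp

# Only needed if other machines will call the Ollama API directly
sudo ufw allow 11434/tcp
```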
Caution: When using AI-generated configuration snippets or deployment scripts, review all commands before execution. Verify environment variables match your actual setup, especially OLLAMA_HOST and OLLAMA_MODELS paths. Test configurations in non-production environments first.
Installing Ollama and Pulling Models
Ollama provides the LLM backend that AnythingLLM connects to for local inference. The installation process is straightforward on most Linux distributions.
Run the official install script:
curl -fsSL https://ollama.com/install.sh | sh
This installs Ollama as a systemd service that starts automatically on boot. Verify the service is running:
systemctl status ollama
The API server listens on port 11434 by default. Test connectivity:
curl http://localhost:11434/api/tags
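The /api/tags response nests model metadata under a models array. If jq is installed, you can extract just the model names; the sample file below mimics the response shape (trimmed to the name field) so you can try the filter without a live server:

```shell
# Minimal stand-in for Ollama's /api/tags response (name field only)
cat > /tmp/tags.json <<'EOF'
{"models":[{"name":"llama3.1:8b"},{"name":"phi3:mini"}]}
EOF

# Extract model names; against a live server, pipe curl into the same filter:
#   curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
jq -r '.models[].name' /tmp/tags.json
```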
Pulling Your First Models
Ollama pulls models from the ollama.com library. Start with a general-purpose model like Llama 3.1:
ollama pull llama3.1:8b
For coding tasks, Qwen2.5-Coder offers strong performance:
ollama pull qwen2.5-coder:7b
Smaller models like Phi-3 work well on systems with limited VRAM:
ollama pull phi3:mini
List installed models:
ollama list
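Two related commands are useful for managing the collection as it grows:

```shell
# Show a model's family, parameter count, quantization, and context window
ollama show llama3.1:8b

# Remove a model you no longer need to reclaim disk space
ollama rm phi3:mini
```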
Configuring GPU Access
If you have an NVIDIA GPU with current drivers installed, Ollama detects and uses it automatically. On multi-GPU hosts you can pin Ollama to a specific card through a systemd override:
sudo systemctl edit ollama
Add this override:
[Service]
Environment="CUDA_VISIBLE_DEVICES=0"
Reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
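To confirm the GPU is actually being used, load a model and inspect the running processes. This assumes a recent Ollama release that includes the ps subcommand:

```shell
# Load a model, then check where it is running
ollama run llama3.1:8b "hello" >/dev/null
ollama ps
# The PROCESSOR column should read "100% GPU" when all layers are offloaded;
# a CPU/GPU split indicates the model does not fit entirely in VRAM
```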
Testing Model Inference
Verify a model responds correctly:
ollama run llama3.1:8b "Explain Docker networking in one sentence"
The model should return a concise response. If you see connection errors, check that port 11434 is not blocked by your firewall.
Caution: When using AI-generated commands for system configuration, always review the output before applying changes to production environments. Test configuration changes on non-critical systems first.
Deploying AnythingLLM with Docker
Docker provides the fastest path to running AnythingLLM in production. The official container includes all dependencies and isolates the application from your host system. Start by creating a dedicated directory for persistent storage:
mkdir -p ~/anythingllm/storage
cd ~/anythingllm
Launch the container with volume mounts for data persistence:
docker run -d \
--name anythingllm \
-p 3001:3001 \
-v ~/anythingllm/storage:/app/server/storage \
-e STORAGE_DIR=/app/server/storage \
mintplexlabs/anythingllm:latest
The web interface becomes available at http://localhost:3001 within about 30 seconds. For Ollama integration, the container needs a route back to the Ollama API on the host. Inside a container, localhost refers to the container itself, so map Docker’s host.docker.internal name to the host gateway instead:
docker run -d \
--name anythingllm \
-p 3001:3001 \
-v ~/anythingllm/storage:/app/server/storage \
-e STORAGE_DIR=/app/server/storage \
--add-host=host.docker.internal:host-gateway \
mintplexlabs/anythingllm:latest
Configure the Ollama endpoint in AnythingLLM settings as http://host.docker.internal:11434. This routing works on Linux systems with Docker 20.10 or newer.
Add restart policies for automatic recovery after system reboots:
docker update --restart unless-stopped anythingllm
Monitor container logs to verify successful startup and catch configuration errors:
docker logs -f anythingllm
Caution: When using AI assistants to generate Docker commands, always verify volume paths exist and port numbers match your network configuration. Incorrect volume mounts result in data loss between container restarts. Test your setup with non-critical data before migrating production workspaces.
For multi-user deployments, consider placing AnythingLLM behind a reverse proxy like Nginx or Caddy to add TLS termination and authentication layers.
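The docker run commands above can be captured in a Compose file, which also makes the restart policy declarative. A minimal sketch, assuming Ollama stays on the host as a systemd service and the file lives in ~/anythingllm next to the storage directory created earlier:

```yaml
# docker-compose.yml - equivalent of the docker run commands above,
# assuming Ollama runs on the host rather than in a container
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - ./storage:/app/server/storage   # relative to the compose file
    environment:
      - STORAGE_DIR=/app/server/storage
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

Bring it up with docker compose up -d; docker compose logs -f anythingllm replaces the docker logs command.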
Connecting AnythingLLM to Your Local Ollama Instance
Before connecting AnythingLLM, confirm Ollama is accessible on port 11434. Run this command from your terminal:
curl http://localhost:11434/api/tags
You should see JSON output listing your installed models. If the connection fails, check that Ollama is running with systemctl status ollama or restart it with sudo systemctl restart ollama.
Configure the LLM Provider
In the AnythingLLM web interface, navigate to Settings and select LLM Preference. Choose “Ollama” from the dropdown menu. Enter the base URL as http://localhost:11434 – this tells AnythingLLM where to find your local Ollama instance.
If you’re running AnythingLLM in Docker and Ollama on the host machine, use http://host.docker.internal:11434 instead. For separate machines on your network, replace localhost with the Ollama server’s IP address and make Ollama listen on all interfaces. Note that exporting variables in an interactive shell has no effect on the systemd service – set them through an override instead:
sudo systemctl edit ollama
Then add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_ORIGINS=*"
And apply the change:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Select Your Model
After configuring the connection, AnythingLLM will query Ollama for available models. Select one from the dropdown – popular choices include llama3.1, mistral, or codellama depending on your use case. The model must already be pulled via ollama pull modelname before it appears in the list.
Test the connection by sending a simple prompt through the AnythingLLM chat interface. Response times depend on your hardware and model size. Larger models like llama3.1:70b require substantial RAM or VRAM and benefit significantly from GPU acceleration.
Caution: When using AI-generated configuration commands or scripts, always review them manually before applying to production systems. Verify environment variables match your actual setup and test changes in a non-critical environment first.
Verification and Testing
Start by confirming Ollama responds on its default port. From your AnythingLLM host, run:
curl http://localhost:11434/api/tags
This returns a JSON list of available models. If you see a connection refused error, verify Ollama is running with systemctl status ollama and check that OLLAMA_HOST is set to 0.0.0.0:11434 if AnythingLLM runs in a separate container.
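You can check the bound address directly rather than inferring it from connection failures. This assumes iproute2's ss utility is available:

```shell
# Show which address the Ollama API is bound to
ss -tln | grep 11434 || echo "nothing listening on 11434"
# 127.0.0.1:11434 means host-only access;
# 0.0.0.0:11434 or [::]:11434 means containers and LAN hosts can reach it
```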
Next, test model inference directly through Ollama’s API:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "Explain Docker networking in one sentence.",
"stream": false
}'
A successful response contains a response field with generated text. If you receive a model not found error, pull it first with ollama pull llama3.1:8b.
AnythingLLM Integration Check
Log into the AnythingLLM web interface and navigate to Settings – LLM Preference. Select Ollama as your provider and enter http://localhost:11434 as the base URL. Click the connection test button. AnythingLLM should display available models from your Ollama instance.
Create a test workspace and send a simple query like “What is the capital of France?” If responses appear slowly or time out, check Docker network configuration. Containers on the default bridge network cannot reach services bound only to the host’s localhost – use the host’s IP address or Docker’s host.docker.internal hostname instead.
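When host.docker.internal is unavailable (older Docker releases, or a run command without the host-gateway mapping), the docker0 bridge address serves the same purpose. A quick way to find it on the host:

```shell
# On the host: show the docker0 bridge address (typically 172.17.0.1)
ip -4 addr show docker0 2>/dev/null | grep inet || echo "docker0 bridge not present"
# Use that address as the Ollama base URL, e.g. http://172.17.0.1:11434,
# and make sure Ollama is bound to 0.0.0.0 so the bridge can reach it
```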
Document Processing Validation
Upload a small text file to your workspace and ask AnythingLLM to summarize it. This tests the full pipeline: document ingestion, embedding generation, vector storage, and LLM response synthesis. Watch Ollama’s logs with journalctl -u ollama -f to confirm it receives embedding and completion requests.
Caution: Always review AI-generated summaries against source documents before relying on them for decisions. LLMs can hallucinate details not present in your data, especially with complex technical documentation.