Local AI Ops

Securing Your Local Ollama API: Auth and Isolation

Securing Your Local Ollama API TL;DR By default, Ollama exposes its API on localhost:11434 without authentication, making it vulnerable if your network perimeter is breached or if you expose it for remote access. This guide shows you how to lock down your local Ollama deployment using reverse proxies, API keys, and network isolation techniques. ...

LM Studio vs Ollama: Complete Comparison for Local AI

LM Studio vs Ollama: Complete Comparison for Local AI TL;DR LM Studio and Ollama are both excellent tools for running LLMs locally, but they serve different use cases. LM Studio offers a polished GUI experience ideal for experimentation and interactive chat, while Ollama provides a streamlined CLI and API-first approach perfect for automation and production deployments. ...

How to Run Llama 3 Locally with Ollama on Linux

How to Run Llama 3 Locally with Ollama on Linux TL;DR Running Llama 3 locally with Ollama on Linux takes about 5 minutes from start to finish. You’ll install Ollama, pull the model, and start chatting—all without sending data to external servers. Quick Setup: curl -fsSL https://ollama.com/install.sh | sh # Pull Llama 3 (8B parameter version) ollama pull llama3 # Start chatting ollama run llama3 The 8B model requires ~5GB disk space and 8GB RAM. For the 70B version, you’ll need 40GB disk space and 48GB RAM minimum. Ollama handles quantization automatically, so you don’t need to configure GGUF formats manually. ...

Self-Hosting Open WebUI with Docker: Setup Guide

Self-Hosting Open WebUI with Docker TL;DR Open WebUI is a self-hosted web interface for running local LLMs through Ollama, providing a ChatGPT-like experience without cloud dependencies. This guide walks you through Docker-based deployment, configuration, and integration with local models. What you’ll accomplish: Deploy Open WebUI in under 10 minutes using Docker Compose, connect it to Ollama for model inference, configure authentication, and set up persistent storage for chat history and model configurations. ...

llama.cpp vs Ollama: Which Local LLM Runner Should You Use

llama.cpp vs Ollama: Which Local LLM Runner Should You Use TL;DR Ollama wins for most self-hosters who want their local LLM running in under 5 minutes. It handles model downloads, GPU acceleration, and exposes a clean OpenAI-compatible API at localhost:11434. Perfect for Docker Compose stacks with Open WebUI, and it integrates seamlessly with tools like Continue.dev for VSCode or n8n workflows. ...

How to Self-Host n8n with Docker: Complete Installation Guide

TL;DR Self-hosting n8n with Docker gives you complete control over your workflow automation infrastructure without vendor lock-in or usage limits. This guide walks you through installing n8n using Docker Compose, configuring persistent storage, setting up SSL with Traefik or Nginx Proxy Manager, and connecting to external databases like PostgreSQL. You’ll learn how to create a production-ready n8n instance with proper environment variables, secure your installation with Let’s Encrypt certificates, and configure backup strategies using tools like Restic or Duplicati. We cover essential security practices including firewall configuration with UFW, setting up fail2ban for brute-force protection, and implementing proper authentication methods. ...

Best Local LLMs for 8GB RAM: Llama, Mistral, Phi

Best Local LLMs for 8GB RAM: Llama, Mistral, Phi TL;DR Running local LLMs on 8GB RAM systems is entirely feasible in 2026, but requires careful model selection and quantization strategies. Llama 3.2 3B (Q4_K_M quantization) delivers the best balance of capability and efficiency, using approximately 2.3GB RAM while maintaining strong reasoning abilities. Mistral 7B (Q3_K_M) pushes boundaries at 3.8GB RAM, offering superior performance for coding tasks but requiring aggressive quantization. Phi-3 Mini (3.8B parameters, Q4_K_S) sits in the middle at 2.1GB, excelling at structured outputs and JSON generation. ...

Open WebUI vs Ollama Web UI: Choosing the Right One

Open WebUI vs Ollama Web UI: Choosing the Right One TL;DR Open WebUI (formerly Ollama WebUI) is the actively maintained, feature-rich choice for most users, while Ollama Web UI refers to the deprecated original project that’s no longer developed. Open WebUI offers a ChatGPT-like interface with multi-user support, RAG (Retrieval-Augmented Generation) for document chat, model management, conversation history, and plugin architecture. It runs as a Docker container or Python application, connecting to your local Ollama instance on port 11434. Perfect for teams, homelab setups, or anyone wanting a polished UI with authentication and persistent storage. ...

How to Install and Run Ollama on Debian Linux

How to Install and Run Ollama on Debian Linux TL;DR Ollama transforms your Debian system into a private AI inference server, letting you run models like Llama 3.1, Mistral, and Phi-3 locally without cloud dependencies. This guide walks you through installation, model deployment, API integration, and production hardening. Quick Install: curl -fsSL https://ollama.com/install.sh | sh sudo systemctl enable ollama ollama pull llama3.1:8b ollama run llama3.1:8b You’ll configure Ollama as a systemd service, expose its REST API on port 11434, and integrate it with Open WebUI for a ChatGPT-like interface. We cover GPU acceleration (NVIDIA/AMD), resource limits, and reverse proxy setup with Nginx for secure remote access. ...

Using LLMs to Generate Nginx Configuration

TL;DR LLMs excel at generating Nginx configurations from natural language requirements, but require strict validation workflows. This guide demonstrates using Claude 3.5 Sonnet and GPT-4 via API to produce production-ready configs, integrated with nginx -t validation and Ansible deployment pipelines. Core workflow: Describe your requirements in structured prompts, LLM generates config, automated syntax validation, manual security review, then deploy via configuration management. This reduces configuration time from hours to minutes while maintaining safety through validation gates. ...