TL;DR

Running Ollama on Windows requires different considerations than Linux deployments. You have two main paths: native Windows installation or WSL2. Native Windows offers simpler GPU access through NVIDIA CUDA or AMD ROCm drivers, while WSL2 provides a Linux-like environment but adds complexity for GPU passthrough.

The native Windows installer downloads from ollama.com and runs Ollama as a background application that appears in your system tray. After installation, Ollama serves models on port 11434, bound to localhost by default. To reach it from other machines on your network, set OLLAMA_HOST to a non-loopback address and create a Windows Defender Firewall inbound rule for port 11434, because the firewall blocks external connections by default.

GPU acceleration on Windows depends on your hardware. NVIDIA users need a driver recent enough to support CUDA 11.8 or newer; the full CUDA Toolkit is optional because Ollama bundles the CUDA runtime libraries it needs. AMD users rely on ROCm support, which covers fewer cards than NVIDIA's CUDA. Intel Arc GPUs work through oneAPI, but performance varies significantly by model generation. Without GPU acceleration, expect slower inference times, especially with models larger than 7B parameters.

WSL2 offers advantages for developers familiar with Linux tooling but introduces GPU driver complications. You need Windows 11 or Windows 10 version 21H2 or later, plus WSL2-compatible GPU drivers from your vendor, installed on the Windows side. The Linux installation script works inside WSL2, but networking may require additional configuration to expose port 11434 to Windows applications.

Environment variables control Ollama behavior. Set OLLAMA_HOST to bind to specific interfaces, OLLAMA_MODELS to change the model storage location (default is C:\Users\YourName\.ollama\models), and OLLAMA_NUM_GPU to limit GPU usage. For native installations these variables go in the Windows environment variables dialog. In WSL2 they go in your .bashrc if you start ollama serve by hand, or in the systemd unit if Ollama runs as a service.

Common pitfalls include antivirus software blocking model downloads, insufficient disk space for large models, and Windows power settings throttling GPU performance during inference. Always verify your GPU drivers match your Ollama installation method before troubleshooting performance issues.

Windows-Specific Considerations for Local AI

Running Ollama on Windows presents distinct challenges compared to Linux deployments. Windows users must choose between native Windows installation and WSL2, each with different performance characteristics and configuration requirements.

Native Windows installations run Ollama as a background application that talks directly to the Windows GPU drivers, using CUDA on NVIDIA hardware and ROCm on AMD hardware. This approach provides better GPU utilization for most consumer hardware but requires Windows-specific driver configurations. WSL2 offers a Linux-like environment but adds virtualization overhead and depends on the Windows GPU driver being exposed inside the Linux guest.

For native Windows, Ollama installs to C:\Users\YourUsername\AppData\Local\Programs\Ollama and stores models in C:\Users\YourUsername\.ollama\models by default. Override this location by setting the OLLAMA_MODELS environment variable through System Properties > Environment Variables, then restart Ollama so the change takes effect.

Windows Firewall Configuration

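Windows Defender Firewall blocks inbound connections to port 11434 by default. Create an inbound rule through Windows Security > Firewall & network protection > Advanced settings: add a new rule for TCP port 11434 and, where possible, limit it to your local network range. The rule only matters when other machines need to reach Ollama, and Ollama must also be listening on a non-loopback address (see OLLAMA_HOST). The equivalent command, run from an elevated Command Prompt: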

netsh advfirewall firewall add rule name="Ollama API" dir=in action=allow protocol=TCP localport=11434
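Once the rule is in place, a quick reachability check from another machine can confirm that the port is actually open. The following is a minimal sketch using only the Python standard library; the function name port_is_open and the example address in the usage note are illustrative, not part of Ollama.

```python
import socket


def port_is_open(host: str, port: int = 11434, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling port_is_open("192.168.1.50") from a second machine (with your own host address substituted) returns False if the firewall drops the connection or Ollama is still bound to localhost only.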

GPU Driver Requirements

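NVIDIA GPUs require driver version 522.06 or later, which corresponds to CUDA 11.8 support; the CUDA Toolkit itself is optional. AMD GPUs are driven through ROCm on Windows and need a current Radeon driver, and fewer AMD cards are supported than NVIDIA cards. After installation, run a small model and then check where it was loaded:

ollama run llama3.2:1b "Say hello"
ollama ps

The PROCESSOR column of ollama ps reports whether the model is running on the GPU, on the CPU, or split across both. If Ollama falls back to CPU inference, check Device Manager for driver warnings and reinstall GPU drivers with clean installation options selected. Set OLLAMA_NUM_GPU to limit how much of the model is offloaded to the GPU if you run other GPU-intensive applications simultaneously.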
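The same check can be scripted against the REST API. This sketch assumes the /api/ps endpoint reports each loaded model with a size field and a size_vram field, as described in the Ollama API documentation; the helper names gpu_fraction and loaded_models are hypothetical.

```python
import json
from urllib.request import urlopen


def gpu_fraction(model: dict) -> float:
    """Fraction of a loaded model's memory that sits in VRAM (0.0 to 1.0)."""
    size = model.get("size", 0)
    return model.get("size_vram", 0) / size if size else 0.0


def loaded_models(base_url: str = "http://localhost:11434") -> list:
    """Fetch the models currently held in memory from /api/ps."""
    with urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp).get("models", [])
```

A value of 1.0 from gpu_fraction suggests the model is fully offloaded, while 0.0 indicates CPU-only inference. Note that loaded_models() returns an empty list when no model has been used recently, so run a prompt first.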

GPU Driver Setup and CUDA Prerequisites

Running Ollama on Windows with GPU acceleration requires a properly installed NVIDIA driver; the CUDA Toolkit is only needed in the cases described below. Unlike Linux distributions where GPU support often works out of the box, Windows requires explicit driver management and environment validation.

Download the latest Game Ready or Studio drivers directly from NVIDIA’s website rather than relying on Windows Update. The Windows Update drivers frequently lag behind current releases and may lack optimizations for AI workloads. After installation, verify your GPU is recognized by opening PowerShell and running:

nvidia-smi

This command displays your GPU model, driver version, and the highest CUDA version that driver supports. Check that the CUDA version shown is 11.8 or newer, the minimum this guide assumes for Ollama's GPU support.
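For scripted checks, nvidia-smi can emit the same information as CSV through its --query-gpu and --format options. The sketch below parses that output and compares the driver version against the 522.06 minimum cited in this guide; the function names and the sample row used in the usage note are illustrative assumptions.

```python
import subprocess

MIN_DRIVER = (522, 6)  # 522.06, the minimum driver version this guide cites


def parse_gpus(csv_text: str) -> list:
    """Parse 'name, driver_version, memory.total' CSV rows into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps any commas inside the GPU name intact.
        name, driver, memory = [f.strip() for f in line.rsplit(",", 2)]
        gpus.append({"name": name, "driver": driver, "memory": memory})
    return gpus


def driver_ok(driver: str, minimum: tuple = MIN_DRIVER) -> bool:
    """Compare the first two components of a driver string with the minimum."""
    parts = driver.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= minimum


def query_gpus() -> list:
    """Run nvidia-smi and return one dict per detected GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpus(out)
```

On a machine with the driver installed, all(driver_ok(g["driver"]) for g in query_gpus()) gives a single pass/fail answer. query_gpus() raises FileNotFoundError when nvidia-smi is not on the PATH, which is itself a useful signal that the driver is missing.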

CUDA Toolkit Considerations

Ollama for Windows bundles the CUDA runtime libraries it needs, so installing the full CUDA Toolkit is optional for most users. However, if you plan to compile CUDA code yourself or use other AI tools alongside Ollama that depend on it, install the CUDA Toolkit matching your driver’s supported version.

Download the toolkit from NVIDIA’s developer portal and run the network installer. Choose the Express installation option unless you need specific components. After installation, verify CUDA is accessible:

nvcc --version

Windows-Specific GPU Configuration

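Windows Defender Firewall may prompt you to allow network access the first time Ollama starts. Allow it on private networks; allow it on public networks only if you genuinely intend to serve the API there. For systems with multiple GPUs, Ollama selects devices automatically. To restrict an NVIDIA system to one GPU, set CUDA_VISIBLE_DEVICES to that GPU's index (as listed by nvidia-smi) before starting the server. OLLAMA_NUM_GPU, as described elsewhere in this guide, concerns how much work is offloaded to the GPU rather than which device is chosen. Quit the tray application first so the port is free:

$env:CUDA_VISIBLE_DEVICES = "0"
ollama serve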

Caution: Always verify GPU driver compatibility with your specific hardware before installation. Incorrect drivers can cause system instability or prevent Windows from booting properly.

Native Windows Installation Process

Navigate to ollama.com/download and download the official Windows installer (OllamaSetup.exe). The installer is a standard executable setup program rather than an MSI package, and it handles PATH configuration automatically. It installs into your user profile, so administrator privileges are not normally required.

During installation, Ollama registers itself to start automatically when you sign in. It runs as a background application under your own user account, with an icon in the system tray, which affects where models are stored and how environment variables are configured.

Post-Installation Verification

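Open PowerShell and verify the installation:

ollama --version

If the tray application is running, the API server is already listening on port 11434. Run ollama serve yourself only when it is not, for example after quitting the tray application; starting a second server fails because the port is already in use:

ollama serve

With the server running, open a second PowerShell window and test model pulling: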

ollama pull llama3.2:3b
ollama run llama3.2:3b "Explain how Windows services work"

Configuring Environment Variables

Ollama reads Windows environment variables when the application starts. Open System Properties (Win+Pause, then “Advanced system settings”), click “Environment Variables”, and add the variables under “User variables” for your account. “System variables” also works if every account on the machine should share the same settings:

  • OLLAMA_HOST: Set to 0.0.0.0:11434 to allow network access
  • OLLAMA_MODELS: Change default model storage location (default: C:\Users\<user>\.ollama\models)
  • OLLAMA_NUM_GPU: Control GPU layer offloading (integer value)
  • OLLAMA_ORIGINS: Configure CORS for web applications

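After setting variables, quit Ollama from its system tray icon and start it again from the Start menu so it picks up the new values. Ollama runs as a regular background application rather than a Windows service, so Restart-Service has nothing to act on.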
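To see how a client might interpret these variables, here is a minimal sketch that resolves the effective settings from an environment mapping. It assumes OLLAMA_HOST is written as host or host:port (no URL scheme, no IPv6 literal) and that unset variables fall back to 127.0.0.1:11434 and the .ollama\models folder in the user profile; resolve_settings is a hypothetical helper, not an Ollama API.

```python
import os
from pathlib import Path


def resolve_settings(env=None) -> dict:
    """Work out the bind address, a client URL, and the model directory."""
    env = os.environ if env is None else env

    host_value = env.get("OLLAMA_HOST", "127.0.0.1:11434")
    hostname, _, port_text = host_value.partition(":")
    port = int(port_text) if port_text else 11434

    # 0.0.0.0 means "listen on every interface"; clients on the same
    # machine should connect through loopback instead.
    client_host = "127.0.0.1" if hostname in ("", "0.0.0.0") else hostname

    models_dir = env.get("OLLAMA_MODELS") or str(Path.home() / ".ollama" / "models")

    return {
        "bind": f"{hostname or '127.0.0.1'}:{port}",
        "client_url": f"http://{client_host}:{port}",
        "models_dir": models_dir,
    }
```

Running resolve_settings() in a fresh PowerShell window after editing the variables shows what a newly started process would inherit, which helps distinguish a variable that was never saved from an Ollama instance that simply has not been restarted.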

Windows Firewall Configuration

The installer does not automatically create firewall rules. To expose Ollama to your local network, create an inbound rule:

New-NetFirewallRule -DisplayName "Ollama API" -Direction Inbound -Protocol TCP -LocalPort 11434 -Action Allow

Test remote access from another machine using curl or your browser at http://<host-ip>:11434/api/tags, replacing <host-ip> with the address of the Windows machine running Ollama.
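The same remote test can be done from a script. This sketch assumes /api/tags returns a JSON object with a models array whose entries carry a name field; the helper names are illustrative, and the base URL must be supplied by you.

```python
import json
from urllib.request import urlopen


def model_names(tags_payload: dict) -> list:
    """Extract model names from a decoded /api/tags response."""
    return [m["name"] for m in tags_payload.get("models", [])]


def fetch_model_names(base_url: str, timeout: float = 5.0) -> list:
    """Query /api/tags on the given Ollama server and list its models."""
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

fetch_model_names raises urllib.error.URLError when the host cannot be reached, which usually points to the firewall rule or to Ollama still being bound to localhost.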

WSL2 Alternative Installation Path

WSL2 provides a Linux environment within Windows that many developers prefer for running Ollama. This approach gives you the Linux installation experience while maintaining access to Windows applications and file systems.

First, ensure WSL2 is installed with Ubuntu or Debian. Open your WSL2 terminal and run the standard Linux installation command:

curl -fsSL https://ollama.com/install.sh | sh

When systemd is enabled in your WSL2 distribution, the installer registers Ollama as a system-wide systemd service. After installation completes, verify the service status:

systemctl status ollama

If systemd is not enabled, start the server manually with ollama serve instead.

GPU Access Considerations

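WSL2 supports GPU passthrough for NVIDIA cards through CUDA, but requires specific Windows driver versions. The GPU driver itself is installed on the Windows side only; do not install a Linux NVIDIA display driver inside WSL2. Ollama ships the CUDA runtime it needs, so the CUDA toolkit is optional. Install it inside WSL2 only if other tools require it:

sudo apt update
sudo apt install nvidia-cuda-toolkit

Verify GPU detection with nvidia-smi inside WSL2. If your GPU appears, Ollama automatically uses it for inference. Set the OLLAMA_NUM_GPU environment variable to control GPU allocation. An export like the following only affects a server you start from the same shell with ollama serve, not the systemd service:

export OLLAMA_NUM_GPU=1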

Accessing Ollama from Windows

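By default, Ollama binds to localhost inside WSL2. WSL2's localhost forwarding often makes http://localhost:11434 reachable from Windows anyway; when it does not, bind Ollama to all interfaces by setting OLLAMA_HOST. For a server you start by hand:

export OLLAMA_HOST=0.0.0.0:11434
ollama serve

If Ollama runs as the systemd service, an export in your shell is not visible to it. Instead, run sudo systemctl edit ollama.service, add Environment="OLLAMA_HOST=0.0.0.0:11434" under [Service], and then run sudo systemctl restart ollama.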

From Windows PowerShell, access the API using your WSL2 IP address. Find it with wsl hostname -I from PowerShell, then test connectivity:

curl http://172.x.x.x:11434/api/tags
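If a Windows-side script needs the WSL2 address, the lookup can be automated. This is a minimal sketch that shells out to wsl hostname -I and takes the first address it prints; first_ip and wsl_base_url are hypothetical helper names, and the approach assumes your default WSL distribution is the one running Ollama.

```python
import subprocess


def first_ip(hostname_output: str) -> str:
    """Return the first address from `hostname -I` output, or '' if none."""
    parts = hostname_output.split()
    return parts[0] if parts else ""


def wsl_base_url(port: int = 11434) -> str:
    """Build the Ollama base URL for the default WSL2 distribution."""
    out = subprocess.run(
        ["wsl", "hostname", "-I"],
        capture_output=True, text=True, check=True,
    ).stdout
    ip = first_ip(out)
    if not ip:
        raise RuntimeError("WSL2 reported no IP address")
    return f"http://{ip}:{port}"
```

The WSL2 address can change between restarts, so resolving it at runtime is generally safer than hard-coding it in application settings.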

Caution: Exposing Ollama on 0.0.0.0 allows any client that can reach the machine to use your models, because the API has no built-in authentication. OLLAMA_ORIGINS only controls which browser origins pass CORS checks; it does not stop other clients. For production deployments, put Ollama behind a reverse proxy that enforces authentication and keep firewall rules as narrow as possible. Always validate AI-generated configuration commands against official documentation before applying them to networked services.

Installation and Configuration Steps

Download the official Ollama installer from ollama.com/download/windows. The installer is a standard executable (OllamaSetup.exe) that handles PATH configuration automatically. It installs into your user profile, so administrator privileges are not normally required.

After installation, verify Ollama is running by opening PowerShell and executing:

ollama --version

Ollama starts automatically and listens on port 11434. Test the REST API endpoint:

curl http://localhost:11434/api/tags

GPU Driver Configuration

Windows requires current NVIDIA or AMD drivers for GPU acceleration. Open Device Manager and verify your GPU appears without warning icons. For NVIDIA cards, install the latest Game Ready or Studio drivers from nvidia.com. AMD users need Adrenalin drivers from amd.com.

Set the OLLAMA_NUM_GPU environment variable to control GPU allocation. Open System Properties, navigate to Environment Variables, and add:

Variable: OLLAMA_NUM_GPU
Value: 1

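Restart Ollama after changing environment variables: quit it from the system tray icon, then launch it again from the Start menu. Because Ollama runs as a regular background application rather than a Windows service, it does not appear in Services.msc and Restart-Service does not apply.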

Windows Firewall Rules

Windows Defender Firewall blocks external connections by default. Create an inbound rule if you need network access from other machines:

New-NetFirewallRule -DisplayName "Ollama API" -Direction Inbound -LocalPort 11434 -Protocol TCP -Action Allow

For local-only access, no firewall changes are needed. The default configuration binds to localhost, so remote access requires setting OLLAMA_HOST in addition to the firewall rule.

Model Storage Location

Ollama stores models in C:\Users\YourUsername\.ollama\models by default. Change this location using the OLLAMA_MODELS environment variable if you need models on a different drive with more space. Point it to any accessible directory path.
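Before pointing OLLAMA_MODELS at another drive, it can help to confirm the drive has room for the models you intend to pull. The sketch below uses shutil.disk_usage from the standard library; the function names and the 2 GiB safety margin are illustrative assumptions, not Ollama requirements.

```python
import shutil
from pathlib import Path


def free_gib(path) -> float:
    """Free space in GiB on the drive holding path (or its nearest existing parent)."""
    p = Path(path)
    # Walk upward until we reach a directory that exists, so the check
    # also works for a models folder that has not been created yet.
    while not p.exists() and p != p.parent:
        p = p.parent
    return shutil.disk_usage(p).free / 2**30


def has_room(path, needed_gib: float, margin_gib: float = 2.0) -> bool:
    """True if the drive can hold needed_gib plus a safety margin."""
    return free_gib(path) >= needed_gib + margin_gib
```

For example, has_room(r"D:\ollama-models", 5) checks whether a hypothetical D:\ollama-models location can take a roughly 5 GiB model with some headroom left over.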

Caution: Always verify AI-generated PowerShell commands in a test environment before running them with elevated privileges on production systems.

Verification and Testing

After installation completes, verify Ollama is running correctly on your Windows system. Open PowerShell or Command Prompt and confirm the CLI is on your PATH:

ollama --version

This confirms the CLI is accessible. Next, test the REST API endpoint that applications will use to communicate with your models:

curl http://localhost:11434/api/tags

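You should receive a JSON response listing available models. If you see a connection error on localhost, Ollama is most likely not running; start it from the Start menu or with ollama serve. Windows Defender Firewall only comes into play for connections from other machines. The installer does not create an inbound rule for port 11434, so you must add one yourself, and corporate security policies sometimes block such rules.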

Download a small model to verify GPU acceleration and model loading:

ollama pull llama3.2:1b
ollama run llama3.2:1b "Explain what a REST API is in one sentence"

Watch for GPU utilization in Task Manager under the Performance tab. If you see high CPU usage but zero GPU activity, your NVIDIA or AMD drivers may need updating. A short prompt on a small model can also finish too quickly to register, and Task Manager's default GPU graphs may not show compute load, so cross-check with ollama ps.

Test API Integration

Verify programmatic access for applications like Open WebUI or custom scripts:

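import requests

# Ask the local Ollama server for a single, non-streamed completion.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:1b",
        "prompt": "List three benefits of local AI deployment",
        "stream": False,  # return one JSON object instead of a stream of chunks
    },
    timeout=120,  # the first call may be slow while the model loads
)
response.raise_for_status()  # surface HTTP errors such as an unknown model name

print(response.json()["response"])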
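Interactive applications usually prefer streamed output. The following sketch uses only the standard library and assumes that, with "stream" set to true, /api/generate returns newline-delimited JSON objects that each carry a response fragment and a done flag; join_chunks and generate_streaming are hypothetical helper names.

```python
import json
from urllib.request import Request, urlopen


def join_chunks(lines) -> str:
    """Concatenate the 'response' fields of newline-delimited JSON chunks."""
    pieces = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        pieces.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk carries statistics, not more text
    return "".join(pieces)


def generate_streaming(prompt: str, model: str = "llama3.2:1b",
                       base_url: str = "http://localhost:11434") -> str:
    """Send a streamed generate request and return the assembled text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = Request(f"{base_url}/api/generate", data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=120) as resp:
        # Iterating an HTTP response yields one line (one JSON chunk) at a time.
        return join_chunks(resp)
```

This version collects the fragments into one string for simplicity. A chat interface would instead print each fragment as it arrives inside the loop.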

Caution: When using AI models to generate PowerShell commands or system configurations, always review output before execution. Models can hallucinate invalid syntax or suggest commands that conflict with Windows security policies.

If the API returns connection refused errors, first confirm that Ollama is running, then check whether another process holds port 11434 using netstat -ano | findstr 11434 in Command Prompt.