Running Local LLMs on AMD GPUs with ROCm and Ollama

TL;DR

AMD GPUs are a viable alternative to NVIDIA for local LLM inference, particularly the RX 7900 XTX with 24GB VRAM. ROCm 6.x on Linux provides the software stack needed to run Ollama and llama.cpp with GPU acceleration. Performance is 15-30% lower than equivalent NVIDIA hardware, but AMD cards often cost significantly less.

Key facts:

  • Best AMD card for local AI: RX 7900 XTX (24GB VRAM, ~$650-800 used)
  • Software required: ROCm 6.x on Ubuntu 22.04/24.04 or Debian 12
  • Performance: ~70-85% of RTX 3090 for LLM inference
  • Supported models: Same as NVIDIA, meaning any model that fits in VRAM
  • Main tradeoff: Lower price but less mature software ecosystem

Cost comparison at the 24GB tier:

GPU            VRAM    Used Price      Inference Speed (8B Q4)
RX 7900 XTX    24GB    $650-800        ~38 tok/s
RTX 3090       24GB    $700-900        ~50 tok/s
RTX 4090       24GB    $1,500-1,800    ~80 tok/s

Supported AMD GPUs

ROCm 6.x officially supports a specific set of AMD GPUs. Not every AMD card works.

Officially Supported (RDNA 3)

GPU            VRAM    ROCm Support    AI Suitability
RX 7900 XTX    24GB    Full            Best consumer AMD option
RX 7900 XT     20GB    Full            Good, but 20GB is an awkward tier
RX 7900 GRE    16GB    Full            Budget option for 7B-13B models
RX 7800 XT     16GB    Partial         Works with HSA_OVERRIDE
RX 7700 XT     12GB    Partial         Limited to smaller models

Officially Supported (CDNA / Data Center)

GPU       VRAM           ROCm Support    Notes
MI250X    128GB HBM2e    Full            Data center, expensive
MI210     64GB HBM2e     Full            Data center
MI100     32GB HBM2      Full            Older, available used

Community / Unsupported (May Work)

GPU           VRAM    Status
RX 6900 XT    16GB    Works with HSA_OVERRIDE_GFX_VERSION=10.3.0
RX 6800 XT    16GB    Works with override, some instability
RX 6700 XT    12GB    Partial, known issues

Cards older than RDNA 2 (RX 5000 series and earlier) are not supported and will not work with ROCm.

Installing ROCm 6.x on Debian/Ubuntu

Prerequisites

Update your system and install kernel headers:

sudo apt update && sudo apt upgrade -y
sudo apt install -y linux-headers-$(uname -r) wget gnupg2

Add the AMD ROCm Repository

For Ubuntu 22.04:

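# Download the ROCm GPG key into a dedicated keyring (apt-key is deprecated)
sudo mkdir -p -m 0755 /etc/apt/keyrings
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | \
  gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null

# Add the repository, trusting only that keyring
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.3/ jammy main" | \
  sudo tee /etc/apt/sources.list.d/rocm.list

# Pin ROCm packages to the AMD repository
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | \
  sudo tee /etc/apt/preferences.d/rocm-pin-600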

For Ubuntu 24.04:

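# Download the ROCm GPG key into a dedicated keyring (apt-key is deprecated)
sudo mkdir -p -m 0755 /etc/apt/keyrings
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | \
  gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null

echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.3/ noble main" | \
  sudo tee /etc/apt/sources.list.d/rocm.list

echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | \
  sudo tee /etc/apt/preferences.d/rocm-pin-600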

For Debian 12:

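# Download the ROCm GPG key into a dedicated keyring (apt-key is deprecated)
sudo mkdir -p -m 0755 /etc/apt/keyrings
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | \
  gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null

# Debian 12 uses the Ubuntu 22.04 (jammy) suite here; AMD's repository is not
# known to publish a separate bookworm suite for ROCm 6.3
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.3/ jammy main" | \
  sudo tee /etc/apt/sources.list.d/rocm.list

echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | \
  sudo tee /etc/apt/preferences.d/rocm-pin-600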
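
The repository blocks above differ only in the suite name. As a minimal sketch, the sources line could be generated by a small helper. The function name rocm_repo_line and its optional keyring argument are illustrative assumptions, not part of any AMD tooling:

```shell
#!/usr/bin/env bash
# rocm_repo_line VERSION SUITE [KEYRING]
# Prints an apt sources line for AMD's ROCm repository.
# VERSION is the ROCm release (e.g. 6.3), SUITE the distribution codename.
# KEYRING is optional: when given, a signed-by option is added for that file.
rocm_repo_line() {
  local version="$1" suite="$2" keyring="${3:-}"
  local options="arch=amd64"
  if [ -n "$keyring" ]; then
    options="$options signed-by=$keyring"
  fi
  echo "deb [$options] https://repo.radeon.com/rocm/apt/$version/ $suite main"
}
```

For example, rocm_repo_line 6.3 noble | sudo tee /etc/apt/sources.list.d/rocm.list writes the Ubuntu 24.04 entry. Passing the suite explicitly keeps the choice visible instead of guessing it from the running system.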

Install ROCm

sudo apt update
sudo apt install -y rocm-hip-runtime rocm-hip-sdk rocm-opencl-runtime

This installs the HIP runtime, development SDK, and OpenCL support. The full rocm meta-package is larger and includes tools you may not need.

Add Your User to the Required Groups

sudo usermod -aG render,video $USER

Log out and back in for group changes to take effect.
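
After logging back in, it can help to confirm that both groups actually took effect. The following is a small sketch of such a check; missing_groups is a hypothetical helper name, and it only inspects the list of group names it is given:

```shell
#!/usr/bin/env bash
# missing_groups "GROUP LIST"
# Given a space-separated list of the groups a user belongs to, prints
# whichever of render and video are absent (nothing if both are present).
missing_groups() {
  local have=" $1 " missing="" g
  for g in render video; do
    case "$have" in
      *" $g "*) ;;                  # already a member
      *) missing="$missing $g" ;;   # not found, remember it
    esac
  done
  echo "${missing# }"
}
```

Calling missing_groups "$(id -nG)" checks the current session. Empty output means both groups are active; any name printed still needs usermod followed by a fresh login.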

Verify the Installation

# Check ROCm version
cat /opt/rocm/.info/version

# List detected GPUs
rocm-smi

# Verify HIP runtime
/opt/rocm/bin/hipconfig --full

rocm-smi should display your GPU with temperature, utilization, and memory information. If it shows no devices, your GPU may not be supported or the driver installation failed.

Environment Variables

Add ROCm to your PATH:

echo 'export PATH=$PATH:/opt/rocm/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib' >> ~/.bashrc
source ~/.bashrc

For unsupported GPUs (RDNA 2), set the GFX version override:

# Only for RX 6900 XT, 6800 XT, etc. — NOT needed for RX 7900 series
export HSA_OVERRIDE_GFX_VERSION=10.3.0
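
The value 10.3.0 makes the runtime treat the card as a gfx1030 device. As a rough sketch, the decision could be scripted from the gfx target that rocminfo reports. The helper name override_for_target is hypothetical, and mapping every gfx103x target to 10.3.0 is an assumption based on common community practice rather than anything AMD documents:

```shell
#!/usr/bin/env bash
# override_for_target GFX_TARGET
# Prints the HSA_OVERRIDE_GFX_VERSION value to try for a given gfx target,
# or nothing when this sketch has no suggestion.
override_for_target() {
  case "$1" in
    gfx103[0-9]) echo "10.3.0" ;;  # RDNA 2 family: present the card as gfx1030
    *)           echo "" ;;        # anything else: no suggestion from this sketch
  esac
}
```

The target name can be read with rocminfo | grep -o -m1 'gfx[0-9a-f]*' and passed to the function. An empty result means the sketch has nothing to suggest, not that the card is guaranteed to work.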

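To make the override permanent for Ollama’s systemd service (created by the install script in the next section), add a drop-in file. If Ollama is already installed, restart it afterwards with sudo systemctl restart ollama: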

sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
EOF
sudo systemctl daemon-reload

Installing Ollama with ROCm Support

Ollama ships with ROCm support built in. The standard installation script detects AMD GPUs automatically:

curl -fsSL https://ollama.com/install.sh | sh

After installation, run a model to confirm that Ollama uses your AMD GPU. The model’s reply cannot identify the hardware, so the real check is the utilization reading that follows:

ollama run llama3.1:8b "Hello"

While the model runs, check GPU utilization in another terminal:

rocm-smi

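You should see non-zero GPU utilization and VRAM usage. If VRAM shows 0MB used while a model is loaded, Ollama has fallen back to CPU. Running ollama ps gives a second opinion: its PROCESSOR column reports how much of the loaded model sits on the GPU.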
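
The same check can be made numeric. This sketch assumes that Ollama's HTTP API exposes a running-models endpoint at /api/ps whose entries carry size and size_vram byte counts, that the server listens on the default port 11434, and that jq is installed. The helper name vram_percent is hypothetical:

```shell
#!/usr/bin/env bash
# vram_percent SIZE_BYTES SIZE_VRAM_BYTES
# Prints the share of a loaded model that sits in VRAM, as a whole percentage.
vram_percent() {
  awk -v size="$1" -v vram="$2" 'BEGIN {
    if (size <= 0) { print 0; exit }   # nothing loaded, avoid dividing by zero
    printf "%.0f\n", 100 * vram / size
  }'
}

# Example (assumed endpoint and field names, see the note above):
#   curl -s http://localhost:11434/api/ps |
#     jq -r '.models[] | "\(.size) \(.size_vram)"' |
#     while read -r size vram; do vram_percent "$size" "$vram"; done
```

A result of 100 means the whole model is on the GPU, 0 means a full CPU fallback, and anything in between indicates a partial offload.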

Docker with ROCm

To run Ollama in Docker with AMD GPU support:

docker run -d \
  --name ollama \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add video \
  --group-add render \
  -v ollama-data:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:rocm

Note the :rocm tag. The default Ollama Docker image includes only CUDA support. The ROCm variant is a separate image.
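
The container only sees the GPU if the two device nodes passed with --device exist on the host. A short pre-flight sketch, using a hypothetical helper called missing_paths:

```shell
#!/usr/bin/env bash
# missing_paths PATH...
# Prints each argument that does not exist on this machine, one per line.
missing_paths() {
  local p
  for p in "$@"; do
    [ -e "$p" ] || echo "$p"
  done
}
```

Running missing_paths /dev/kfd /dev/dri before docker run should print nothing. If either path is listed, the amdgpu driver is probably not loaded and the container would start without GPU access.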

Systemd Service Configuration

If you installed Ollama via the install script, it creates a systemd service automatically. To customize it for AMD:

sudo systemctl edit ollama

Add environment variables as needed:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/mnt/storage/ollama-models"
# Uncomment for RDNA 2 GPUs:
# Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"

Restart:

sudo systemctl daemon-reload
sudo systemctl restart ollama

Performance Benchmarks: AMD vs NVIDIA

All benchmarks run with Ollama, Q4_K_M quantization, default context length, single-user inference.

8B Models (Llama 3.1 8B)

                        RX 7900 XTX     RTX 3090        RTX 4090
Prompt eval:            ~600 tok/s      ~800 tok/s      ~1,400 tok/s
Token generation:       ~38 tok/s       ~50 tok/s       ~80 tok/s
Time to first token:    ~200ms          ~150ms          ~90ms
VRAM used:              ~5GB            ~5GB            ~5GB

The RX 7900 XTX reaches about 76% of the RTX 3090’s generation speed. Perfectly usable for interactive chat.

13B Models (CodeLlama 13B)

                        RX 7900 XTX     RTX 3090        RTX 4090
Prompt eval:            ~350 tok/s      ~500 tok/s      ~900 tok/s
Token generation:       ~20 tok/s       ~25 tok/s       ~42 tok/s
Time to first token:    ~350ms          ~250ms          ~140ms
VRAM used:              ~8GB            ~8GB            ~8GB

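At 13B, the AMD card reaches about 80% of the 3090’s speed. The gap narrows slightly at larger model sizes because token generation becomes increasingly bound by memory bandwidth, where the two cards are closely matched on paper (roughly 960 GB/s for the XTX versus 936 GB/s for the 3090).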

34B Models (CodeLlama 34B, Yi 34B)

                        RX 7900 XTX     RTX 3090        RTX 4090
Prompt eval:            ~180 tok/s      ~250 tok/s      ~450 tok/s
Token generation:       ~12 tok/s       ~15 tok/s       ~24 tok/s
Time to first token:    ~550ms          ~400ms          ~250ms
VRAM used:              ~20GB           ~20GB           ~20GB

34B models fit in 24GB with Q4 quantization on both the XTX and 3090. At this size, the performance gap is roughly 20%.

Mixtral 8x7B (MoE)

                        RX 7900 XTX     RTX 3090
Prompt eval:            ~250 tok/s      ~350 tok/s
Token generation:       ~18 tok/s       ~22 tok/s
VRAM used:              ~26GB (tight)   ~26GB (tight)

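The ~26GB footprint reported above exceeds 24GB, so with Q4 quantization Mixtral does not fully fit in VRAM on either card and part of the model presumably spills to system RAM. Performance is similar between AMD and NVIDIA at this memory-constrained level.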
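
The percentages quoted in this section are simple ratios of the token generation rows. A minimal sketch of the arithmetic, with relative_speed as an illustrative helper name:

```shell
#!/usr/bin/env bash
# relative_speed AMD_TOK_S NVIDIA_TOK_S
# Prints AMD throughput as a whole percentage of NVIDIA throughput.
relative_speed() {
  awk -v amd="$1" -v nv="$2" 'BEGIN { printf "%.0f\n", 100 * amd / nv }'
}

relative_speed 38 50   # Llama 3.1 8B:  prints 76
relative_speed 20 25   # CodeLlama 13B: prints 80
relative_speed 12 15   # 34B models:    prints 80
relative_speed 18 22   # Mixtral 8x7B:  prints 82
```

The same function works for the prompt eval rows, where the XTX trails by a wider margin.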

Known Issues and Workarounds

Issue: Ollama Falls Back to CPU

Symptom: rocm-smi shows 0% GPU utilization while Ollama runs a model.

Fix: Ensure your user is in the render and video groups:

groups $USER
# Should include: render video

If not:

sudo usermod -aG render,video $USER
# Log out and back in

Issue: ROCm Not Detecting GPU

Symptom: rocm-smi shows no devices.

Fix: Check if the amdgpu kernel module is loaded:

lsmod | grep amdgpu

If not present, check dmesg for driver errors:

dmesg | grep -i amdgpu

Common cause: secure boot is enabled and the unsigned amdgpu module cannot load. Either disable secure boot in BIOS or sign the kernel module.

Issue: Out of Memory on 24GB Card

Symptom: Model fails to load with memory allocation error.

Fix: ROCm reserves some VRAM for system use. Available VRAM is typically 22-23GB out of 24GB. If your model needs exactly 24GB, try a smaller quantization:

# Use Q3_K_M instead of Q4_K_M to save ~2GB
ollama run codellama:34b-instruct-q3_K_M
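
Whether a model will load can be estimated before pulling it. This sketch compares a model's expected footprint with usable VRAM, defaulting to 22GB as the lower end of the range mentioned above. The helper name fits_in_vram is hypothetical, and the footprint is whatever estimate you supply:

```shell
#!/usr/bin/env bash
# fits_in_vram NEEDED_GB [USABLE_GB]
# Succeeds (exit status 0) when the model footprint fits in usable VRAM.
# USABLE_GB defaults to 22, the conservative end of the 22-23GB range.
fits_in_vram() {
  awk -v need="$1" -v usable="${2:-22}" 'BEGIN { exit !(need <= usable) }'
}
```

With the figures from the benchmark section, fits_in_vram 20 succeeds for a 34B Q4 model, while fits_in_vram 26 fails for Mixtral. Longer context windows add to the footprint, so treat a narrow pass as a warning rather than a guarantee.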

Issue: Kernel Panic or System Freeze Under Load

Symptom: System becomes unresponsive during sustained GPU workload.

Fix: This is often a power delivery issue. AMD GPUs can spike above their rated TDP briefly. Ensure your PSU has sufficient headroom (750W+ for a single 7900 XTX system).

You can also cap power consumption:

# Limit GPU to 250W (the reference 7900 XTX is rated at 355W board power); needs root
sudo rocm-smi --setpoweroverdrive 250

Issue: Slow Performance Compared to Expected

Symptom: Inference is significantly slower than benchmarks suggest.

Fix: Check that the GPU is not thermal throttling:

watch -n 1 rocm-smi

Junction temperature should stay below 100°C. If it exceeds that, improve case airflow or increase fan speed:

# Set fan speed to 80% (a bare number is read as a 0-255 level, so keep the % sign)
sudo rocm-smi --setfan 80%

Cost Comparison: AMD vs NVIDIA

The primary reason to consider AMD is cost savings at the same VRAM tier.

24GB Tier

                    RX 7900 XTX     RTX 3090        RTX 4090
Used price          $650-800        $700-900        $1,500-1,800
VRAM                24GB            24GB            24GB
8B inference        38 tok/s        50 tok/s        80 tok/s
Price per tok/s     $17-21          $14-18          $19-23
Power draw          355W            350W            450W

At the 24GB tier, the RTX 3090 actually offers better value per token/second than either the XTX or 4090. The AMD card is cheaper upfront but slower.
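
The price per tok/s row is the used price divided by the 8B generation speed. A small sketch of that calculation, with price_per_tps as an illustrative helper name:

```shell
#!/usr/bin/env bash
# price_per_tps PRICE_USD TOKENS_PER_SECOND
# Prints dollars paid per token/second of generation speed, to two decimals.
price_per_tps() {
  awk -v price="$1" -v tps="$2" 'BEGIN { printf "%.2f\n", price / tps }'
}

price_per_tps 700 50    # RTX 3090 at the low end of its range: prints 14.00
price_per_tps 1800 80   # RTX 4090 at the high end of its range: prints 22.50
```

Plugging in a real listing price instead of the range endpoints shows quickly whether a specific offer beats the alternatives.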

16GB Tier

                    RX 7900 GRE     RX 7800 XT      RTX 4060 Ti 16GB
Used price          $400-500        $350-450        $350-450
VRAM                16GB            16GB            16GB
8B inference        ~30 tok/s       ~25 tok/s       ~35 tok/s

At 16GB, AMD and NVIDIA are closer in both price and performance. The RTX 4060 Ti 16GB slightly edges out the AMD options.

When AMD Wins on Cost

  • You can find an RX 7900 XTX under $700. At that price, it undercuts most 3090 listings while matching VRAM.
  • You already own an AMD GPU. Running models on the card you have avoids buying new hardware entirely.
  • You want new hardware with warranty. New 7900 XTX cards are still available at retail, while 3090s are used-only.

When NVIDIA Wins on Cost

  • You value software compatibility. Every AI framework supports CUDA. ROCm support is growing but not universal.
  • You plan to use Docker. NVIDIA Container Toolkit is more mature than AMD’s Docker GPU passthrough.
  • You want multi-GPU. Tensor parallelism across AMD GPUs is less tested and documented than NVIDIA.

Building an AMD AI Server

Component      Recommendation                 Price
GPU            RX 7900 XTX (24GB)             $700
CPU            AMD Ryzen 5 5600               $130
Motherboard    B550 ATX                       $100
RAM            64GB DDR4-3200                 $120
PSU            850W Gold                      $100
Storage        1TB NVMe SSD                   $70
Case           Mid-tower with good airflow    $80
Total                                         $1,300

This build gives you 24GB VRAM for running models up to 34B quantized, with enough system RAM for comfortable operation.

Installation Checklist

  1. Install Debian 12 or Ubuntu 22.04/24.04
  2. Update kernel and install headers
  3. Install ROCm 6.x from AMD repository
  4. Add user to render and video groups
  5. Verify GPU detection with rocm-smi
  6. Install Ollama (auto-detects ROCm)
  7. Pull a model and test: ollama run llama3.1:8b
  8. Verify GPU utilization with rocm-smi

Bottom Line

AMD GPUs are a legitimate option for local LLM inference in 2026. The RX 7900 XTX at 24GB VRAM can run the same models as an RTX 3090, at 70-85% of the speed, often for less money. ROCm 6.x has matured significantly, and Ollama’s built-in ROCm support makes setup straightforward on supported hardware.

The tradeoff is real: NVIDIA has better software support, more community resources, and slightly higher performance. If you are comfortable with Linux and willing to troubleshoot occasional ROCm quirks, AMD is a cost-effective path to local AI. If you want the smoothest experience with the widest compatibility, NVIDIA remains the default choice.


FAQ

Can I use any AMD GPU for local LLM inference?

No. ROCm officially supports only specific AMD GPUs based on the RDNA 3 and CDNA architectures. The RX 7900 XTX, RX 7900 XT, and RX 7900 GRE are the primary consumer options. Older RDNA 2 cards like the RX 6900 XT have limited community support through environment variable overrides but are not officially supported and may have stability issues.

Is AMD GPU performance comparable to NVIDIA for LLM inference?

AMD GPUs are roughly 15-30% slower than NVIDIA equivalents at the same VRAM tier for LLM inference. The RX 7900 XTX with 24GB VRAM performs at roughly 70-85% of an RTX 3090’s inference speed depending on the model. The gap is due to less mature software optimization in ROCm compared to CUDA, not hardware limitations.

Does Ollama automatically detect AMD GPUs?

Yes, if ROCm is properly installed. Ollama checks for ROCm at startup and uses AMD GPUs when available. Run ollama run llama3.1 and check GPU utilization with rocm-smi to verify. If Ollama falls back to CPU, check that your GPU is in the supported list and ROCm drivers are correctly installed.

Can I mix AMD and NVIDIA GPUs in the same system for AI?

Not practically. ROCm and CUDA are separate software stacks, and most AI frameworks including Ollama use one or the other per process. You cannot split a model across an AMD and NVIDIA GPU. You could theoretically run separate Ollama instances on each GPU, but this adds complexity with minimal benefit.

Should I buy AMD or NVIDIA for a new local AI build?

For maximum compatibility and performance, NVIDIA is still the safer choice in 2026. CUDA has broader software support and better optimization for AI workloads. Buy AMD if the price difference matters to you (at the used prices listed above, the RX 7900 XTX runs roughly $50-100 below an RTX 3090 for the same 24GB VRAM), you are comfortable with Linux troubleshooting, and you accept potentially lower inference speeds.