Running Local LLMs on AMD GPUs with ROCm and Ollama

TL;DR

AMD GPUs are a viable alternative to NVIDIA for local LLM inference, particularly the RX 7900 XTX with 24GB VRAM. ROCm 6.x on Linux provides the software stack needed to run Ollama and llama.cpp with GPU acceleration. Performance is 15-30% lower than equivalent NVIDIA hardware, but AMD cards often cost significantly less.

Key facts:

Best AMD card for local AI: RX 7900 XTX (24GB VRAM, ~$650-800 used)
Software required: ROCm 6.x on Ubuntu 22.04/24.04 or Debian 12
Performance: ~70-85% of RTX 3090 for LLM inference
Supported models: Same as NVIDIA — any model that fits in VRAM
Main tradeoff: Lower price but less mature software ecosystem

Cost comparison at the 24GB tier:

GPU	VRAM	Used Price	Inference Speed (8B Q4)
RX 7900 XTX	24GB	$650-800	~38 tok/s
RTX 3090	24GB	$700-900	~50 tok/s
RTX 4090	24GB	$1,500-1,800	~80 tok/s

Supported AMD GPUs

ROCm 6.x officially supports a specific set of AMD GPUs. Not every AMD card works.

Officially Supported (RDNA 3)

GPU	VRAM	ROCm Support	AI Suitability
RX 7900 XTX	24GB	Full	Best consumer AMD option
RX 7900 XT	20GB	Full	Good, but 20GB is an awkward tier
RX 7900 GRE	16GB	Full	Budget option for 7B-13B models
RX 7800 XT	16GB	Partial	Works with `HSA_OVERRIDE_GFX_VERSION=11.0.0`
RX 7700 XT	12GB	Partial	Limited to smaller models

Officially Supported (CDNA / Data Center)

GPU	VRAM	ROCm Support	Notes
MI250X	128GB HBM2e	Full	Data center, expensive
MI210	64GB HBM2e	Full	Data center
MI100	32GB HBM2	Full	Older, available used

Community / Unsupported (May Work)

GPU	VRAM	Status
RX 6900 XT	16GB	Works with `HSA_OVERRIDE_GFX_VERSION=10.3.0`
RX 6800 XT	16GB	Works with override, some instability
RX 6700 XT	12GB	Partial, known issues

Cards older than RDNA 2 (RX 5000 series and earlier) are not supported by ROCm 6.x and should not be expected to work.

Installing ROCm 6.x on Debian/Ubuntu

Prerequisites

Update your system and install kernel headers:

sudo apt update && sudo apt upgrade -y
sudo apt install -y linux-headers-$(uname -r) wget gnupg2

Add the AMD ROCm Repository

For Ubuntu 22.04:

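# Add the ROCm GPG key (apt-key is deprecated, so store the key in a keyring)
sudo mkdir -p -m 0755 /etc/apt/keyrings
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | \
  gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null

# Add the repository, trusting only the key stored above
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.3/ jammy main" | \
  sudo tee /etc/apt/sources.list.d/rocm.list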

# Pin ROCm packages to the AMD repository
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | \
  sudo tee /etc/apt/preferences.d/rocm-pin-600

For Ubuntu 24.04:

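# Store the ROCm GPG key in a keyring (apt-key is deprecated)
sudo mkdir -p -m 0755 /etc/apt/keyrings
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | \
  gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null

echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.3/ noble main" | \
  sudo tee /etc/apt/sources.list.d/rocm.list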

echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | \
  sudo tee /etc/apt/preferences.d/rocm-pin-600

For Debian 12:

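# Store the ROCm GPG key in a keyring (apt-key is deprecated)
sudo mkdir -p -m 0755 /etc/apt/keyrings
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | \
  gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null

# AMD's Debian 12 instructions use the jammy suite; check repo.radeon.com if this has changed
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.3/ jammy main" | \
  sudo tee /etc/apt/sources.list.d/rocm.list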

echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | \
  sudo tee /etc/apt/preferences.d/rocm-pin-600

Install ROCm

sudo apt update
sudo apt install -y rocm-hip-runtime rocm-hip-sdk rocm-opencl-runtime

This installs the HIP runtime, development SDK, and OpenCL support. The full rocm meta-package is larger and includes tools you may not need.

Add Your User to the Required Groups

sudo usermod -aG render,video $USER

Log out and back in for group changes to take effect.

Verify the Installation

# Check ROCm version
cat /opt/rocm/.info/version

# List detected GPUs
rocm-smi

# Verify HIP runtime
/opt/rocm/bin/hipconfig --full

rocm-smi should display your GPU with temperature, utilization, and memory information. If it shows no devices, your GPU may not be supported or the driver installation failed.
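
If rocm-smi shows no devices, one quick thing to rule out is group membership. The sketch below is a minimal helper, assuming a POSIX shell; `missing_groups` is a hypothetical name and not part of ROCm. It takes a space-separated group list and prints whichever of render and video are absent.

```shell
#!/bin/sh
# Print which of the required groups (render, video) are absent from a
# space-separated group list. Prints an empty line if both are present.
# missing_groups is a hypothetical helper name, not part of ROCm.
missing_groups() (
    have=" $1 "
    missing=""
    for g in render video; do
        case "$have" in
            *" $g "*) ;;                   # group present
            *) missing="$missing $g" ;;    # group absent
        esac
    done
    # Trim the leading space before printing
    echo "${missing# }"
)

# Sample group list standing in for a real user:
missing_groups "adm sudo video"    # prints: render

# On a real system (not run here):
#   missing_groups "$(id -nG "$USER")"
#   ls -l /dev/kfd /dev/dri/renderD*
```

The device nodes listed in the last comment are the same ones the Docker command later in this guide passes through, so checking that they exist and are group-accessible covers both setups.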

Environment Variables

Add ROCm to your PATH:

echo 'export PATH=$PATH:/opt/rocm/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib' >> ~/.bashrc
source ~/.bashrc

For unsupported GPUs (RDNA 2), set the GFX version override:

# Only for RX 6900 XT, 6800 XT, etc. — NOT needed for RX 7900 series
export HSA_OVERRIDE_GFX_VERSION=10.3.0
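
Before setting the override, it can help to see which gfx target the card reports natively; 10.3.0 corresponds to gfx1030. The following is a small sketch, assuming rocminfo is installed alongside the HIP runtime and that its output contains names of the form gfxNNNN. `list_gfx_targets` is a hypothetical helper name.

```shell
#!/bin/sh
# Read rocminfo-style text on stdin and print each distinct gfx target once.
# list_gfx_targets is a hypothetical helper name, not a ROCm command.
list_gfx_targets() {
    grep -o 'gfx[0-9a-f][0-9a-f]*' | sort -u
}

# Sample input standing in for real rocminfo output (the layout is an assumption):
printf '  Name:    gfx1030\n  Name:    amdgcn-amd-amdhsa--gfx1030\n' | list_gfx_targets
# prints: gfx1030

# On a real system (not run here):
#   rocminfo | list_gfx_targets
```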

To make the override permanent for Ollama via systemd (once Ollama is installed, as described in the next section), create an override file and restart the service:

sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama

Installing Ollama with ROCm Support

Ollama ships with ROCm support built in. The standard installation script detects AMD GPUs automatically:

curl -fsSL https://ollama.com/install.sh | sh

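After installation, run a model so there is a GPU workload to inspect. The model has no knowledge of the hardware it runs on, so the prompt itself is arbitrary: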

ollama run llama3.1:8b "Hello, what GPU am I running on?"

While the model runs, check GPU utilization in another terminal:

rocm-smi

You should see non-zero GPU utilization and VRAM usage. If VRAM shows 0MB used while Ollama is running a model, it is falling back to CPU.
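
Ollama can also report placement itself through `ollama ps`. The sketch below classifies each loaded model as gpu, cpu, or split. It assumes the column layout used by recent Ollama releases (a header row, then a PROCESSOR field such as "100% GPU"); that layout and the helper name `classify_processor` are assumptions, so adjust the patterns if your version prints something different.

```shell
#!/bin/sh
# Read `ollama ps`-style output on stdin and print "<model> <placement>"
# for every row after the header. classify_processor is a hypothetical name.
classify_processor() {
    awk 'NR > 1 {
        if ($0 ~ /100% GPU/)      state = "gpu"      # fully offloaded
        else if ($0 ~ /CPU\/GPU/) state = "split"    # partly in system RAM
        else if ($0 ~ /100% CPU/) state = "cpu"      # CPU fallback
        else                      state = "unknown"
        print $1, state
    }'
}

# Sample rows standing in for real output (values are illustrative only):
printf 'NAME ID SIZE PROCESSOR UNTIL\nllama3.1:8b abc123 6.7 GB 100%% GPU 4 minutes from now\n' \
    | classify_processor
# prints: llama3.1:8b gpu

# On a real system (not run here):
#   ollama ps | classify_processor
```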

Docker with ROCm

To run Ollama in Docker with AMD GPU support:

docker run -d \
  --name ollama \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add video \
  --group-add render \
  -v ollama-data:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:rocm

Note the :rocm tag. The default Ollama Docker image includes only CUDA support. The ROCm variant is a separate image.

Systemd Service Configuration

If you installed Ollama via the install script, it creates a systemd service automatically. To customize it for AMD:

sudo systemctl edit ollama

Add environment variables as needed:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/mnt/storage/ollama-models"
# Uncomment for RDNA 2 GPUs:
# Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"

Restart:

sudo systemctl daemon-reload
sudo systemctl restart ollama

Performance Benchmarks: AMD vs NVIDIA

All benchmarks run with Ollama, Q4_K_M quantization, default context length, single-user inference.

8B Models (Llama 3.1 8B)

                        RX 7900 XTX     RTX 3090        RTX 4090
Prompt eval:            ~600 tok/s      ~800 tok/s      ~1,400 tok/s
Token generation:       ~38 tok/s       ~50 tok/s       ~80 tok/s
Time to first token:    ~200ms          ~150ms          ~90ms
VRAM used:              ~5GB            ~5GB            ~5GB

The RX 7900 XTX reaches about 76% of the RTX 3090’s generation speed. Perfectly usable for interactive chat.

13B Models (CodeLlama 13B)

                        RX 7900 XTX     RTX 3090        RTX 4090
Prompt eval:            ~350 tok/s      ~500 tok/s      ~900 tok/s
Token generation:       ~20 tok/s       ~25 tok/s       ~42 tok/s
Time to first token:    ~350ms          ~250ms          ~140ms
VRAM used:              ~8GB            ~8GB            ~8GB

At 13B, the AMD card is about 80% of the 3090’s speed. The gap narrows slightly at larger model sizes because memory bandwidth becomes more dominant.

34B Models (CodeLlama 34B, Yi 34B)

                        RX 7900 XTX     RTX 3090        RTX 4090
Prompt eval:            ~180 tok/s      ~250 tok/s      ~450 tok/s
Token generation:       ~12 tok/s       ~15 tok/s       ~24 tok/s
Time to first token:    ~550ms          ~400ms          ~250ms
VRAM used:              ~20GB           ~20GB           ~20GB

34B models fit in 24GB with Q4 quantization on both the XTX and 3090. At this size, the performance gap is roughly 20%.
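
The relative-speed figures in this section come straight from the token-generation rows. As a small worked example, the hypothetical helper `rel_speed` below divides one card's tok/s by another's and prints the result as a whole percentage.

```shell
#!/bin/sh
# Print speed A as a percentage of speed B, rounded to a whole number.
# rel_speed is a hypothetical helper name used only for this illustration.
rel_speed() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", 100 * a / b }'
}

# RX 7900 XTX vs RTX 3090 token generation, from the tables above:
rel_speed 38 50    # 8B:  prints 76
rel_speed 20 25    # 13B: prints 80
rel_speed 12 15    # 34B: prints 80
```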

Mixtral 8x7B (MoE)

                        RX 7900 XTX     RTX 3090
Prompt eval:            ~250 tok/s      ~350 tok/s
Token generation:       ~18 tok/s       ~22 tok/s
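VRAM used:              ~26GB (>24GB)   ~26GB (>24GB)

At roughly 26GB, Mixtral with Q4 quantization does not fully fit in 24GB of VRAM, so part of the model spills into system RAM on both cards. Under that constraint the RX 7900 XTX reaches about 82% of the RTX 3090’s generation speed (18 vs 22 tok/s).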

Known Issues and Workarounds

Issue: Ollama Falls Back to CPU

Symptom: rocm-smi shows 0% GPU utilization while Ollama runs a model.

Fix: Ensure your user is in the render and video groups:

groups $USER
# Should include: render video

If not:

sudo usermod -aG render,video $USER
# Log out and back in

Issue: ROCm Not Detecting GPU

Symptom: rocm-smi shows no devices.

Fix: Check if the amdgpu kernel module is loaded:

lsmod | grep amdgpu

If not present, check dmesg for driver errors:

dmesg | grep -i amdgpu

Common cause: secure boot is enabled and the unsigned amdgpu module cannot load. Either disable secure boot in BIOS or sign the kernel module.

Issue: Out of Memory on 24GB Card

Symptom: Model fails to load with memory allocation error.

Fix: ROCm reserves some VRAM for system use. Available VRAM is typically 22-23GB out of 24GB. If your model needs exactly 24GB, try a smaller quantization:

# Use Q3_K_M instead of Q4_K_M to save ~2GB
ollama run codellama:34b-instruct-q3_K_M
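
A quick way to anticipate this error is to compare a model's approximate footprint with the usable VRAM rather than the nominal 24GB. The sketch below uses the figures from this guide (about 22GB usable, about 20GB for a 34B Q4 model, about 26GB for Mixtral); `vram_check` is a hypothetical helper, and the sizes are rough estimates that ignore context-length overhead.

```shell
#!/bin/sh
# Compare a model's approximate size (GB) with usable VRAM (GB).
# vram_check is a hypothetical helper name used only for this illustration.
vram_check() {
    awk -v need="$1" -v usable="$2" 'BEGIN {
        if (need + 0 <= usable + 0)
            print "fits"
        else
            print "too large by " (need - usable) " GB"
    }'
}

vram_check 20 22    # 34B at Q4:       prints "fits"
vram_check 26 22    # Mixtral at Q4:   prints "too large by 4 GB"
```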

Issue: Kernel Panic or System Freeze Under Load

Symptom: System becomes unresponsive during sustained GPU workload.

Fix: This is often a power delivery issue. AMD GPUs can spike above their rated TDP briefly. Ensure your PSU has sufficient headroom (750W+ for a single 7900 XTX system).

You can also cap power consumption:

# Limit GPU to 250W (reference 7900 XTX board power is 355W); requires root
sudo rocm-smi --setpoweroverdrive 250

Issue: Slow Performance Compared to Expected

Symptom: Inference is significantly slower than benchmarks suggest.

Fix: Check that the GPU is not thermal throttling:

watch -n 1 rocm-smi

Junction temperature should stay below 100C. If it exceeds that, improve case airflow or increase fan speed:

# Set fan speed to 80% (without the % sign the value is read as a 0-255 level)
sudo rocm-smi --setfan 80%
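
The junction reading can also be taken directly from the kernel's hwmon interface, which is handy for scripting. This is a sketch, assuming the usual amdgpu sysfs layout where each temp*_label file names a sensor and the matching temp*_input file holds millidegrees Celsius. `junction_status` is a hypothetical helper, and the 100C limit is the threshold mentioned above.

```shell
#!/bin/sh
# Classify a hwmon reading (millidegrees Celsius) against a limit in Celsius.
# junction_status is a hypothetical helper name used only for this illustration.
junction_status() {
    awk -v m="$1" -v limit="$2" 'BEGIN {
        c = m / 1000
        printf "%.0fC %s\n", c, (c < limit ? "ok" : "too hot")
    }'
}

junction_status 95000 100     # prints: 95C ok
junction_status 104000 100    # prints: 104C too hot

# On a real system (not run here); the paths assume the amdgpu hwmon layout:
#   for label in /sys/class/drm/card*/device/hwmon/hwmon*/temp*_label; do
#       [ "$(cat "$label")" = "junction" ] || continue
#       junction_status "$(cat "${label%_label}_input")" 100
#   done
```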

Cost Comparison: AMD vs NVIDIA

The primary reason to consider AMD is cost savings at the same VRAM tier.

24GB Tier

	RX 7900 XTX	RTX 3090	RTX 4090
Used price	$650-800	$700-900	$1,500-1,800
VRAM	24GB	24GB	24GB
8B inference	38 tok/s	50 tok/s	80 tok/s
Price per tok/s	$17-21	$14-18	$19-23
Power draw	355W	350W	450W

At the 24GB tier, the RTX 3090 actually offers better value per token/second than either the XTX or 4090. The AMD card is cheaper upfront but slower.
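
The price per tok/s row is the used-price range divided by the 8B generation speed. The hypothetical helper `price_per_tps` below recomputes it to one decimal place so you can substitute the prices you actually find.

```shell
#!/bin/sh
# Print the dollars-per-tok/s range for a low price, a high price, and a speed.
# price_per_tps is a hypothetical helper name used only for this illustration.
price_per_tps() {
    awk -v lo="$1" -v hi="$2" -v tps="$3" 'BEGIN {
        printf "%.1f-%.1f\n", lo / tps, hi / tps
    }'
}

price_per_tps 650 800 38    # RX 7900 XTX: prints 17.1-21.1
price_per_tps 700 900 50    # RTX 3090:    prints 14.0-18.0
```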

16GB Tier

	RX 7900 GRE	RX 7800 XT	RTX 4060 Ti 16GB
Used price	$400-500	$350-450	$350-450
VRAM	16GB	16GB	16GB
8B inference	~30 tok/s	~25 tok/s	~35 tok/s

At 16GB, AMD and NVIDIA are closer in both price and performance. The RTX 4060 Ti 16GB slightly edges out the AMD options.

When AMD Wins on Cost

You can find an RX 7900 XTX under $700. At that price, it undercuts most 3090 listings while matching VRAM.
You already own a supported AMD GPU. Using the card you have avoids buying new hardware entirely.
You want new hardware with warranty. New 7900 XTX cards are still available at retail, while 3090s are used-only.

When NVIDIA Wins on Cost

You value software compatibility. Every AI framework supports CUDA. ROCm support is growing but not universal.
You plan to use Docker. NVIDIA Container Toolkit is more mature than AMD’s Docker GPU passthrough.
You want multi-GPU. Tensor parallelism across AMD GPUs is less tested and documented than NVIDIA.

Building an AMD AI Server

Recommended Build: RX 7900 XTX Workstation

Component	Recommendation	Price
GPU	RX 7900 XTX (24GB)	$700
CPU	AMD Ryzen 5 5600	$130
Motherboard	B550 ATX	$100
RAM	64GB DDR4-3200	$120
PSU	850W Gold	$100
Storage	1TB NVMe SSD	$70
Case	Mid-tower with good airflow	$80
Total		$1,300

This build gives you 24GB VRAM for running models up to 34B quantized, with enough system RAM for comfortable operation.
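
Component prices move around, so it can be convenient to re-total the build with your own numbers. A minimal sketch, using the prices from the table above; `build_total` is a hypothetical helper name.

```shell
#!/bin/sh
# Sum a list of whole-dollar component prices.
# build_total is a hypothetical helper name used only for this illustration.
build_total() (
    total=0
    for price in "$@"; do
        total=$((total + price))
    done
    echo "$total"
)

# GPU, CPU, motherboard, RAM, PSU, storage, case:
build_total 700 130 100 120 100 70 80    # prints: 1300
```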

Installation Checklist

Install Debian 12 or Ubuntu 22.04/24.04
Update kernel and install headers
Install ROCm 6.x from AMD repository
Add user to render and video groups
Verify GPU detection with rocm-smi
Install Ollama (auto-detects ROCm)
Pull a model and test: ollama run llama3.1:8b
Verify GPU utilization with rocm-smi

Bottom Line

AMD GPUs are a legitimate option for local LLM inference in 2026. The RX 7900 XTX at 24GB VRAM can run the same models as an RTX 3090, at 70-85% of the speed, often for less money. ROCm 6.x has matured significantly, and Ollama’s built-in ROCm support makes setup straightforward on supported hardware.

The tradeoff is real: NVIDIA has better software support, more community resources, and slightly higher performance. If you are comfortable with Linux and willing to troubleshoot occasional ROCm quirks, AMD is a cost-effective path to local AI. If you want the smoothest experience with the widest compatibility, NVIDIA remains the default choice.

FAQ

Can I use any AMD GPU for local LLM inference?

No. ROCm officially supports only specific AMD GPUs based on the RDNA 3 and CDNA architectures. The RX 7900 XTX, RX 7900 XT, and RX 7900 GRE are the primary consumer options. Older RDNA 2 cards like the RX 6900 XT have limited community support through environment variable overrides but are not officially supported and may have stability issues.

Is AMD GPU performance comparable to NVIDIA for LLM inference?

AMD GPUs are roughly 15-30% slower than NVIDIA equivalents at the same VRAM tier for LLM inference. The RX 7900 XTX with 24GB VRAM performs at roughly 70-85% of an RTX 3090’s inference speed depending on the model. The gap is largely due to less mature software optimization in ROCm compared to CUDA rather than raw hardware limitations.

Does Ollama automatically detect AMD GPUs?

Yes, if ROCm is properly installed. Ollama checks for ROCm at startup and uses AMD GPUs when available. Run ollama run llama3.1 and check GPU utilization with rocm-smi to verify. If Ollama falls back to CPU, check that your GPU is in the supported list and ROCm drivers are correctly installed.

Can I mix AMD and NVIDIA GPUs in the same system for AI?

Not practically. ROCm and CUDA are separate software stacks, and most AI frameworks including Ollama use one or the other per process. You cannot split a model across an AMD and NVIDIA GPU. You could theoretically run separate Ollama instances on each GPU, but this adds complexity with minimal benefit.

Should I buy AMD or NVIDIA for a new local AI build?

For maximum compatibility and performance, NVIDIA is still the safer choice in 2026. CUDA has broader software support and better optimization for AI workloads. Buy AMD if the price difference matters to you (going by the used prices above, the RX 7900 XTX typically runs $50-100 less than an RTX 3090 for the same 24GB VRAM), you are comfortable with Linux troubleshooting, and you accept potentially lower inference speeds.
