Running Local LLMs on AMD GPUs with ROCm and Ollama
TL;DR
AMD GPUs are a viable alternative to NVIDIA for local LLM inference, particularly the RX 7900 XTX with 24GB VRAM. ROCm 6.x on Linux provides the software stack needed to run Ollama and llama.cpp with GPU acceleration. Performance is 15-30% lower than equivalent NVIDIA hardware, but AMD cards often cost significantly less.
Key facts:
- Best AMD card for local AI: RX 7900 XTX (24GB VRAM, ~$650-800 used)
- Software required: ROCm 6.x on Ubuntu 22.04/24.04 or Debian 12
- Performance: ~70-85% of RTX 3090 for LLM inference
- Supported models: Same as NVIDIA — any model that fits in VRAM
- Main tradeoff: Lower price but less mature software ecosystem
Cost comparison at the 24GB tier:
| GPU | VRAM | Used Price | Inference Speed (8B Q4) |
|---|---|---|---|
| RX 7900 XTX | 24GB | $650-800 | ~38 tok/s |
| RTX 3090 | 24GB | $700-900 | ~50 tok/s |
| RTX 4090 | 24GB | $1,500-1,800 | ~80 tok/s |
Supported AMD GPUs
ROCm 6.x officially supports a specific set of AMD GPUs. Not every AMD card works.
RDNA 3 (Consumer)
| GPU | VRAM | ROCm Support | AI Suitability |
|---|---|---|---|
| RX 7900 XTX | 24GB | Full | Best consumer AMD option |
| RX 7900 XT | 20GB | Full | Good, but 20GB is an awkward tier |
| RX 7900 GRE | 16GB | Full | Budget option for 7B-13B models |
| RX 7800 XT | 16GB | Partial | Works with HSA_OVERRIDE_GFX_VERSION=11.0.0 |
| RX 7700 XT | 12GB | Partial | Limited to smaller models |
Officially Supported (CDNA / Data Center)
| GPU | VRAM | ROCm Support | Notes |
|---|---|---|---|
| MI250X | 128GB HBM2e | Full | Data center, expensive |
| MI210 | 64GB HBM2e | Full | Data center |
| MI100 | 32GB HBM2 | Full | Older, available used |
Community / Unsupported (May Work)
| GPU | VRAM | Status |
|---|---|---|
| RX 6900 XT | 16GB | Works with HSA_OVERRIDE_GFX_VERSION=10.3.0 |
| RX 6800 XT | 16GB | Works with override, some instability |
| RX 6700 XT | 12GB | Partial, known issues |
Consumer cards older than RDNA 2 (RX 5000 series and earlier) are not supported by ROCm 6.x; do not count on them working.
Installing ROCm 6.x on Debian/Ubuntu
Prerequisites
Update your system and install kernel headers:
sudo apt update && sudo apt upgrade -y
sudo apt install -y linux-headers-$(uname -r) wget gnupg2
Add the AMD ROCm Repository
For Ubuntu 22.04:
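# Store the ROCm signing key in its own keyring (apt-key is deprecated)
sudo mkdir -p --mode=0755 /etc/apt/keyrings
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor | \
sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
# Add the repository, trusted only for that key
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.3/ jammy main" | \
sudo tee /etc/apt/sources.list.d/rocm.list
# Pin ROCm packages to the AMD repository
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | \
sudo tee /etc/apt/preferences.d/rocm-pin-600
For Ubuntu 24.04:
sudo mkdir -p --mode=0755 /etc/apt/keyrings
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor | \
sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.3/ noble main" | \
sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | \
sudo tee /etc/apt/preferences.d/rocm-pin-600
For Debian 12 (AMD's repository has no bookworm suite; its Debian instructions use the jammy packages):
sudo mkdir -p --mode=0755 /etc/apt/keyrings
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor | \
sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.3/ jammy main" | \
sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | \
sudo tee /etc/apt/preferences.d/rocm-pin-600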
Install ROCm
sudo apt update
sudo apt install -y rocm-hip-runtime rocm-hip-sdk rocm-opencl-runtime
This installs the HIP runtime, development SDK, and OpenCL support. The full rocm meta-package is larger and includes tools you may not need.
Add Your User to the Required Groups
sudo usermod -aG render,video $USER
Log out and back in for group changes to take effect.
Verify the Installation
# Check ROCm version
cat /opt/rocm/.info/version
# List detected GPUs
rocm-smi
# Verify HIP runtime
/opt/rocm/bin/hipconfig --full
rocm-smi should display your GPU with temperature, utilization, and memory information. If it shows no devices, your GPU may not be supported or the driver installation failed.
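To confirm that the installed release really is 6.x, you can check the major number in /opt/rocm/.info/version. This is a small sketch that assumes the file contains a string like `6.3.1-48` (major.minor.patch plus a build suffix); `rocm_is_6x` is a hypothetical helper name, not a ROCm tool.

```shell
# Hypothetical helper (not part of ROCm): succeed if a version string is 6.x or newer.
# Assumes a format like "6.3.1-48", as written to /opt/rocm/.info/version.
rocm_is_6x() {
    case "${1%%.*}" in                 # keep only the text before the first dot
        ''|*[!0-9]*) return 1 ;;       # empty or non-numeric major version
        *) [ "${1%%.*}" -ge 6 ] ;;     # numeric: compare against 6
    esac
}
```

Usage: `rocm_is_6x "$(cat /opt/rocm/.info/version)" && echo "ROCm 6.x or newer"`.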
Environment Variables
Add ROCm to your PATH:
echo 'export PATH=$PATH:/opt/rocm/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib' >> ~/.bashrc
source ~/.bashrc
For unsupported GPUs (RDNA 2), set the GFX version override:
# Only for RX 6900 XT, 6800 XT, etc. — NOT needed for RX 7900 series
export HSA_OVERRIDE_GFX_VERSION=10.3.0
To make the override permanent for Ollama via systemd, create an override file:
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
EOF
sudo systemctl daemon-reload
# If Ollama is already installed, restart it to apply the override
sudo systemctl restart ollama
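The override value depends on the GPU's LLVM target (its gfx ID), which `rocminfo` lists in the agent `Name:` fields. The mapping below is a sketch of common community practice, not an AMD-published table: RDNA 2 chips (gfx103x) use 10.3.0 as described above, Navi 31 cards (gfx1100, the RX 7900 series) need nothing, and the other RDNA 3 chips (gfx1101, gfx1102) are commonly pointed at 11.0.0. `gfx_override` is a hypothetical helper.

```shell
# Hypothetical helper: print a suggested HSA_OVERRIDE_GFX_VERSION for a gfx target,
# or nothing when no override is expected (or the target is not in this list).
# The mapping is an assumption based on community practice, not an official table.
gfx_override() {
    case "$1" in
        gfx103[0-6]) echo "10.3.0" ;;      # RDNA 2 (RX 6000 series)
        gfx1101|gfx1102) echo "11.0.0" ;;  # other RDNA 3 (RX 7800 XT/7700 XT are gfx1101)
        gfx1100) ;;                        # Navi 31 (RX 7900 series): supported natively
        *) ;;                              # unknown target: make no suggestion
    esac
}
```

Usage, assuming a single AMD GPU: `gfx_override "$(rocminfo | grep -o -m1 'gfx1[0-9a-f]*')"`. On systems with an integrated Radeon GPU as well, the first match may be the iGPU, so check the full `rocminfo` output.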
Installing Ollama with ROCm Support
Ollama ships with ROCm support built in. The standard installation script detects AMD GPUs automatically:
curl -fsSL https://ollama.com/install.sh | sh
After installation, load a model so you can confirm Ollama is using your AMD GPU (the model itself cannot report which GPU it runs on):
ollama run llama3.1:8b "Hello!"
While the model runs, check GPU utilization in another terminal:
rocm-smi
You should see non-zero values in the GPU% and VRAM% columns. If VRAM% stays at 0 while Ollama is running a model, it is falling back to CPU.
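`ollama ps` gives a more direct answer: its PROCESSOR column shows where each loaded model lives, for example `100% GPU`, `100% CPU`, or a split such as `48%/52% CPU/GPU`. The sketch below assumes that column format; `ollama_fully_on_gpu` is a hypothetical helper that reads `ollama ps` output on stdin.

```shell
# Hypothetical helper: succeed only if every model listed in `ollama ps` output
# (read from stdin) is reported as "100% GPU". Assumes the PROCESSOR column format
# described above; fails if no model is loaded at all.
ollama_fully_on_gpu() {
    awk '
        NR == 1 { next }                        # skip the header row
        NF == 0 { next }                        # ignore blank lines
        { seen = 1; if ($0 !~ /100% GPU/) bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}
```

Usage: `ollama ps | ollama_fully_on_gpu && echo "Model is fully on the GPU"`. A partial CPU/GPU split usually means the model plus context does not fit in VRAM.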
Docker with ROCm
To run Ollama in Docker with AMD GPU support:
docker run -d \
--name ollama \
--device /dev/kfd \
--device /dev/dri \
--group-add video \
--group-add render \
-v ollama-data:/root/.ollama \
-p 11434:11434 \
ollama/ollama:rocm
Note the :rocm tag. The default Ollama Docker image includes only CUDA support. The ROCm variant is a separate image.
Systemd Service Configuration
If you installed Ollama via the install script, it creates a systemd service automatically. To customize it for AMD:
sudo systemctl edit ollama
Add environment variables as needed:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/mnt/storage/ollama-models"
# Uncomment for RDNA 2 GPUs:
# Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
Restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Performance Benchmarks: AMD vs NVIDIA
All benchmarks were run with Ollama, Q4_K_M quantization, default context length, and single-user inference.
8B Models (Llama 3.1 8B)
| | RX 7900 XTX | RTX 3090 | RTX 4090 |
|---|---|---|---|
| Prompt eval | ~600 tok/s | ~800 tok/s | ~1,400 tok/s |
| Token generation | ~38 tok/s | ~50 tok/s | ~80 tok/s |
| Time to first token | ~200ms | ~150ms | ~90ms |
| VRAM used | ~5GB | ~5GB | ~5GB |
The RX 7900 XTX reaches about 76% of the RTX 3090’s generation speed. Perfectly usable for interactive chat.
13B Models (CodeLlama 13B)
| | RX 7900 XTX | RTX 3090 | RTX 4090 |
|---|---|---|---|
| Prompt eval | ~350 tok/s | ~500 tok/s | ~900 tok/s |
| Token generation | ~20 tok/s | ~25 tok/s | ~42 tok/s |
| Time to first token | ~350ms | ~250ms | ~140ms |
| VRAM used | ~8GB | ~8GB | ~8GB |
At 13B, the AMD card is about 80% of the 3090’s speed. The gap narrows slightly at larger model sizes because memory bandwidth becomes more dominant.
34B Models (CodeLlama 34B, Yi 34B)
| | RX 7900 XTX | RTX 3090 | RTX 4090 |
|---|---|---|---|
| Prompt eval | ~180 tok/s | ~250 tok/s | ~450 tok/s |
| Token generation | ~12 tok/s | ~15 tok/s | ~24 tok/s |
| Time to first token | ~550ms | ~400ms | ~250ms |
| VRAM used | ~20GB | ~20GB | ~20GB |
34B models fit in 24GB with Q4 quantization on both the XTX and 3090. At this size, the performance gap is roughly 20%.
Mixtral 8x7B (MoE)
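| | RX 7900 XTX | RTX 3090 |
|---|---|---|
| Prompt eval | ~250 tok/s | ~350 tok/s |
| Token generation | ~18 tok/s | ~22 tok/s |
| Memory used | ~26GB (exceeds 24GB VRAM) | ~26GB (exceeds 24GB VRAM) |
Mixtral does not quite fit in 24GB at Q4 quantization: at roughly 26GB, part of the model has to stay in system RAM on both cards. Performance is similar between the two in this memory-constrained setup, with the XTX reaching about 82% of the 3090’s generation speed.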
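The percentages quoted in this section are plain ratios of the token-generation speeds in the tables. A small awk helper makes the arithmetic explicit; `relative_pct` is a hypothetical name, and the inputs are the approximate tok/s figures above.

```shell
# Hypothetical helper: express speed A as a whole-number percentage of speed B.
relative_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", 100 * a / b }'
}

# RX 7900 XTX vs RTX 3090 token generation (tok/s)
relative_pct 38 50   # 8B: 76
relative_pct 20 25   # 13B: 80
relative_pct 12 15   # 34B: 80
relative_pct 18 22   # Mixtral 8x7B: 82
```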
Known Issues and Workarounds
Issue: Ollama Falls Back to CPU
Symptom: rocm-smi shows 0% GPU utilization while Ollama runs a model.
Fix: Ensure your user is in the render and video groups:
groups $USER
# Should include: render video
If not:
sudo usermod -aG render,video $USER
# Log out and back in
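The group check can also be scripted. This sketch tests a space-separated group list such as the output of `id -nG`; `has_gpu_groups` is a hypothetical helper. New memberships only appear in `id -nG` for your session after you log in again.

```shell
# Hypothetical helper: succeed only if the given space-separated group list
# contains both "render" and "video", the groups this guide adds for GPU access.
has_gpu_groups() {
    case " $1 " in *" render "*) ;; *) return 1 ;; esac
    case " $1 " in *" video "*) ;; *) return 1 ;; esac
}
```

Usage: `has_gpu_groups "$(id -nG)" || echo "missing render/video"`. If Ollama runs as a systemd service under a dedicated `ollama` account (an assumption about your setup), check that account too with `has_gpu_groups "$(id -nG ollama)"`.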
Issue: ROCm Not Detecting GPU
Symptom: rocm-smi shows no devices.
Fix: Check if the amdgpu kernel module is loaded:
lsmod | grep amdgpu
If not present, check dmesg for driver errors:
dmesg | grep -i amdgpu
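Common cause: secure boot is enabled and an unsigned amdgpu module cannot load. This mainly affects AMD's out-of-tree amdgpu-dkms driver; the amdgpu module shipped with the stock Ubuntu and Debian kernels is already signed. Either disable secure boot in BIOS or enroll a Machine Owner Key (MOK) so the DKMS-built module can be signed.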
Issue: Out of Memory on 24GB Card
Symptom: Model fails to load with memory allocation error.
Fix: ROCm reserves some VRAM for system use. Available VRAM is typically 22-23GB out of 24GB. If your model needs close to the full 24GB, try a smaller quantization:
# Use Q3_K_M instead of Q4_K_M to save roughly 4GB on a 34B model
ollama run codellama:34b-instruct-q3_K_M
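A rough size estimate shows why dropping one quantization level helps: weight memory is approximately parameters × bits per weight ÷ 8. The bits-per-weight figures below (about 4.85 for Q4_K_M and 3.9 for Q3_K_M) are approximations for llama.cpp k-quants, and the estimate ignores the KV cache and runtime buffers, so treat it as a lower bound. `weights_gb` is a hypothetical helper.

```shell
# Hypothetical helper: approximate weight memory in GB = params (billions) * bits per weight / 8.
# Ignores KV cache, context buffers and runtime overhead, so real usage is higher.
weights_gb() {
    awk -v p="$1" -v bpw="$2" 'BEGIN { printf "%.0f\n", p * bpw / 8 }'
}

# Assumed bits per weight: ~4.85 for Q4_K_M, ~3.9 for Q3_K_M
weights_gb 34 4.85   # 34B at Q4_K_M -> 21
weights_gb 34 3.9    # 34B at Q3_K_M -> 17
weights_gb 8 4.85    # 8B at Q4_K_M  -> 5
```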
Issue: Kernel Panic or System Freeze Under Load
Symptom: System becomes unresponsive during sustained GPU workload.
Fix: This is often a power delivery issue. AMD GPUs can briefly spike above their rated board power. Ensure your PSU has sufficient headroom (AMD recommends at least 800W for a system with a single RX 7900 XTX).
You can also cap power consumption:
# Limit GPU to 250W (the reference RX 7900 XTX is rated at 355W; requires root)
sudo rocm-smi --setpoweroverdrive 250
Issue: Slow Performance Compared to Expected
Symptom: Inference is significantly slower than benchmarks suggest.
Fix: Check that the GPU is not thermal throttling:
watch -n 1 rocm-smi
Junction temperature should stay below 100C. If it exceeds that, improve case airflow or increase fan speed:
# Set fan speed to 80% (without the % sign, the value is read as a 0-255 level)
sudo rocm-smi --setfan 80%
Cost Comparison: AMD vs NVIDIA
The primary reason to consider AMD is cost savings at the same VRAM tier.
24GB Tier
| | RX 7900 XTX | RTX 3090 | RTX 4090 |
|---|---|---|---|
| Used price | $650-800 | $700-900 | $1,500-1,800 |
| VRAM | 24GB | 24GB | 24GB |
| 8B inference | 38 tok/s | 50 tok/s | 80 tok/s |
| Price per tok/s | $17-21 | $14-18 | $19-23 |
| Power draw (rated) | 355W | 350W | 450W |
At the 24GB tier, the RTX 3090 actually offers better value per token/second than either the XTX or 4090. The AMD card is cheaper upfront but slower.
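The "Price per tok/s" figures divide each end of the used-price range by the 8B generation speed. A minimal awk sketch of that calculation, fed with the prices and speeds from the 24GB table (`price_per_tps` is a hypothetical helper):

```shell
# Hypothetical helper: dollars per token/second for a low..high used-price range.
price_per_tps() {
    awk -v lo="$1" -v hi="$2" -v tps="$3" \
        'BEGIN { printf "$%.2f-%.2f\n", lo / tps, hi / tps }'
}

price_per_tps 650 800 38     # RX 7900 XTX -> $17.11-21.05
price_per_tps 700 900 50     # RTX 3090    -> $14.00-18.00
price_per_tps 1500 1800 80   # RTX 4090    -> $18.75-22.50
```

Rounded to whole dollars, the 3090 has the lowest range, which is the basis for the value comparison above.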
16GB Tier
| | RX 7900 GRE | RX 7800 XT | RTX 4060 Ti 16GB |
|---|---|---|---|
| Used price | $400-500 | $350-450 | $350-450 |
| VRAM | 16GB | 16GB | 16GB |
| 8B inference | ~30 tok/s | ~25 tok/s | ~35 tok/s |
At 16GB, AMD and NVIDIA are closer in both price and performance. The RTX 4060 Ti 16GB slightly edges out the AMD options.
When AMD Wins on Cost
- You can find an RX 7900 XTX under $700. At that price, it undercuts most 3090 listings while matching VRAM.
- You already own an AMD GPU. Upgrading within the AMD ecosystem avoids buying new hardware entirely.
- You want new hardware with warranty. New 7900 XTX cards are still available at retail, while 3090s are used-only.
When NVIDIA Wins on Cost
- You value software compatibility. Every AI framework supports CUDA. ROCm support is growing but not universal.
- You plan to use Docker. NVIDIA Container Toolkit is more mature than AMD’s Docker GPU passthrough.
- You want multi-GPU. Tensor parallelism across AMD GPUs is less tested and documented than on NVIDIA.
Building an AMD AI Server
Recommended Build: RX 7900 XTX Workstation
| Component | Recommendation | Price |
|---|---|---|
| GPU | RX 7900 XTX (24GB) | $700 |
| CPU | AMD Ryzen 5 5600 | $130 |
| Motherboard | B550 ATX | $100 |
| RAM | 64GB DDR4-3200 | $120 |
| PSU | 850W Gold | $100 |
| Storage | 1TB NVMe SSD | $70 |
| Case | Mid-tower with good airflow | $80 |
| Total | | $1,300 |
This build gives you 24GB VRAM for running models up to 34B quantized, with enough system RAM for comfortable operation.
Installation Checklist
- Install Debian 12 or Ubuntu 22.04/24.04
- Update kernel and install headers
- Install ROCm 6.x from AMD repository
- Add user to render and video groups
- Verify GPU detection with rocm-smi
- Install Ollama (auto-detects ROCm)
- Pull a model and test: ollama run llama3.1:8b
- Verify GPU utilization with rocm-smi
Bottom Line
AMD GPUs are a legitimate option for local LLM inference in 2026. The RX 7900 XTX at 24GB VRAM can run the same models as an RTX 3090, at 70-85% of the speed, often for less money. ROCm 6.x has matured significantly, and Ollama’s built-in ROCm support makes setup straightforward on supported hardware.
The tradeoff is real: NVIDIA has better software support, more community resources, and slightly higher performance. If you are comfortable with Linux and willing to troubleshoot occasional ROCm quirks, AMD is a cost-effective path to local AI. If you want the smoothest experience with the widest compatibility, NVIDIA remains the default choice.
FAQ
Can I use any AMD GPU for local LLM inference?
No. ROCm officially supports only specific AMD GPUs based on the RDNA 3 and CDNA architectures. The RX 7900 XTX, RX 7900 XT, and RX 7900 GRE are the primary consumer options. Older RDNA 2 cards like the RX 6900 XT have limited community support through environment variable overrides but are not officially supported and may have stability issues.
Is AMD GPU performance comparable to NVIDIA for LLM inference?
AMD GPUs are roughly 15-30% slower than NVIDIA equivalents at the same VRAM tier for LLM inference. The RX 7900 XTX with 24GB VRAM performs at roughly 70-85% of an RTX 3090’s inference speed depending on the model. The gap comes mainly from less mature software optimization in ROCm compared to CUDA rather than from the hardware: the RX 7900 XTX actually has slightly more memory bandwidth than the RTX 3090.
Does Ollama automatically detect AMD GPUs?
Yes, if ROCm is properly installed. Ollama checks for ROCm at startup and uses AMD GPUs when available. Run ollama run llama3.1 and check GPU utilization with rocm-smi to verify. If Ollama falls back to CPU, check that your GPU is in the supported list and ROCm drivers are correctly installed.
Can I mix AMD and NVIDIA GPUs in the same system for AI?
Not practically. ROCm and CUDA are separate software stacks, and most AI frameworks including Ollama use one or the other per process. Ollama will not split a single model across an AMD and an NVIDIA GPU. You could theoretically run separate Ollama instances on each GPU, but this adds complexity with minimal benefit.
Should I buy AMD or NVIDIA for a new local AI build?
For maximum compatibility and performance, NVIDIA is still the safer choice in 2026. CUDA has broader software support and better optimization for AI workloads. Buy AMD if the price difference is significant (used RX 7900 XTX cards run about $650-800 versus $700-900 for an RTX 3090, both with 24GB VRAM, so look for XTX listings toward the low end), you are comfortable with Linux troubleshooting, and you accept potentially lower inference speeds.
