Ollama won’t run inference on your AMD GPU unless the ROCm stack underneath it is set up correctly, even if you have a perfectly compatible card.
Let’s get your AMD GPU humming with Ollama. This is about making sure the ROCm drivers and libraries are correctly recognized by Ollama, allowing it to offload model computations to your GPU.
Here’s how to get it working, from the most common issues to the less obvious ones:
1. ROCm Installation & Verification
The most frequent culprit is an incomplete or incorrect ROCm installation. Ollama relies on ROCm being properly set up on your system before it can even see your AMD GPU.
Diagnosis: First, confirm ROCm is installed. Open a terminal and run:
rocminfo
This command should output detailed information about your AMD GPUs, ROCm version, and driver status. If it fails, or shows no devices, ROCm isn’t installed correctly or your GPU isn’t supported by the ROCm version you have.
Fix:
Follow the official AMD ROCm installation guide for your specific Linux distribution and GPU. Ensure you install the rocm-hip-sdk and rocm-opencl-sdk packages. For example, on Ubuntu:
sudo apt update
sudo apt install rocm-hip-sdk rocm-opencl-sdk
After installation, reboot your system. Then run rocminfo again. If it shows your GPU, proceed.
Why it works: Ollama looks for the ROCm libraries and executables that rocminfo verifies. A successful rocminfo output means the foundational ROCm environment is present.
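As a quick check of the step above, the following sketch pulls the gfx architecture names out of rocminfo output. The helper name list_gfx_targets is made up for this example, and it assumes rocminfo prints agent names such as gfx1030 somewhere in its output.

```shell
# Hypothetical helper: extract unique gfx architecture names from rocminfo output on stdin.
list_gfx_targets() {
  grep -oE 'gfx[0-9a-f]+' | sort -u
}

if command -v rocminfo >/dev/null 2>&1; then
  # No gfx name in the output usually means no supported GPU was detected.
  rocminfo | list_gfx_targets || echo "rocminfo ran but reported no gfx devices"
else
  echo "rocminfo not found: ROCm is probably not installed or not on PATH"
fi
```

The gfx name it prints is the one to look up in AMD's support tables for your ROCm version.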
2. Ollama’s ROCm Build
Ollama needs to be compiled with ROCm support enabled. If you installed Ollama via a pre-built binary, it might not have this enabled by default.
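Diagnosis: ollama --version only prints the version number, so it won’t tell you whether ROCm support is present. Check the server log for GPU detection messages instead. On a systemd-based install, run:
journalctl -u ollama --no-pager | grep -i -E 'rocm|amdgpu'
If nothing in the log mentions ROCm or an AMD GPU, your Ollama install either lacks ROCm support or could not detect the card.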
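Fix: One reliable option is to build Ollama from source with ROCm enabled. The build system has changed between Ollama releases, so check the repository’s development documentation if the commands below fail.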
- Clone the Ollama repository:
git clone https://github.com/ollama/ollama.git
cd ollama
- Build it:
make LLAMA_HIPBLAS=1
This flag explicitly tells the build process to include ROCm (HIPBLAS) support.
- Install the built binary:
sudo make install
This usually places the binary in /usr/local/bin/ollama. You might need to update your PATH or remove any existing system-installed Ollama.
Why it works: The LLAMA_HIPBLAS=1 flag links Ollama’s inference engine against the ROCm HIP libraries, enabling GPU acceleration.
3. User Permissions for ROCm
ROCm requires specific user group memberships for access to GPU devices. If your user isn’t in the correct groups, Ollama won’t be able to access the GPU even if ROCm is installed.
Diagnosis: Check your user’s group memberships:
groups $USER
You should see render and video (or sometimes kfd depending on ROCm version and distribution) in the output.
Fix:
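Add your user to the necessary groups. The commands below use $USER, which expands to the currently logged-in user; substitute another username if you are configuring a different account.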
sudo usermod -aG render $USER
sudo usermod -aG video $USER
# If 'kfd' is required by your ROCm version:
# sudo usermod -aG kfd $USER
Crucially, log out and log back in for these group changes to take effect.
Why it works: These groups grant the user process (Ollama) the necessary permissions to interact with the GPU hardware through the ROCm drivers.
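To make the group check less error-prone than reading the groups output by eye, here is a small sketch that reports which of render and video are missing. The function name missing_groups is invented for this example, and it only covers the two groups named above, not kfd.

```shell
# Hypothetical helper: print each required group that is absent from a space-separated list.
missing_groups() {
  have=" $1 "
  for g in render video; do
    case "$have" in
      *" $g "*) ;;                 # group present, nothing to report
      *) printf '%s\n' "$g" ;;     # group missing
    esac
  done
}

# id -nG lists the group names of the current login session.
missing_groups "$(id -nG)"
```

No output means both groups are active in the current session. If a group still shows up as missing right after running usermod, you most likely haven't logged out and back in yet.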
4. Ollama Environment Variables
Ollama can be influenced by environment variables, especially for GPU acceleration. Sometimes, these can be misconfigured or unset, preventing ROCm detection.
Diagnosis: Check relevant environment variables:
echo $OLLAMA_HIPBLAS
echo $HIP_VISIBLE_DEVICES
If OLLAMA_HIPBLAS is not 1, or HIP_VISIBLE_DEVICES is set to an invalid value (for example, an index that doesn’t exist, or -1), the GPU can end up hidden from Ollama. Leaving HIP_VISIBLE_DEVICES unset is fine: all devices stay visible.
Fix:
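Ensure OLLAMA_HIPBLAS is set to 1 for the process that actually runs the models, which is the Ollama server (ollama serve), not the ollama run client. If Ollama runs as a systemd service, variables exported in your shell never reach it; set them on the service instead (for example with systemctl edit ollama.service) and restart it. If you start the server yourself, you can set the variable globally in your shell profile (~/.bashrc, ~/.zshrc) or export it before launching Ollama: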
export OLLAMA_HIPBLAS=1
ollama run llama3
If you need to select a specific GPU, you can use HIP_VISIBLE_DEVICES. For example, to use the first detected GPU:
export HIP_VISIBLE_DEVICES=0
export OLLAMA_HIPBLAS=1
ollama run llama3
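Why it works: OLLAMA_HIPBLAS=1 is intended to tell the Ollama binary to attempt ROCm acceleration; it is not listed in Ollama’s GPU documentation, so confirm that your build honors it. HIP_VISIBLE_DEVICES filters which ROCm-capable devices are made available to HIP applications.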
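A variable that echo prints is not necessarily exported, and only exported variables are inherited by a server you launch from that shell. The sketch below, using a made-up helper called report_gpu_env, checks the two variable names discussed in this section via printenv, which only sees exported variables.

```shell
# Hypothetical helper: for each name, print its exported value or say that it is not exported.
report_gpu_env() {
  for name in "$@"; do
    if value=$(printenv "$name"); then
      printf '%s=%s\n' "$name" "$value"
    else
      printf '%s is not exported\n' "$name"
    fi
  done
}

report_gpu_env HIP_VISIBLE_DEVICES OLLAMA_HIPBLAS
```

This only describes the current shell. A server started by systemd has its own environment, so it is worth checking the service configuration separately.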
5. ROCm Driver Version Compatibility
The ROCm driver version installed on your system must be compatible with the version of Ollama you are using and the specific GPU hardware. Older or newer drivers can sometimes cause subtle issues.
Diagnosis: Check your installed ROCm version:
dpkg -l | grep rocm
# or
rpm -qa | grep rocm
Compare this to the ROCm version matrix on AMD’s website to ensure it’s supported for your GPU.
Fix: If your ROCm version is not supported, you may need to:
- Downgrade/Upgrade ROCm: Use your distribution’s package manager or AMD’s official repositories to install a compatible ROCm version. This can be complex and may require removing existing ROCm packages first.
- Build Ollama against a specific ROCm version: If you’re building Ollama from source, ensure your build environment has the correct ROCm SDK linked.
Why it works: ROCm is a complex software stack. Compatibility between the driver, the HIP runtime, and the application (Ollama) is critical for correct operation.
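Once you have a minimum version from AMD's compatibility matrix, the comparison can be scripted. This is a rough sketch under two assumptions: that your install keeps its version string in /opt/rocm/.info/version (a common location, but not guaranteed), and that the value 6.0 used for required is only a placeholder for the number you actually look up. The helper name version_at_least is invented here, and it relies on GNU sort -V.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2 (needs GNU sort -V).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

rocm_version_file=/opt/rocm/.info/version   # assumed default install location
required=6.0                                # placeholder; take the real value from AMD's matrix

if [ -r "$rocm_version_file" ]; then
  # Strip any build suffix after the first dash, e.g. 6.0.2-115 becomes 6.0.2.
  installed=$(cut -d- -f1 "$rocm_version_file")
  if version_at_least "$installed" "$required"; then
    echo "ROCm $installed meets the required $required"
  else
    echo "ROCm $installed is older than the required $required"
  fi
else
  echo "No version file at $rocm_version_file; fall back to dpkg -l or rpm -qa"
fi
```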
6. Ollama Model Configuration
While less common, some models might have specific configurations or be downloaded in a format that doesn’t optimally leverage GPU acceleration out-of-the-box.
Diagnosis:
Try running a known small, fast model like llama3:8b and observe Ollama’s output or system resource usage. If the GPU isn’t being utilized, it’s likely a system-level configuration issue.
Fix: Ensure you are pulling and running models correctly. For example:
ollama pull llama3:8b
ollama run llama3:8b
If you suspect a model-specific issue, you can try re-pulling the model or checking the Ollama community forums for known issues with specific models and ROCm.
Why it works: Ollama’s runtime dynamically loads model weights and structures. Correct model loading ensures the inference engine can map operations to the available HIPBLAS routines on the GPU.
After working through these steps, load a model with ollama run <model_name> and confirm the GPU is in use. Note that nvidia-smi only works with NVIDIA cards, so it won’t help here. Use watch -n 1 rocm-smi instead, or radeontop, or nvtop if your build supports AMD, and look for rising utilization and VRAM usage. While the model is loaded, ollama ps also shows whether it is running on the GPU or the CPU.
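The final check can also be scripted. The sketch below assumes that ollama ps prints a PROCESSOR column containing the word GPU for models that are at least partly offloaded, and plain CPU otherwise; the helper name model_on_gpu is made up for this example.

```shell
# Hypothetical helper: reads `ollama ps` output on stdin and succeeds if any line mentions GPU.
model_on_gpu() {
  grep -q 'GPU'
}

if command -v ollama >/dev/null 2>&1; then
  if ollama ps | model_on_gpu; then
    echo "at least one loaded model is using the GPU"
  else
    echo "no loaded model reports GPU use"
  fi
else
  echo "ollama not found on PATH"
fi
```

Run it while a model is still loaded, since ollama ps lists only models currently held in memory.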
The next thing you’ll likely run into is understanding how to tune Ollama for different model sizes and hardware capabilities, which involves exploring parameters like num_gpu and num_ctx.