Ollama can quietly fall back to the CPU for inference, even with a perfectly compatible AMD card, if any part of the ROCm stack is missing or misconfigured.

Let’s get your AMD GPU humming with Ollama. This is about making sure the ROCm drivers and libraries are correctly recognized by Ollama, allowing it to offload model computations to your GPU.

Here’s how to get it working, from the most common issues to the less obvious ones:

1. ROCm Installation & Verification

The most frequent culprit is an incomplete or incorrect ROCm installation. Ollama relies on ROCm being properly set up on your system before it can even see your AMD GPU.

Diagnosis: First, confirm ROCm is installed. Open a terminal and run:

rocminfo

This command should output detailed information about your AMD GPUs, ROCm version, and driver status. If it fails, or shows no devices, ROCm isn’t installed correctly or your GPU isn’t supported by the ROCm version you have.
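To see which GPU architecture ROCm reports, you can pull the gfx target names out of the rocminfo output. This is a minimal sketch: list_gfx_targets is a hypothetical helper name, and it assumes agents are reported with names of the form gfx1030.

```shell
# Hypothetical helper: read rocminfo-style text on stdin and print
# each distinct gfx target name once.
list_gfx_targets() {
  grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Only call rocminfo if it is actually installed.
if command -v rocminfo >/dev/null 2>&1; then
  rocminfo | list_gfx_targets
else
  echo "rocminfo not found: ROCm is probably not installed" >&2
fi
```

An empty result with rocminfo present suggests ROCm sees no supported GPU agent.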

Fix: Follow the official AMD ROCm installation guide for your specific Linux distribution and GPU. Ensure you install the rocm-hip-sdk and rocm-opencl-sdk packages. For example, on Ubuntu, once AMD's package repository has been added as described in that guide:

sudo apt update
sudo apt install rocm-hip-sdk rocm-opencl-sdk

After installation, reboot your system. Then run rocminfo again. If it shows your GPU, proceed.

Why it works: Ollama looks for the ROCm libraries and executables that rocminfo verifies. A successful rocminfo output means the foundational ROCm environment is present.

2. Ollama’s ROCm Build

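Ollama needs to be built with ROCm support. The official Linux install script normally detects an AMD GPU and fetches the ROCm build, but a binary installed some other way (a distribution package, a manual download, or a custom build) may be CPU-only.

Diagnosis: ollama --version prints only the version number, so it cannot tell you whether ROCm support is present. Instead, load a model and, in a second terminal, run:

ollama ps

The PROCESSOR column shows whether the loaded model is running on the GPU or the CPU. If it reports CPU only, check the server log to see which GPUs, if any, Ollama detected at startup.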
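The server log is another place to look. This is a minimal sketch, assuming Ollama runs as a systemd unit named ollama (the name used by the official Linux install script). gpu_log_lines is a hypothetical helper, and the keywords it matches are a guess at what is useful, not a documented log format.

```shell
# Hypothetical helper: keep only log lines that mention ROCm, amdgpu,
# or a gfx target. The keywords are an assumption, not a documented format.
gpu_log_lines() {
  grep -iE 'rocm|amdgpu|gfx[0-9a-f]+' || true
}

# Assumes a systemd install with a unit named "ollama".
if command -v journalctl >/dev/null 2>&1; then
  journalctl -u ollama --no-pager -n 200 2>/dev/null | gpu_log_lines
fi
```

No matching lines at all is a hint that the server never found a ROCm-capable device.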

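Fix: If your binary turns out to be CPU-only, reinstall with the official install script, or build Ollama from source on a machine that already has ROCm installed.

  1. Clone the Ollama repository:
    git clone https://github.com/ollama/ollama.git
    cd ollama
    
  2. Build it:
    cmake -B build
    cmake --build build
    
    These are the steps for recent releases, where CMake builds the GPU backends and picks up ROCm if it finds it. The build system has changed between releases, so check docs/development.md in the repository for the version you cloned. make LLAMA_HIPBLAS=1 is an option from older llama.cpp Makefiles, not an Ollama build flag.
  3. Run the server from the source tree (this needs Go installed):
    go run . serve
    
    Stop any existing system-installed Ollama first so the two servers do not compete for the same port.

Why it works: The build compiles Ollama’s GPU backend against the ROCm HIP libraries, which is what enables GPU acceleration.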

3. User Permissions for ROCm

ROCm requires specific user group memberships for access to GPU devices. If your user isn’t in the correct groups, Ollama won’t be able to access the GPU even if ROCm is installed.

Diagnosis: Check your user’s group memberships:

groups $USER

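You should see render and video in the output. These groups normally own the /dev/kfd and /dev/dri device nodes that ROCm uses; ls -l /dev/kfd shows which group applies on your system. If Ollama runs as a service under its own user (the official install script creates one named ollama), that user needs the same memberships.

Fix: Add your user to the necessary groups. $USER expands to the current user's name, so the commands can be run as written.

sudo usermod -aG render "$USER"
sudo usermod -aG video "$USER"
# If /dev/kfd is owned by a different group on your system, add that group as well.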

Crucially, log out and log back in for these group changes to take effect.

Why it works: These groups grant the user process (Ollama) the necessary permissions to interact with the GPU hardware through the ROCm drivers.
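The group check can also be scripted. The following is a minimal sketch; missing_gpu_groups is a hypothetical helper name, and it checks only for render and video. It uses id -nG, which reports the groups of the current login session, so the result changes only after you log back in.

```shell
# Hypothetical helper: given a space-separated list of group names,
# print whichever of "render" and "video" are missing from it.
missing_gpu_groups() {
  have=" $1 "
  missing=""
  for g in render video; do
    case "$have" in
      *" $g "*) ;;                   # already a member
      *) missing="$missing $g" ;;    # not in the list
    esac
  done
  echo "${missing# }"
}

# id -nG prints the current session's group names on one line.
missing="$(missing_gpu_groups "$(id -nG)")"
if [ -n "$missing" ]; then
  echo "Not in group(s): $missing"
else
  echo "render and video membership looks fine"
fi
```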

4. Ollama Environment Variables

Ollama’s GPU detection can be influenced by environment variables. A stale or invalid value can hide the GPU from it.

Diagnosis: Check relevant environment variables:

echo $HIP_VISIBLE_DEVICES
echo $ROCR_VISIBLE_DEVICES

If either is set to an invalid value (for example, an index that does not exist), no GPU will be visible. Leaving both unset exposes every detected GPU. Note that OLLAMA_HIPBLAS is not a variable Ollama documents: ROCm support is decided when the binary is built, not switched on at run time.

Fix: These variables are read by the Ollama server, not by the ollama run client, so set them in the environment of the server process. You can set them in your shell profile (~/.bashrc, ~/.zshrc) or just before starting the server. For example, to use only the first detected GPU:

export HIP_VISIBLE_DEVICES=0
ollama serve

Then, in another terminal:

ollama run llama3

If rocminfo lists your GPU but Ollama does not support its gfx target, Ollama’s GPU documentation describes the HSA_OVERRIDE_GFX_VERSION variable for presenting the card as a nearby supported target.

Why it works: HIP_VISIBLE_DEVICES filters which ROCm-capable devices are made available to HIP applications, and the server is the process that actually opens the GPU.
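When Ollama runs as a systemd service, variables exported in an interactive shell never reach the server process. One way to handle that is a drop-in file for the unit. The sketch below only prints the drop-in text. ollama_env_dropin is a hypothetical helper, and the unit name ollama.service is an assumption based on the official Linux install script.

```shell
# Hypothetical helper: print a systemd drop-in that sets each
# NAME=value argument in the service's environment.
ollama_env_dropin() {
  printf '[Service]\n'
  for kv in "$@"; do
    printf 'Environment="%s"\n' "$kv"
  done
}

# Example: expose only the first GPU to the service.
ollama_env_dropin "HIP_VISIBLE_DEVICES=0"
```

Paste the output into the editor opened by sudo systemctl edit ollama.service, then restart the service with sudo systemctl restart ollama.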

5. ROCm Driver Version Compatibility

The ROCm driver version installed on your system must be compatible with the version of Ollama you are using and the specific GPU hardware. Older or newer drivers can sometimes cause subtle issues.

Diagnosis: Check your installed ROCm version:

dpkg -l | grep rocm
# or
rpm -qa | grep rocm

Compare this to the ROCm version matrix on AMD’s website to ensure it’s supported for your GPU.
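Compatibility tables are usually keyed by major.minor release, so it helps to reduce the installed version to that form. This is a minimal sketch: rocm_major_minor is a hypothetical helper, and the version file path is an assumption about where your ROCm install records its version.

```shell
# Hypothetical helper: reduce a version string such as "6.1.2-95"
# to its major.minor part ("6.1").
rocm_major_minor() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Assumption: this ROCm install records its version in this file.
version_file=/opt/rocm/.info/version
if [ -r "$version_file" ]; then
  echo "ROCm $(rocm_major_minor "$(cat "$version_file")")"
else
  echo "No version file at $version_file" >&2
fi
```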

Fix: If your ROCm version is not supported, you may need to:

  • Downgrade/Upgrade ROCm: Use your distribution’s package manager or AMD’s official repositories to install a compatible ROCm version. This can be complex and may require removing existing ROCm packages first.
  • Build Ollama against a specific ROCm version: If you’re building Ollama from source, ensure your build environment has the correct ROCm SDK linked.

Why it works: ROCm is a complex software stack. Compatibility between the driver, the HIP runtime, and the application (Ollama) is critical for correct operation.

6. Ollama Model Configuration

While less common, some models might have specific configurations or be downloaded in a format that doesn’t optimally leverage GPU acceleration out-of-the-box.

Diagnosis: Try running a known small, fast model like llama3:8b and observe Ollama’s output or system resource usage. If the GPU isn’t being utilized, it’s likely a system-level configuration issue.

Fix: Ensure you are pulling and running models correctly. For example:

ollama pull llama3:8b
ollama run llama3:8b

If you suspect a model-specific issue, you can try re-pulling the model or checking the Ollama community forums for known issues with specific models and ROCm.

Why it works: Ollama’s runtime dynamically loads model weights and structures. Correct model loading ensures the inference engine can map operations to the available HIPBLAS routines on the GPU.
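One way to check where a loaded model is running is the PROCESSOR column of ollama ps. This is a minimal sketch: model_on_gpu is a hypothetical helper, and it assumes that column contains text such as "100% GPU" for a model that has been offloaded.

```shell
# Hypothetical helper: read `ollama ps` output on stdin and succeed
# if any model row (everything after the header) mentions GPU.
model_on_gpu() {
  tail -n +2 | grep -q 'GPU'
}

# Only query Ollama if the CLI is installed.
if command -v ollama >/dev/null 2>&1; then
  if ollama ps 2>/dev/null | model_on_gpu; then
    echo "At least one loaded model is using the GPU"
  else
    echo "No loaded model reports GPU use"
  fi
fi
```

Run it while a model is still loaded, for example in a second terminal during ollama run llama3:8b.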

After working through these steps, ollama run <model_name> should offload the model to your GPU. To confirm, watch GPU activity with watch -n 1 rocm-smi (nvidia-smi only reports NVIDIA cards), or use radeontop or nvtop, both of which support AMD GPUs.

The next thing you’ll likely run into is understanding how to tune Ollama for different model sizes and hardware capabilities, which involves exploring parameters like num_gpu and num_ctx.
