rocmate

gfx1100 — RX 7900 XT/XTX


Chip: gfx1100  ·  8 tools with data

Axolotl ✅ tested ROCm 6.2

RX 7900 XTX (24 GB) handles QLoRA fine-tuning of 7B–13B models comfortably. FlashAttention-2 works via ROCm's Composable Kernel (CK) backend (install it separately). The bitsandbytes ROCm fork is required for quantized training.
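The 24 GB headroom claim can be sanity-checked with back-of-envelope math. A rough sketch that counts only the frozen, quantized base weights and ignores LoRA adapters, optimizer state, and activations; the helper name is made up:

```python
def qlora_weight_gb(params_b: float, bits: float = 4.0) -> float:
    """Rough size of the frozen, quantized base weights in GB
    (params_b = parameter count in billions)."""
    return params_b * bits / 8

# 4-bit base weights: 7B -> 3.5 GB, 13B -> 6.5 GB; the rest of the
# 24 GB budget goes to LoRA adapters, optimizer state, and activations.
print(qlora_weight_gb(7))   # 3.5
print(qlora_weight_gb(13))  # 6.5
```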

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0
  • export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

Install hints

  • git clone https://github.com/axolotl-ai-cloud/axolotl && cd axolotl
  • pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
  • pip install packaging ninja && pip install flash-attn --no-build-isolation
  • pip install -e '.[deepspeed]'
  • accelerate launch -m axolotl.cli.train examples/llama-3/qlora.yml

ComfyUI ✅ tested ROCm 6.2

Works well on RX 7900 XTX with PyTorch ROCm 6.2+. SDXL runs comfortably in 24 GB VRAM. Flux.1 also works but requires careful memory management.

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0
  • export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

Install hints

  • Linux: git clone https://github.com/comfyanonymous/ComfyUI && cd ComfyUI
  • python -m venv venv && source venv/bin/activate
  • pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2
  • pip install -r requirements.txt && python main.py --listen
  • Windows (HIP SDK): pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2
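To confirm the server actually came up, you can queue a job over ComfyUI's HTTP API. A minimal sketch, assuming the default port 8188 and a workflow already exported in API format; the function names here are illustrative:

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default listen port

def build_prompt_payload(workflow: dict, client_id: str = "rocmate-test") -> bytes:
    """Wrap an API-format workflow graph for POST /prompt."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode()

def queue_prompt(workflow: dict) -> dict:
    """Submit the workflow to the local ComfyUI queue."""
    req = urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```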

ExLlamaV2 ✅ tested ROCm 6.2

RX 7900 XTX — excellent performance. ExLlamaV2 is one of the fastest GPTQ/EXL2 backends on AMD. Mistral 7B at EXL2 4.0 bpw runs at ~80 tok/s, and 24 GB fits a 34B model at ~4 bpw.
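The 34B-in-24-GB claim follows from EXL2's bits-per-weight arithmetic. A rough sketch that counts weights only, not the KV cache; the helper name is illustrative:

```python
def exl2_weight_gb(params_b: float, bpw: float) -> float:
    """EXL2 quantized weight footprint in GB
    (params_b = parameter count in billions, bpw = bits per weight)."""
    return params_b * bpw / 8

# 34B at 4.0 bpw -> 17.0 GB of weights, leaving ~7 GB of a 24 GB
# card for the KV cache and overhead.
print(exl2_weight_gb(34, 4.0))  # 17.0
print(exl2_weight_gb(7, 4.0))   # 3.5
```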

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0

Install hints

  • pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
  • pip install exllamav2
  • python test_inference.py -m /path/to/model -p "Hello world"
  • Or build from source for latest features: git clone https://github.com/turboderp/exllamav2 && cd exllamav2 && pip install -e .

faster-whisper ✅ tested ROCm 6.2

faster-whisper itself targets CUDA (via CTranslate2); on AMD, use the openai-whisper or whisperX route with PyTorch + ROCm, or run faster-whisper on CPU with int8 quantization (still fast for short clips). For GPU transcription on AMD, use the openai-whisper-rocm fork.

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0

Install hints

  • pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
  • Verify torch.cuda.is_available() returns True (PyTorch's ROCm build still uses the 'cuda' device name)
  • For pure faster-whisper: use device='cpu', compute_type='int8' as fallback
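The CPU-fallback decision above can be sketched as a small helper. Illustrative names only; faster-whisper's actual API takes device and compute_type as WhisperModel arguments:

```python
def whisper_config(rocm_torch_ok: bool) -> dict:
    """Pick a transcription setup for an AMD box.

    faster-whisper (CTranslate2) has no ROCm backend, so even with a
    working ROCm PyTorch the safe faster-whisper path is CPU int8;
    GPU transcription goes through openai-whisper/whisperX instead.
    """
    if rocm_torch_ok:
        return {"backend": "whisperx", "device": "cuda"}  # 'cuda' = ROCm here
    return {"backend": "faster-whisper", "device": "cpu",
            "compute_type": "int8"}

print(whisper_config(False))
```

For example, `WhisperModel("small", device="cpu", compute_type="int8")` applies the fallback branch.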

llama.cpp ✅ tested ROCm 6.2

Compile with GGML_HIP=ON. Runs well on RX 7900 XTX; Q4_K_M models up to ~32B fit fully in 24 GB, while 70B needs partial CPU offload via --n-gpu-layers. Pre-built HIP binaries are available in GitHub releases. A Vulkan build also works if you prefer to avoid ROCm: cmake -B build -DGGML_VULKAN=ON.
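The quant-size trade-offs can be sanity-checked with approximate GGUF bits-per-weight figures. A rough sketch: the bpw values are approximations, and real usage also needs room for the KV cache and compute buffers (modeled here as a flat overhead):

```python
# Approximate bits per weight for common GGUF quant types.
GGUF_BPW = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q6_K": 6.59, "Q8_0": 8.5}

def fits_in_vram(params_b: float, quant: str,
                 vram_gb: float = 24.0, overhead_gb: float = 2.0) -> bool:
    """True if quantized weights plus a small KV/overhead budget fit."""
    weights_gb = params_b * GGUF_BPW[quant] / 8
    return weights_gb + overhead_gb <= vram_gb

for p in (8, 14, 32, 70):
    print(p, fits_in_vram(p, "Q4_K_M"))
```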

Install hints

  • Linux: git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
  • HIP (ROCm): cmake -B build -DGGML_HIP=ON && cmake --build build --config Release -j$(nproc)
  • Vulkan (no ROCm needed): cmake -B build -DGGML_VULKAN=ON && cmake --build build --config Release -j$(nproc)
  • Windows: download pre-built HIP binary from GitHub Releases (look for 'hip' in filename)
  • Verify GPU is used: ./build/bin/llama-cli -m model.gguf -p 'hello' --n-gpu-layers 99

Ollama ✅ tested ROCm 6.3

Works out of the box on Linux with ROCm 6.x. Tested on RX 7900 XTX (24 GB) running Qwen 2.5 14B and Llama 3.1 8B.

Install hints

  • Linux: curl -fsSL https://ollama.com/install.sh | sh
  • Windows: download the Ollama installer from https://ollama.com/download/windows (ships HIP libs)
  • Verify with: ollama run llama3.1:8b (should hit GPU, not CPU)
  • Watch GPU usage live: watch -n 1 rocm-smi
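Beyond ollama run, you can drive Ollama over its HTTP API. A minimal sketch, assuming the default port 11434; the helper names are made up:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # Ollama's default listen port

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST /api/generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body.encode(),
        headers={"Content-Type": "application/json"},
    )

def generate(model: str, prompt: str) -> str:
    """Send the request and return the completed response text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as r:
        return json.load(r)["response"]
```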

Stable Diffusion WebUI ✅ tested ROCm 6.2

Works on RX 7900 XTX with PyTorch ROCm wheels. SDXL and SD 1.5 run well. Flux.1 requires additional setup (install flux dependencies separately).

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0
  • export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

Install hints

  • git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui && cd stable-diffusion-webui
  • Set TORCH_COMMAND before launch: export TORCH_COMMAND='pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2'
  • Linux launch: ./webui.sh
  • Windows: set TORCH_COMMAND=pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2 && webui-user.bat
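Once the webui is running with the --api flag, you can drive it programmatically. A minimal sketch, assuming the default port 7860; the helper names are illustrative:

```python
import json
import urllib.request

WEBUI_URL = "http://127.0.0.1:7860"  # default webui port; launch with --api

def build_txt2img_payload(prompt: str, steps: int = 20,
                          width: int = 1024, height: int = 1024) -> bytes:
    """JSON body for POST /sdapi/v1/txt2img (SDXL-sized defaults)."""
    return json.dumps({"prompt": prompt, "steps": steps,
                       "width": width, "height": height}).encode()

def txt2img(prompt: str) -> dict:
    """Submit a generation and return the parsed response."""
    req = urllib.request.Request(
        f"{WEBUI_URL}/sdapi/v1/txt2img",
        data=build_txt2img_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # 'images' holds base64-encoded PNGs
```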

vLLM ✅ tested ROCm 6.2

RX 7900 XTX works well. vLLM pre-allocates GPU memory (90% by default), so 24 GB lets you run 13B–34B models. Use --gpu-memory-utilization to tune. Flash attention is supported via ROCm's CK library.
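The pre-allocation behavior can be sketched as a back-of-envelope budget. A simplification that counts only fp16 weights inside vLLM's reserved pool and ignores activation and graph overhead; the helper name is made up:

```python
def vllm_kv_budget_gb(vram_gb: float, params_b: float,
                      util: float = 0.9, bytes_per_param: float = 2.0) -> float:
    """VRAM left for the KV cache after vLLM pre-allocates its pool.

    vLLM claims util * vram_gb up front; fp16 weights consume
    params_b * bytes_per_param GB out of that pool.
    """
    return vram_gb * util - params_b * bytes_per_param

# 8B fp16 on a 24 GB card at the default 0.9 utilization
# leaves roughly 5.6 GB of the pool for the KV cache.
print(round(vllm_kv_budget_gb(24, 8), 1))  # 5.6
```

Lowering --gpu-memory-utilization shrinks `util`, trading KV-cache room (and hence max batch/context) for headroom shared with other processes.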

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0

Install hints

  • python -m venv venv && source venv/bin/activate
  • pip install vllm --extra-index-url https://download.pytorch.org/whl/rocm6.2
  • python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.1-8B-Instruct
  • Verify with: curl http://localhost:8000/v1/models