rocmate — gfx1100 (RX 7900 XT/XTX)

Chip: gfx1100 · 8 tool(s) with data

Axolotl ✅ tested ROCm 6.2

RX 7900 XTX (24 GB) handles QLoRA fine-tuning of 7B–13B models comfortably. Flash-attention 2 works via ROCm CK (install separately). bitsandbytes ROCm fork required for quantized training.

ENV vars

export HSA_OVERRIDE_GFX_VERSION=11.0.0
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

Install hints

git clone https://github.com/axolotl-ai-cloud/axolotl && cd axolotl
pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
pip install packaging ninja && pip install flash-attn --no-build-isolation
pip install -e '.[deepspeed]'
accelerate launch -m axolotl.cli.train examples/llama-3/qlora.yml

ComfyUI ✅ tested ROCm 6.2

Works well on RX 7900 XTX with PyTorch ROCm 6.2+. SDXL runs comfortably in 24 GB VRAM. Flux.1 also works but requires careful memory management.

ENV vars

export HSA_OVERRIDE_GFX_VERSION=11.0.0
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

Install hints

Linux: git clone https://github.com/comfyanonymous/ComfyUI && cd ComfyUI
python -m venv venv && source venv/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2
pip install -r requirements.txt && python main.py --listen
Windows (HIP SDK): pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2

ExLlamaV2 ✅ tested ROCm 6.2

RX 7900 XTX — excellent performance. ExLlamaV2 is one of the fastest GPTQ/EXL2 backends on AMD. Mistral 7B EXL2 4bpw runs at ~80 tok/s. 24 GB allows 34B Q4.

ENV vars

export HSA_OVERRIDE_GFX_VERSION=11.0.0

Install hints

pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
pip install exllamav2
python test_inference.py -m /path/to/model -p "Hello world"
Or build from source for latest features: git clone https://github.com/turboderp/exllamav2 && pip install -e .
cmake -DCMAKE_BUILD_TYPE=Release . && cmake --build . --target exl2 --config Release

faster-whisper ✅ tested ROCm 6.2

faster-whisper itself targets CUDA; on AMD use the openai-whisper or whisperX route with PyTorch + ROCm, or run faster-whisper on CPU with int8 quantization (still fast for short clips). For GPU on AMD, use openai-whisper-rocm fork.

ENV vars

export HSA_OVERRIDE_GFX_VERSION=11.0.0

Install hints

pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
Verify torch.cuda.is_available() returns True (yes, 'cuda' under ROCm)
For pure faster-whisper: use device='cpu', compute_type='int8' as fallback

llama.cpp ✅ tested ROCm 6.2

Compile with GGML_HIP=ON. Runs well on RX 7900 XTX; Q4_K_M models up to 70B fit in 24 GB. Pre-built HIP binaries available in GitHub releases. Vulkan build also works if you prefer to avoid ROCm: cmake -B build -DGGML_VULKAN=ON.

Install hints

Linux: git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
HIP (ROCm): cmake -B build -DGGML_HIP=ON && cmake --build build --config Release -j$(nproc)
Vulkan (no ROCm needed): cmake -B build -DGGML_VULKAN=ON && cmake --build build --config Release -j$(nproc)
Windows: download pre-built HIP binary from GitHub Releases (look for 'hip' in filename)
Verify GPU is used: ./build/bin/llama-cli -m model.gguf -p 'hello' --n-gpu-layers 99

Ollama ✅ tested ROCm 6.3

Works out of the box on Linux with ROCm 6.x. Tested on RX 7900 XTX (24 GB) running Qwen 2.5 14B and Llama 3.1 8B.

Install hints

Linux: curl -fsSL https://ollama.com/install.sh | sh
Windows: download the Ollama installer from https://ollama.com/download/windows (ships HIP libs)
Verify with: ollama run llama3.1:8b (should hit GPU, not CPU)
Watch GPU usage live: watch -n 1 rocm-smi

Stable Diffusion WebUI ✅ tested ROCm 6.2

Works on RX 7900 XTX with PyTorch ROCm wheels. SDXL and SD 1.5 run well. Flux.1 requires additional setup (install flux dependencies separately).

ENV vars

export HSA_OVERRIDE_GFX_VERSION=11.0.0
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

Install hints

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui && cd stable-diffusion-webui
Set TORCH_COMMAND before launch: export TORCH_COMMAND='pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2'
Linux launch: ./webui.sh
Windows: set TORCH_COMMAND=pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2 && webui-user.bat

vLLM ✅ tested ROCm 6.2

RX 7900 XTX works well. vLLM pre-allocates GPU memory (90 % by default) so 24 GB lets you run 13–34B models. Use --gpu-memory-utilization to tune. Flash-attention is supported via ROCm's CK library.

ENV vars

export HSA_OVERRIDE_GFX_VERSION=11.0.0

Install hints

python -m venv venv && source venv/bin/activate
pip install vllm --extra-index-url https://download.pytorch.org/whl/rocm6.2
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.1-8B-Instruct
Verify with: curl http://localhost:8000/v1/models