rocmate

gfx1102 — RX 7600


Chip: gfx1102  ·  7 tools with data

Axolotl 🟡 partial ROCm 6.2

RX 7600 (8 GB) — QLoRA of a 7B model is tight. Use micro_batch_size: 1 (Axolotl's name for the per-device batch size) and gradient_checkpointing: true. Offload optimizer states to CPU if you hit OOM.

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0
  • export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

Install hints

  • Same install as gfx1100.
  • Config: micro_batch_size: 1, gradient_checkpointing: true, optimizer: adamw_8bit
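A minimal sketch of a matching QLoRA config; the base model, LoRA rank, and sequence length are illustrative assumptions, not tested values:

    # qlora-7b.yml - illustrative sketch; add datasets:, output_dir:, etc.
    base_model: meta-llama/Llama-2-7b-hf   # assumed 7B base; substitute your own
    load_in_4bit: true
    adapter: qlora
    lora_r: 16
    lora_alpha: 32
    lora_target_linear: true
    sequence_len: 1024                     # keep short; 8 GB leaves little headroom
    micro_batch_size: 1
    gradient_accumulation_steps: 8
    gradient_checkpointing: true
    optimizer: adamw_8bit

Train with accelerate launch -m axolotl.cli.train qlora-7b.yml (the long-standing entry point; newer releases also ship an axolotl CLI).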

ComfyUI 🟡 partial ROCm 6.2

RX 7600 — 8 GB VRAM is tight. SD 1.5 works; SDXL requires the --lowvram flag and is slow.

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0
  • export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

Install hints

  • Same install as gfx1100.
  • Launch with: python main.py --lowvram --listen
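Spelled out, the gfx1100 install referenced above is the standard ROCm-wheel sequence; a sketch, assuming the ROCm 6.2 wheels:

    export HSA_OVERRIDE_GFX_VERSION=11.0.0
    export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
    git clone https://github.com/comfyanonymous/ComfyUI
    cd ComfyUI
    pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2
    pip install -r requirements.txt
    python main.py --lowvram --listen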

ExLlamaV2 🟡 partial ROCm 6.2

RX 7600 (8 GB) — a 7B EXL2 model at 4 bpw fits with room to spare; 13B at ~4 bpw is tight but possible. Monitor VRAM with rocm-smi and reduce the context length if needed.

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0

Install hints

  • pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
  • pip install exllamav2
  • Reduce max_seq_len if VRAM is exhausted.
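As a smoke test with a reduced context, the repo's examples/chat.py can be pointed at a local EXL2 model. The model path below is hypothetical, and the -l (max sequence length) and -mode flags are assumptions based on recent versions of the example script:

    git clone https://github.com/turboderp-org/exllamav2
    cd exllamav2
    python examples/chat.py -m /models/llama-2-7b-exl2-4.0bpw -mode llama -l 2048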

llama.cpp ✅ tested ROCm 6.2

RX 7600 — 8 GB VRAM; Q4_K_M models up to 8B fit entirely on the GPU. For larger models, set --n-gpu-layers to keep only part of the model on the GPU and leave the rest on the CPU. The Vulkan build is an alternative if ROCm gives trouble.

Install hints

  • Same HIP build as gfx1100 (see the sketch after this list).
  • Use --n-gpu-layers 30 to partially offload larger models.
  • Vulkan alternative: cmake -B build -DGGML_VULKAN=ON && cmake --build build --config Release -j$(nproc)
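For reference, a sketch of that HIP build with the offload target switched to gfx1102; recent trees use -DGGML_HIP=ON (older ones spelled it -DGGML_HIPBLAS=ON), and the GGUF path is illustrative:

    cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1102
    cmake --build build --config Release -j$(nproc)
    ./build/bin/llama-cli -m ./models/llama-3-8b.Q4_K_M.gguf -ngl 30 -p "Hello"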

Ollama ✅ tested ROCm 6.3

RX 7600 — works on Linux with ROCm 6.x. Lower VRAM (8 GB) limits model size; stick to ≤7B Q4.

Install hints

  • curl -fsSL https://ollama.com/install.sh | sh
  • Limit to models ≤7B Q4 due to 8 GB VRAM.
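For example, the default mistral tag pulls a 7B model at Q4_0, which fits in 8 GB:

    ollama run mistral      # 7B Q4_0 by default
    ollama ps               # verify the model is running fully on the GPU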

Stable Diffusion WebUI 🟡 partial ROCm 6.2

RX 7600 — 8 GB VRAM. SD 1.5 works; SDXL needs the --medvram or --lowvram flag and is slow.

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0

Install hints

  • Same as gfx1100.
  • Launch with: ./webui.sh --medvram
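A sketch of the full sequence, assuming the stock AUTOMATIC1111 repo (on first run, webui.sh detects an AMD GPU and installs the ROCm build of PyTorch):

    export HSA_OVERRIDE_GFX_VERSION=11.0.0
    git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
    cd stable-diffusion-webui
    ./webui.sh --medvram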

vLLM 🟡 partial ROCm 6.2

RX 7600 (8 GB) — vLLM pre-allocates nearly all VRAM by default (--gpu-memory-utilization 0.90); lower it to 0.80 or less. Stick to 7B Q4/Q8 models.

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0

Install hints

  • pip install vllm --extra-index-url https://download.pytorch.org/whl/rocm6.2
  • python -m vllm.entrypoints.openai.api_server --model <model> --gpu-memory-utilization 0.80
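The server exposes the OpenAI-compatible API on port 8000 by default; a quick check, using the same name you passed to --model:

    curl http://localhost:8000/v1/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "<model>", "prompt": "Hello", "max_tokens": 32}'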

No data yet for: faster-whisper