rocmate

gfx1102 — RX 7600


Chip: gfx1102  ·  7 tools with data

Axolotl 🟡 partial ROCm 6.2

RX 7600 (8 GB) — QLoRA of a 7B model is tight. Use micro_batch_size: 1 (Axolotl's name for the per-device batch size) and gradient_checkpointing: true. Offload optimizer states to CPU if you hit OOM.

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0
  • export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

Install hints

  • Same install as gfx1100.
  • Config: micro_batch_size: 1, gradient_checkpointing: true, optimizer: adamw_8bit
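A minimal sketch of a matching QLoRA config; the base model, LoRA rank, and sequence length are illustrative assumptions, not tested values:

    # qlora-7b.yml - illustrative sketch; add datasets:, output_dir:, etc.
    base_model: meta-llama/Llama-2-7b-hf   # assumed 7B base; substitute your own
    load_in_4bit: true
    adapter: qlora
    lora_r: 16
    lora_alpha: 32
    lora_target_linear: true
    sequence_len: 1024                     # keep short; 8 GB leaves little headroom
    micro_batch_size: 1
    gradient_accumulation_steps: 8
    gradient_checkpointing: true
    optimizer: adamw_8bit

Train with accelerate launch -m axolotl.cli.train qlora-7b.yml (the long-standing entry point; newer releases also ship an axolotl CLI).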

ComfyUI 🟡 partial ROCm 6.2

RX 7600 — 8 GB VRAM is tight. SD 1.5 works; SDXL requires the --lowvram flag and is slow.

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0
  • export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True

Install hints

  • Same install as gfx1100.
  • Launch with: python main.py --lowvram --listen
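Spelled out, the gfx1100 install referenced above is the standard ROCm-wheel sequence; a sketch, assuming the ROCm 6.2 wheels:

    export HSA_OVERRIDE_GFX_VERSION=11.0.0
    export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
    git clone https://github.com/comfyanonymous/ComfyUI
    cd ComfyUI
    pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2
    pip install -r requirements.txt
    python main.py --lowvram --listen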

ExLlamaV2 🟡 partial ROCm 6.2

RX 7600 (8 GB) — a 7B EXL2 model at 4 bpw fits with room to spare; 13B at ~4 bpw is tight but possible. Monitor VRAM with rocm-smi and reduce the context length if needed.

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0

Install hints

  • pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
  • pip install exllamav2
  • Reduce max_seq_len if VRAM is exhausted.
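As a smoke test with a reduced context, the repo's examples/chat.py can be pointed at a local EXL2 model. The model path below is hypothetical, and the -l (max sequence length) and -mode flags are assumptions based on recent versions of the example script:

    git clone https://github.com/turboderp-org/exllamav2
    cd exllamav2
    python examples/chat.py -m /models/llama-2-7b-exl2-4.0bpw -mode llama -l 2048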

llama.cpp ✅ tested ROCm 6.2

RX 7600 — 8 GB VRAM; Q4_K_M models up to 8B fit entirely on the GPU. For larger models, set --n-gpu-layers to keep only part of the model on the GPU and leave the rest on the CPU. The Vulkan build is an alternative if ROCm gives trouble.

Install hints

  • Same HIP build as gfx1100 (see the sketch after this list).
  • Use --n-gpu-layers 30 to partially offload larger models.
  • Vulkan alternative: cmake -B build -DGGML_VULKAN=ON && cmake --build build --config Release -j$(nproc)
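For reference, a sketch of that HIP build with the offload target switched to gfx1102; recent trees use -DGGML_HIP=ON (older ones spelled it -DGGML_HIPBLAS=ON), and the GGUF path is illustrative:

    cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1102
    cmake --build build --config Release -j$(nproc)
    ./build/bin/llama-cli -m ./models/llama-3-8b.Q4_K_M.gguf -ngl 30 -p "Hello"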

Ollama ✅ tested ROCm 6.3

RX 7600 — works on Linux with ROCm 6.x. Lower VRAM (8 GB) limits model size; stick to ≤7B Q4.

Install hints

  • curl -fsSL https://ollama.com/install.sh | sh
  • Limit to models ≤7B Q4 due to 8 GB VRAM.
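For example, the default mistral tag pulls a 7B model at Q4_0, which fits in 8 GB:

    ollama run mistral      # 7B Q4_0 by default
    ollama ps               # verify the model is running fully on the GPU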

Stable Diffusion WebUI 🟡 partial ROCm 6.2

RX 7600 — 8 GB VRAM. SD 1.5 works; SDXL needs the --medvram or --lowvram flag and is slow.

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0

Install hints

  • Same as gfx1100.
  • Launch with: ./webui.sh --medvram
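A sketch of the full sequence, assuming the stock AUTOMATIC1111 repo (on first run, webui.sh detects an AMD GPU and installs the ROCm build of PyTorch):

    export HSA_OVERRIDE_GFX_VERSION=11.0.0
    git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
    cd stable-diffusion-webui
    ./webui.sh --medvram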

vLLM 🟡 partial ROCm 6.2

RX 7600 (8 GB) — vLLM pre-allocates nearly all VRAM by default (--gpu-memory-utilization 0.90); lower it to 0.80 or less. Stick to 7B Q4/Q8 models.

ENV vars

  • export HSA_OVERRIDE_GFX_VERSION=11.0.0

Install hints

  • pip install vllm --extra-index-url https://download.pytorch.org/whl/rocm6.2
  • python -m vllm.entrypoints.openai.api_server --model <model> --gpu-memory-utilization 0.80
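The server exposes the OpenAI-compatible API on port 8000 by default; a quick check, using the same name you passed to --model:

    curl http://localhost:8000/v1/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "<model>", "prompt": "Hello", "max_tokens": 32}'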

No data yet for: faster-whisper