Qwen3.6-35B vs Gemma4-26B on 7900 XTX

r/LocalLLaMA

Ran a fair comparison between Qwen3.6-35B-A3B and Gemma4-26B-A4B on my Radeon 7900 XTX. Both ran with reasoning enabled at matching 32K budgets, with no output caps, on six generic real-world prompts (meeting notes, incident postmortem, log triage to JSON, code review, a build-vs-buy decision, a creative prompt). TL;DR: the model with the slower decoder won the wall clock. Qwen’s MTP makes it ~1.65x faster at emitting tokens (130 vs 78 tok/s), but it generates ~2x as many tokens to answer the same prompt, most of them spent on internal reasoning. Net result: Gemma is ~20% faster end to end.
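For anyone who wants to sanity-check the arithmetic, here is a rough sketch of how the decode speeds and token counts combine. It assumes wall-clock time is dominated by decoding (prompt processing and load time are ignored) and uses the rounded figures from the post, so the function name and the exact 2.0 token ratio are illustrative rather than measured values.

```python
def wall_clock_ratio(token_ratio: float, tps_a: float, tps_b: float) -> float:
    """Return time_a / time_b for model A vs model B.

    token_ratio: tokens emitted by A divided by tokens emitted by B.
    tps_a, tps_b: decode throughput of A and B in tokens per second.
    Assumes time = tokens / throughput, with no prefill or startup cost.
    """
    return token_ratio * tps_b / tps_a


# Rounded figures from the post (assumption: treated as exact here).
QWEN_TPS = 130.0
GEMMA_TPS = 78.0
TOKEN_RATIO = 2.0  # Qwen emits roughly twice as many tokens as Gemma

decode_speedup = QWEN_TPS / GEMMA_TPS
ratio = wall_clock_ratio(TOKEN_RATIO, QWEN_TPS, GEMMA_TPS)

print(f"Qwen decode speedup: {decode_speedup:.2f}x")
print(f"Qwen time / Gemma time: {ratio:.2f}")
print(f"Break-even token ratio: {decode_speedup:.2f}")
```

With these inputs Qwen's time comes out at about 1.2x Gemma's, which lines up with the ~20% figure. The break-even line is the useful part: under this simple model, Qwen only wins end to end if it emits fewer than about 1.67x as many tokens as Gemma.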