Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro

r/LocalLLaMA

Llama benchmark results:

| model | size | params | backend | ngl | threads | type_k | type_v | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| qwen35moe 35B.A3B Q4_K - Medium | 20.81 GiB | 34.66B | SYCL | 99 | 1 | q8_0 | q8_0 | 1 | pp512 | 977.40 ± 2.02 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.81 GiB | 34.66B | SYCL | 99 | 1 | q8_0 | q8_0 | 1 | tg128 | 70.54 ± 0.12 |

I've chucked all my notes into an LLM and had it create an article, in case you want to recreate the same setup. I am currently using this with oh my pi and it's very usable. I was able to create a well-designed poker game without it going into a loop or hanging/crashing.
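For a rough feel of what the benchmark figures above mean in practice, here is a minimal back-of-the-envelope sketch. It assumes throughput stays flat at the pp512 and tg128 rates, which will not hold at long context (prompt processing and generation both slow down as the KV cache fills), so treat the results as optimistic lower bounds on latency. The function name and the example request sizes are made up for illustration.

```python
# Throughput figures copied from the llama-bench results above (tokens per second).
PP_TPS = 977.40  # pp512: prompt processing
TG_TPS = 70.54   # tg128: token generation


def estimate_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Optimistic wall-clock estimate, assuming throughput does not degrade with depth."""
    return prompt_tokens / PP_TPS + output_tokens / TG_TPS


if __name__ == "__main__":
    # Hypothetical request sizes, chosen only to illustrate the arithmetic.
    for prompt, output in [(512, 128), (8_192, 512), (32_768, 1_024)]:
        secs = estimate_seconds(prompt, output)
        print(f"{prompt:>6} prompt + {output:>5} output tokens: ~{secs:.1f} s")
```

To get real numbers at depth, it would be better to rerun the benchmark with larger prompt sizes than to extrapolate from pp512.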