110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp

I'd been getting great MTP performance with llama.cpp on my RTX 4070 Super 12GB, until they actually merged the MTP PR. After that, performance tanked and was barely above non-MTP. So I decided to try out ik_llama.cpp, since it also supports MTP and is apparently better optimized for CPU offloading. I did not expect such a huge speed boost!

Before moving on to the benchmark results, here are my PC specs:

OS: CachyOS (HIGHLY recommended)
GPU: RTX 4070 Super 12GB
CPU: AMD Ryzen 7 9700X
RAM: 48GB DDR5-6000