Qwen 3.6 benchmarks on 2x RTX PRO 6000

r/LocalLLaMA
Open Source AI AI Tools

Got a chance to play around with 2x RTX PRO 6000 setup so sharing some number for Qwen 3.6. All these were run using latest stable VLLM backend. This was for a personal project. Qwen 3.6 27B BF16 (Original without any quantization) MTP - Off | 64 concurrency | 1600 tps generation MTP - 2 | 32 concurrency | 1400 tps generation MTP - 2 | 64 concurrency | 1800 tps generation Qwen 3.6 35B BF16 MTP - Off | 64 concurrency | 2700 tps generation MTP - Off | 128 concurrency | 3500 tps generation (Prompt Processing 30,000 tps) submitted by /u/mxforest [link] [comments.