1000 tps generation on Qwen3.6 27B with V100s

r/LocalLLaMA

I wanted to see what the absolute best-case generation throughput on this setup was, and I was not disappointed. 128 concurrent requests is far removed from what I actually need, but it's funny to see a big number. For a single user (batch size 1 instead of 128), generation runs at around 80 t/s with about 3000 t/s prompt processing, and that's with no MTP (multi-token prediction)!

submitted by /u/Simple_Library_2700
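A rough back-of-envelope reading of those numbers. It assumes the 1000 t/s in the title is the total generation rate summed across all 128 concurrent requests; the post doesn't spell that out. Under that assumption, each request only sees about 8 t/s, roughly a tenth of the single-user speed. The GPUs as a whole, however, turn out about 12.5x more tokens per second than at batch 1. The function names below are hypothetical, just for illustration.

```python
# Figures quoted in the post.
# ASSUMPTION: 1000 t/s is the aggregate generation rate over all 128 requests.
AGGREGATE_TPS = 1000   # tokens/s across the whole batch
BATCH_SIZE = 128       # concurrent requests
SINGLE_USER_TPS = 80   # tokens/s at batch size 1


def per_request_tps(aggregate: float, batch: int) -> float:
    """Average generation speed each request sees when the batch shares the GPUs."""
    return aggregate / batch


def batching_gain(aggregate: float, single: float) -> float:
    """How many times more total tokens/s the batched run produces than batch 1."""
    return aggregate / single


if __name__ == "__main__":
    print(f"per-request: {per_request_tps(AGGREGATE_TPS, BATCH_SIZE):.2f} t/s")
    print(f"throughput gain over batch 1: {batching_gain(AGGREGATE_TPS, SINGLE_USER_TPS):.1f}x")
```

This is the usual batching trade-off: total throughput goes up a lot, but each individual stream gets slower. That is why batch 1 is the number that matters for a single local user.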