Running Gemma4 31b-it on vLLM 0.21.0 A100s (bad quality or what am I doing wrong)
r/LocalLLaMA
•
Open Source AI
AI Research
AI Tools
Okay fun time I got access to two Nvlinked A100s for some research project I benchmarked my work against the Gemma 4 31b-it available through Google, but my dataset is rather massive, so I need to run it on the "local" resources. Basically I use vLLM to run the model liteLLM to proxy to it and some python code to then talk with it. I use the structured output option for my analytics. But what ever I try the output is just bad.