Nvidia H100 (94GB VRAM) - should I run llama.cpp or vLLM for inference with 30 users?

r/LocalLLaMA

I was given the great opportunity to borrow an H100 with 94GB VRAM at work until it is needed by a customer. (No idea how much system RAM I will get, but I guess they are a bit flexible on this.)

- I want to build an inference endpoint that can handle up to 30 users.
- I want a reasonably large context, say 131,072-262,144 tokens.
- Realistically, I think no more than 10-15 users will use it concurrently in most situations.
- The main use will be tools like Pi and OpenCode.
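
For a rough sense of scale, here is a back-of-the-envelope sketch of how much KV-cache memory those context lengths and user counts could need. The model dimensions in `MODEL` are made-up placeholders, not any specific model, and the function names are just for illustration. It assumes a standard transformer with grouped-query attention and an fp16 cache, and it ignores the memory taken by model weights and activations.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache stored for a single token.

    The factor of 2 covers one key vector and one value vector
    per layer per KV head.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_cache_gib(context_tokens, concurrent_users, **model):
    """Worst-case KV cache in GiB if every user fills the full context."""
    per_token = kv_cache_bytes_per_token(**model)
    return per_token * context_tokens * concurrent_users / 2**30


# ASSUMPTION: hypothetical model dimensions, chosen only for illustration.
# Replace them with the values from the config of the model you plan to serve.
MODEL = dict(n_layers=48, n_kv_heads=4, head_dim=128, bytes_per_elem=2)

if __name__ == "__main__":
    for ctx in (131_072, 262_144):
        for users in (1, 10, 15):
            gib = kv_cache_gib(ctx, users, **MODEL)
            print(f"context={ctx:>7} users={users:>2} -> {gib:7.1f} GiB KV cache")
```

With these placeholder dimensions, one user at 131,072 tokens needs about 12 GiB of KV cache, so 15 users all at full context would need far more than 94GB. That is the worst case, though: it only applies if every concurrent user actually fills the whole context window at the same time.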