Is there a definitive or cookie-cutter way to benchmark variants of the same model by their KL divergence (KLD)?

r/LocalLLaMA
AI Research

I'm looking to compare several Qwopus3.6-27B-v2-NVFP4 models, namely A, B, and C. In particular, I would like to measure the KLD of each one. How do I go about doing this?

submitted by /u/jinnyjuice
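To make the question concrete, here is a minimal sketch of the quantity I have in mind, not a definitive method. It assumes each variant is compared against a common reference model (for example the unquantized checkpoint, which is my assumption), that both are scored on the same text so the token positions line up, and that full-vocabulary logits are available at each position. The function names and the toy logits are hypothetical and only for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kld(ref_logits, test_logits):
    """KL(P_ref || P_test) in nats for a single token position."""
    ref_lp = log_softmax(ref_logits)
    test_lp = log_softmax(test_logits)
    return sum(math.exp(r) * (r - t) for r, t in zip(ref_lp, test_lp))

def mean_kld(ref_seq, test_seq):
    """Average per-token KLD over aligned positions of the same text."""
    if len(ref_seq) != len(test_seq):
        raise ValueError("both models must be scored on the same token positions")
    klds = [token_kld(r, t) for r, t in zip(ref_seq, test_seq)]
    return sum(klds) / len(klds)

# Toy logits (hypothetical): 2 token positions, vocabulary of 3.
reference = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
variant_a = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]  # identical to the reference
variant_b = [[1.5, 1.2, 0.3], [0.2, 0.9, 2.5]]  # perturbed logits

for name, variant in [("A", variant_a), ("B", variant_b)]:
    print(name, mean_kld(reference, variant))
```

In a real run the logits would come from the inference stack over a fixed evaluation text, and a lower mean KLD would mean the variant's next-token distribution stays closer to the reference. Reporting the median or a high percentile alongside the mean may also be useful, since a few positions can dominate the average.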