Quick note on sudden performance loss when running GGUFs
r/LocalLLaMA
Had a couple of GGUFs (Qwen3.5-35B-A3B-APEX-I-Quality and an Unsloth model) that suddenly showed erratic performance: token generation would abruptly drop from 20+ tg/s to 5 tg/s. Turned out both files had been damaged, quite possibly during manual embedding of MTP layers (which logically shouldn't touch the source model). Discovered this by running sha256sum and seeing that the hashes no longer matched the expected ones; redownloaded the models and all sorted.

TLDR: if things get iffy, check that the model's sha256sum still matches the expected value.

submitted by /u/yeah-ok
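For anyone without sha256sum at hand (e.g. on Windows), the same check can be sketched in Python. This is only a minimal sketch: the function names `sha256_of_file` and `checksum_matches` are made up for illustration, and it assumes you have a known-good hash to compare against, either one published on the download page or one you recorded yourself right after downloading.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks.

    Reading in chunks keeps memory use flat, which matters for
    multi-gigabyte GGUF files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Compare a file's SHA-256 against a known-good hex digest.

    The expected value is normalised (whitespace stripped, lowercased)
    so a hash pasted from a web page or a .sha256 file still compares cleanly.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Usage would be something like `checksum_matches("model.gguf", "<known-good hash>")`, where both the filename and the hash are placeholders. A `False` result means the file on disk is no longer byte-identical to the reference, and a redownload is the simplest fix.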