Long-context performance at lower quants

r/LocalLLaMA

I've been using Qwen3.5 122B A10B (Q3_K_XL) a lot lately for coding, and overall it's been pretty incredible; for most tasks it feels not far off from frontier level. However, I've noticed that once I hit around 75-80k tokens of context, it usually gets dumb all of a sudden. It hits a brick wall, and quality deteriorates rapidly and drastically: it starts hallucinating, forgetting things, or attributing something it said or suggested to me. I've found I have to compact the context before reaching that point, and after that it keeps going just fine.
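If you want to automate the "compact before the wall" workaround in a client script, a minimal sketch could look like the following. It assumes per-message token counts are already known (e.g. from the model's tokenizer) and that the summary text is produced elsewhere, perhaps by asking the model to summarize earlier turns. The `Conversation` class, `COMPACT_THRESHOLD`, and the 70k value are hypothetical choices for illustration, not part of any particular tool.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: stay below the ~75-80k tokens where quality
# was observed to drop in this post. Tune for your model and quant.
COMPACT_THRESHOLD = 70_000


@dataclass
class Conversation:
    # Each message is stored as (text, token_count); token counts are
    # assumed to come from the model's tokenizer.
    messages: list[tuple[str, int]] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(n for _, n in self.messages)

    def needs_compaction(self, threshold: int = COMPACT_THRESHOLD) -> bool:
        return self.total_tokens() >= threshold

    def compact(self, summary: str, summary_tokens: int, keep_last: int = 4) -> None:
        # Replace everything except the most recent `keep_last` messages
        # with a single summary message.
        recent = self.messages[-keep_last:] if keep_last > 0 else []
        self.messages = [(summary, summary_tokens)] + recent


if __name__ == "__main__":
    conv = Conversation()
    for i in range(30):
        conv.messages.append((f"turn {i}", 2_500))
    print(conv.total_tokens())      # 75000
    print(conv.needs_compaction())  # True
    conv.compact("summary of earlier turns", 1_500)
    print(conv.total_tokens())      # 11500
```

Keeping the last few turns verbatim preserves the immediate working state, while the summary carries the older context in far fewer tokens.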