Build 9254 fixes my TG regression and adds PDL for NVIDIA GPUs

I was seeing a TG regression on both MTP and non-MTP models with the last few builds and had to fall back to b9202, but I just ran the new b9254 and TG has recovered, with a bonus 3% uplift on 2x 5060 Ti 16GB with tensor split. I ran cmake with the PDL flag to give it a shot. I'm going to test without it soon to compare, but I'm getting consistent results: 3k PP and 127 t/s TG on qwen3.6-35b-a3b-Q4_K_XL.

aendk commented 3 weeks ago:

Overview: Programmatic Dependent Launch (PDL) is a CUDA optimization for newer NVIDIA GPUs (CC >= 90; does not include Ada.