DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU

arXiv:2605.20936v1 Announce Type: new Hybrid attention architectures are becoming an increasingly important paradigm for improving LLM inference efficiency while preserving model quality, making hybrid architecture design a central problem. Existing designs often rely on manual empirical rules or proxy-based selector signals for layer-wise operator allocation. Recent NAS-style systems such as Jet-Nemotron demonstrate the promise of automated hybrid architecture search.