AI RESEARCH
Confidence-Adaptive SwiGLU for Mixture-of-Experts
arXiv cs.LG
arXiv:2606.00761v1 Announce Type: new SwiGLU has become a standard gated activation in modern Transformer MLPs, yet its gate sharpness -- the smoothness and selectivity of the gating function -- is typically fixed throughout