Confidence-Adaptive SwiGLU for Mixture-of-Experts

arXiv CS.LG

arXiv:2606.00761v1 (announce type: new). SwiGLU has become a standard gated activation in modern Transformer MLPs, yet its gate sharpness -- the smoothness and selectivity of the gating function -- is typically fixed throughout
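
For orientation, here is a minimal sketch of a standard SwiGLU unit with an explicit sharpness parameter. It assumes that "gate sharpness" corresponds to the coefficient `beta` in Swish, `x * sigmoid(beta * x)`; the excerpt does not show the paper's own parametrization, so `beta`, the function names, and the column-list weight layout are illustrative assumptions, not the authors' method.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x, beta=1.0):
    """Swish with sharpness beta (an assumed parametrization): x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def swiglu(x, w_gate, w_up, beta=1.0):
    """SwiGLU for a single input vector x.

    w_gate and w_up are lists of weight columns, one column per hidden unit,
    so hidden unit j sees dot(x, w_gate[j]) and dot(x, w_up[j]).
    Returns swish(x . W_gate, beta) * (x . W_up), elementwise.
    """
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    gate = [swish(dot(x, col), beta) for col in w_gate]
    up = [dot(x, col) for col in w_up]
    return [g * u for g, u in zip(gate, up)]


if __name__ == "__main__":
    x = [1.0, 2.0]
    w_gate = [[1.0, 0.0], [0.0, 1.0]]   # identity: gate pre-activations equal x
    w_up = [[1.0, 1.0], [1.0, -1.0]]
    for beta in (0.5, 1.0, 8.0):
        print(beta, swiglu(x, w_gate, w_up, beta=beta))
```

In this sketch `beta` is a single constant shared by every hidden unit, which is the fixed-sharpness setting the abstract refers to. With `beta = 0` the gate reduces to the linear map `x / 2`, and as `beta` grows it approaches ReLU-like hard gating.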