PowLU: An Activation Function for Stable Pre-Training of LLMs
arXiv cs.LG
arXiv:2605.25704v1 Announce Type: cross

In contemporary large language models (LLMs), the swish-gated linear unit (SwiGLU) activation function is widely adopted to regulate the information flow and
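The abstract is cut off before PowLU itself is defined, so the sketch below only illustrates the SwiGLU baseline it names. It follows the commonly used formulation, in which Swish(xW) gates xV elementwise and a third matrix projects the result back. The function names `swiglu_ffn` and `swish`, the weight names `w_gate`, `w_up`, `w_down`, and the NumPy implementation are illustrative assumptions, not code from the paper. Biases are omitted for brevity.

```python
import numpy as np


def swish(x, beta=1.0):
    """Swish(x) = x * sigmoid(beta * x); beta = 1 gives SiLU.

    sigmoid(z) is written as 0.5 * (1 + tanh(z / 2)), which avoids the
    overflow warnings that np.exp(-z) can raise for large negative z.
    """
    return x * 0.5 * (1.0 + np.tanh(0.5 * beta * x))


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: (Swish(x @ w_gate) * (x @ w_up)) @ w_down.

    Shapes (hypothetical naming, chosen for this sketch):
      x:      (tokens, d_model)
      w_gate: (d_model, d_ff)   gate projection, passed through Swish
      w_up:   (d_model, d_ff)   linear "value" projection
      w_down: (d_ff, d_model)   projection back to the model width
    """
    gate = swish(x @ w_gate)        # nonlinear gate, one value per hidden unit
    value = x @ w_up                # ungated linear path
    return (gate * value) @ w_down  # elementwise gating, then project back


# Usage with small random weights (illustrative sizes only).
rng = np.random.default_rng(0)
n_tokens, d_model, d_ff = 4, 8, 16
x = rng.standard_normal((n_tokens, d_model))
w_gate = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
w_up = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
w_down = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)

y = swiglu_ffn(x, w_gate, w_up, w_down)
print(y.shape)  # (4, 8)
```

The gate multiplies the linear path elementwise, so a hidden unit whose gate value is near zero passes almost nothing on to `w_down`.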