AI RESEARCH
How Optimality Structures Sparse Dictionaries: A Theory for Understanding SAE Representations
arXiv CS.LG
•
ArXi:2606.02385v1 Announce Type: cross Sparse Autoencoders (SAEs) have found success parsing neural representations into interpretable concepts, providing a basis for understanding and control. However, what exactly SAEs extract, and, correspondingly, the scientific