How Optimality Structures Sparse Dictionaries: A Theory for Understanding SAE Representations

ArXi:2606.02385v1 Announce Type: cross Sparse Autoencoders (SAEs) have found success parsing neural representations into interpretable concepts, providing a basis for understanding and control. However, what exactly SAEs extract, and, correspondingly, the scientific