Spectral Principal Paths: A Spectral Perspective on Linear Representation Formation in LLMs

arXiv:2506.08543v3

High-level representations have become a central focus in efforts to improve AI transparency and control, shifting attention from individual neurons or circuits to structured semantic directions that align with human-interpretable concepts. While the Linear Representation Hypothesis (LRH) posits that such directions emerge in model representations, it remains unclear how these representations originate and why they become increasingly stable across layers. To address this issue, we …
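To make "a linear concept direction that grows more stable across layers" concrete, the sketch below uses a common probing heuristic. It is not the spectral method this work proposes. Under the LRH, a concept direction at one layer can be estimated as the normalized difference between the mean activations of inputs with and without the concept. Stability across layers can then be read off as the cosine similarity between the directions of consecutive layers. The function names `concept_direction` and `cross_layer_stability` are hypothetical. The activations are synthetic random data in which the concept signal is assumed to strengthen with depth, standing in for real hidden states.

```python
import numpy as np


def concept_direction(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Unit-norm difference-of-means direction between two activation sets.

    pos, neg: arrays of shape (n_examples, d_model) taken from one layer.
    (Hypothetical helper; a standard probing heuristic, not the paper's method.)
    """
    diff = pos.mean(axis=0) - neg.mean(axis=0)
    return diff / np.linalg.norm(diff)


def cross_layer_stability(pos_by_layer, neg_by_layer):
    """Cosine similarity between concept directions of consecutive layers."""
    dirs = [concept_direction(p, n) for p, n in zip(pos_by_layer, neg_by_layer)]
    # Each direction is unit-norm, so the dot product equals the cosine similarity.
    return [float(dirs[i] @ dirs[i + 1]) for i in range(len(dirs) - 1)]


# Synthetic toy activations (assumption): a fixed "true" concept direction
# whose strength grows linearly with layer depth, plus isotropic Gaussian noise.
rng = np.random.default_rng(0)
n_layers, n_examples, d_model = 6, 200, 64
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)

pos_layers, neg_layers = [], []
for layer in range(n_layers):
    strength = 0.5 * (layer + 1)
    pos_layers.append(rng.normal(size=(n_examples, d_model)) + strength * true_dir)
    neg_layers.append(rng.normal(size=(n_examples, d_model)) - strength * true_dir)

sims = cross_layer_stability(pos_layers, neg_layers)
print([round(s, 3) for s in sims])
```

In this toy setup, the consecutive-layer cosine similarity rises with depth. The estimated direction becomes less dominated by sampling noise as the concept signal strengthens. This is only one simple way to quantify "increasing stability" and says nothing about why it occurs in real LLMs.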