Step-Level Sparse Autoencoder for Reasoning Process Interpretation

arXiv:2603.03031v2

Large Language Models (LLMs) have developed strong complex-reasoning capabilities through Chain-of-Thought (CoT) reasoning. However, their reasoning patterns remain difficult to analyze. Sparse Autoencoders (SAEs) have emerged as a powerful interpretability tool, but existing approaches operate predominantly at the token level. This creates a granularity mismatch when the goal is to capture critical step-level information, such as reasoning direction and semantic transitions.
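To make the granularity mismatch concrete, the sketch below shows a conventional token-level SAE forward pass (ReLU encoder, linear decoder, reconstruction loss plus an L1 sparsity penalty) and contrasts it with feeding the same SAE one vector per reasoning step. It rests on several assumptions. `SparseAutoencoder` and `mean_pool_step` are hypothetical names. Mean pooling is only an illustrative stand-in for a step-level representation; it is not the method proposed in this paper. The weights are random and untrained.

```python
import random


def relu(x: float) -> float:
    return max(0.0, x)


class SparseAutoencoder:
    """Hypothetical minimal SAE (forward pass only, untrained random weights)."""

    def __init__(self, d_model: int, d_dict: int, seed: int = 0):
        rng = random.Random(seed)
        # Encoder maps d_model -> d_dict; decoder maps d_dict -> d_model.
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_dict)]
        self.b_enc = [0.0] * d_dict
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(d_dict)] for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # f = ReLU(W_enc (x - b_dec) + b_enc): a non-negative, ideally sparse code.
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        return [relu(sum(w * c for w, c in zip(row, centered)) + b)
                for row, b in zip(self.W_enc, self.b_enc)]

    def decode(self, f):
        # x_hat = W_dec f + b_dec
        return [sum(w * fi for w, fi in zip(row, f)) + b
                for row, b in zip(self.W_dec, self.b_dec)]

    def loss(self, x, l1_coeff: float) -> float:
        # Mean squared reconstruction error plus L1 penalty on the code.
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + l1_coeff * sum(abs(v) for v in f)


def mean_pool_step(token_acts):
    """Hypothetical step-level input: average the token activations of one step."""
    n = len(token_acts)
    return [sum(col) / n for col in zip(*token_acts)]


# One reasoning step of 3 tokens with toy 4-dimensional activations.
step_tokens = [[0.5, -0.2, 0.1, 0.3], [0.4, 0.0, -0.1, 0.2], [0.6, -0.1, 0.0, 0.1]]
sae = SparseAutoencoder(d_model=4, d_dict=8)
token_codes = [sae.encode(t) for t in step_tokens]   # one sparse code per token
step_code = sae.encode(mean_pool_step(step_tokens))  # one sparse code per step
print(len(token_codes), len(step_code))  # 3 8
```

A token-level SAE yields one feature vector per token, so any property of the step as a whole must be reassembled from several codes. A step-level input yields a single code per step, which is the unit at which reasoning direction and transitions are defined.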