HybridThinker: Efficient Chain-of-Thought Reasoning via Compressed Memory and Transient Thought Steps

ArXi:2606.03768v1 Announce Type: new Extended chain-of-thought (CoT) traces improve LLM reasoning but incur substantial computational and memory costs. While existing CoT compression methods mitigate this by condensing thought steps into compact representations via memory tokens and retaining only these representations at inference time, the loss of fine-grained information makes subsequent steps error-prone. To alleviate this, we propose \textbf{HybridThinker}, where in addition to preserved these representations, thought steps are also temporarily retained to provide fine-grained details.