H$^{2}$MT: Semantic Hierarchy-Aware Hierarchical Memory Transformer

arXiv:2605.24930v1

Transformer-based LLMs achieve strong results on many language tasks; however, long inputs remain challenging because context windows are finite, and both prefill latency and memory usage grow rapidly with prompt length. As a result, flat token-stream processing and chunk-based retrieval can spend substantial computation and context budget on text unrelated to the query. Offline-indexed RAG additionally