Memory Grafting: Scaling Language Model Pre-training via Offline Conditional Memory

arXiv:2605.20948v1 Announce Type: new Scaling conditional memory offers a promising way to increase language-model capacity, but existing methods such as Engram make memory scaling expensive and sometimes ineffective. We propose Memory Grafting, a conditional memory scaling method that uses frozen hidden states from a grafting model as conditional n-gram memory. Given frequent local n-grams, we run the grafting model offline, store its final-token hidden representations as memory values, and let the recipient model retrieve them through exact longest-match suffix lookup.
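
The offline build and the exact longest-match suffix lookup described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `frequent_ngrams`, `build_memory`, `longest_suffix_lookup`, the parameters `max_n` and `min_count`, and the `toy_encoder` that stands in for the frozen grafting model are all hypothetical. How the recipient model consumes a retrieved value is not specified in the abstract and is left out.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

NGram = Tuple[int, ...]
Vector = List[float]


def frequent_ngrams(corpus: Sequence[Sequence[int]], max_n: int, min_count: int) -> List[NGram]:
    """Collect every n-gram (1 <= n <= max_n) that occurs at least min_count times."""
    counts: Counter = Counter()
    for seq in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
    return [gram for gram, c in counts.items() if c >= min_count]


def build_memory(ngrams: Sequence[NGram], encode_last_token: Callable[[NGram], Vector]) -> Dict[NGram, Vector]:
    """Offline step: run the (frozen) encoder once per n-gram and keep the result."""
    return {gram: encode_last_token(gram) for gram in ngrams}


def longest_suffix_lookup(memory: Dict[NGram, Vector], context: Sequence[int], max_n: int) -> Optional[Tuple[NGram, Vector]]:
    """Return the entry for the longest suffix of context stored in memory, or None."""
    for n in range(min(max_n, len(context)), 0, -1):
        key = tuple(context[-n:])
        if key in memory:  # exact match only, no fuzzy or hashed lookup
            return key, memory[key]
    return None


def toy_encoder(gram: NGram) -> Vector:
    # Hypothetical stand-in for the grafting model's final-token hidden state.
    return [float(len(gram)), float(sum(gram))]


# Toy token-id corpus (assumed data, for illustration only).
corpus = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 3, 4]]
memory = build_memory(frequent_ngrams(corpus, max_n=3, min_count=2), toy_encoder)

hit = longest_suffix_lookup(memory, [9, 1, 2, 3], max_n=3)  # matches the trigram (1, 2, 3)
backoff = longest_suffix_lookup(memory, [9, 2, 3], max_n=3)  # falls back to the bigram (2, 3)
miss = longest_suffix_lookup(memory, [7, 3, 5], max_n=3)     # no stored suffix, returns None
```

The lookup tries the longest candidate suffix first and shortens it one token at a time, so a context always receives the most specific stored entry, and a context with no stored suffix receives nothing.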