Finally pioneering beyond the local 256k context window frontier!

r/LocalLLaMA

The autocompact threshold is manually set at 341.5k tokens, and I'll be slowly pushing it back now that I'm confident there's enough headroom for evicting key-value entries from memory into the cache. The question now is whether the proposed fix will complete within the remaining 16k tokens; I'll be cross if this trial run also fails to produce a worthwhile outcome. Kudos to Apple, DeepSeek and oMLX.

submitted by /u/challis88ocarina